Posted by Jaqui Herman, Research Specialist and Tim Herrmann, Program Manager
The 9th International Conference on Learning Representations (ICLR 2021), a virtual conference focused on deep learning, kicked off this week, offering conference and workshop tracks that present some of the latest research in deep learning and its applications to areas such as computer vision, computational biology, speech recognition, text understanding, and more.
As a Platinum Sponsor of ICLR 2021, Google will have a strong presence with over 100 accepted publications and participation on organizing committees and in workshops. If you have registered for ICLR 2021, we hope you’ll watch our talks and learn about the work at Google that goes into solving interesting problems for billions of people. Learn more about our research being presented in the list below (Googlers in bold).
Officers and Board Members
Includes: Hugo Larochelle, Tara Sainath
Organizing Committee
Includes: Sanmi Koyejo, Chelsea Finn
Area Chairs
Includes: Abhishek Kumar, Aditya Menon, Aleksandra Faust, Alexey Dosovitskiy, Andrew Cotter, Andrew Dai, Augustus Odena, Been Kim, Behnam Neyshabur, Ben Poole, Bo Dai, Bo Li, Branislav Kveton, Ce Liu, Claudio Gentile, Colin Raffel, Danny Tarlow, David Ha, Dengyong Zhou, Dumitru Erhan, Dustin Tran, Felix Hill, George Tucker, Hanie Sedghi, Heinrich Jiang, Hossein Mobahi, Izhak Shafran, Jascha Sohl-Dickstein, Jasper Snoek, Jean-Philippe Vert, Jeffrey Pennington, Justin Gilmer, Kevin Swersky, Marco Cuturi, Mario Lucic, Marlos C. Machado, Mathieu Blondel, Matt Johnson, Matthieu Geist, Mohammad Norouzi, Naman Agarwal, Navdeep Jaitly, Nicolas Le Roux, Niki Parmar, Olivier Bachem, Olivier Pietquin, Philip Long, Quentin Berthet, Razvan Pascanu, Rodolphe Jenatton, Samy Bengio*, Sebastian Nowozin, Silvio Lattanzi, Slav Petrov, Srinadh Bhojanapalli, Suman Ravuri, Tim Salimans, Vitaly Kuznetsov, William Cohen, Yann Dauphin, Yujia Li
Publications
Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes
Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel
An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale (see the blog post)
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
Biao Zhang*, Ankur Bapna, Rico Sennrich, Orhan Firat
Evolving Reinforcement Learning Algorithms (see the blog post)
John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust
Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song*, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem
When Do Curricula Work?
Xiaoxia Wu, Ethan Dyer, Behnam Neyshabur
Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret*, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur
Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
Zirui Wang*, Yulia Tsvetkov, Orhan Firat, Yuan Cao
Mathematical Reasoning via Self-supervised Skip-tree Training
Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy
Long-Tail Learning via Logit Adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar
Are Neural Rankers Still Outperformed by Gradient Boosted Decision Trees?
Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork
LambdaNetworks: Modeling Long-Range Interactions without Attention
Irwan Bello
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare
BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai
Practical Real Time Recurrent Learning with a Sparse Approximation
Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves
LEAF: A Learnable Frontend for Audio Classification (see the blog post)
Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi
Batch Reinforcement Learning Through Continuation Method
Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, Minmin Chen
Scalable Transfer Learning with Expert Models
Joan Puigcerver, Carlos Riquelme Ruiz, Basil Mustafa, Cedric Renggli*, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby
Scaling Symbolic Methods Using Gradients for Neural Model Explanation
Subham Sekhar Sahoo, Subhashini Venugopalan, Li Li, Rishabh Singh, Patrick Riley
Primal Wasserstein Imitation Learning (see the blog post)
Robert Dadashi, Leonard Hussenot, Matthieu Geist, Olivier Pietquin
Reset-Free Lifelong Learning with Skill-Space Planning
Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch
Teaching Temporal Logics to Neural Networks
Christopher Hahn, Frederik Schmitt, Jens U. Kreber, Markus Norman Rabe, Bernd Finkbeiner
Shape-Texture Debiased Neural Network Training
Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie
Rethinking Embedding Coupling in Pre-trained Language Models
Hyung Won Chung, Thibault Fevry*, Henry Tsai, Melvin Johnson, Sebastian Ruder
Overparameterisation and Worst-Case Generalisation: Friend or Foe?
Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar
Single-Photon Image Classification
Thomas Fischbacher, Luciano Sbaiz
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis*, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey
Adaptive Federated Optimization
Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, Hugh Brendan McMahan
Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Benjamin Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov
Open Question Answering over Tables and Text
Wenhu Chen*, Ming-Wei Chang, Eva Schlinger, William Yang Wang, William W. Cohen
IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression
Rianne van den Berg, Alexey A. Gritsenko, Mostafa Dehghani, Casper Kaae Sønderby, Tim Salimans
A Universal Representation Transformer Layer for Few-Shot Image Classification
Lu Liu, William L. Hamilton, Guodong Long, Jing Jiang, Hugo Larochelle
Tradeoffs in Data Augmentation: An Empirical Study
Raphael Gontijo-Lopes, Sylvia Smullin, Ekin Dogus Cubuk, Ethan Dyer
Coping with Label Shift via Distributionally Robust Optimisation
Jingzhao Zhang, Aditya Krishna Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra
Rethinking Attention with Performers (see the blog post)
Krzysztof Marcin Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Quincy Davis, Afroz Mohiuddin, Lukasz Kaiser, David Benjamin Belanger, Lucy J Colwell, Adrian Weller
Teaching with Commentaries
Aniruddh Raghu*, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton
Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
Vinay Venkatesh Ramasesh, Ethan Dyer, Maithra Raghu
Model-Based Offline Planning
Arthur Argenson, Gabriel Dulac-Arnold
The Geometry of Integration in Text Classification RNNs
Kyle Aitken*, Vinay Venkatesh Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan
On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L Smith, Benoit Dherin, David Barrett, Soham De
The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers (see the blog post)
Preetum Nakkiran*, Behnam Neyshabur, Hanie Sedghi
Learning Energy-Based Models by Diffusion Recovery Likelihood
Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P Kingma
Latent Skill Planning for Exploration and Transfer
Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti
PseudoSeg: Designing Pseudo Labels for Semantic Segmentation
Yuliang Zou*, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, Tomas Pfister
WaveGrad: Estimating Gradients for Waveform Generation
Nanxin Chen*, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, William Chan
One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks
Atish Agarwala, Abhimanyu Das, Brendan Juba*, Rina Panigrahy, Vatsal Sharan*, Xin Wang, Qiuyi Zhang
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
Explainable Deep One-Class Classification
Philipp Liznerski, Lukas Ruff, Robert A. Vandermeulen, Billy Joe Franks, Marius Kloft, Klaus-Robert Müller
Net-DNF: Effective Deep Modeling of Tabular Data
Liran Katzir, Gal Elidan, Ran El-Yaniv
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu
Auxiliary Task Update Decomposition: The Good, the Bad and the Neutral
Lucio M. Dery, Yann Dauphin, David Grangier
Average-Case Acceleration for Bilinear Games and Normal Matrices
Carles Domingo-Enrich, Fabian Pedregosa, Damien Scieur
OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
Anurag Ajay*, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum
Training Independent Subnetworks for Robust Prediction
Marton Havasi*, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Mingbo Dai, Dustin Tran
Benchmarks for Deep Off-Policy Evaluation
Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Thomas Paine
TropEx: An Algorithm for Extracting Linear Terms in Deep Neural Networks
Martin Trimmel, Henning Petzka, Cristian Sminchisescu
Mastering Atari with Discrete World Models (see the blog post)
Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, Jimmy Ba
Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit
Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek
Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning
Elan Markowitz, Keshav Balasubramanian, Mehrnoosh Mirtaheri, Sami Abu-El-Haija, Bryan Perozzi, Greg Ver Steeg, Aram Galstyan
Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
Paul Pu Liang*, Manzil Zaheer, Yuan Wang, Amr Ahmed
HyperGrid Transformers: Towards A Single Model for Multiple Tasks
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms
Maruan Al-Shedivat*, Jennifer Gillenwater, Eric Xing, Afshin Rostamizadeh
Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
Thao Nguyen, Maithra Raghu, Simon Kornblith
A Unifying View on Implicit Bias in Training Linear Neural Networks
Chulhee Yun*, Shankar Krishnan, Hossein Mobahi
Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine
Lipschitz Recurrent Neural Networks
N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney
Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Michael R Zhang*, Thomas Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi
The Importance of Pessimism in Fixed-Dataset Policy Optimization
Jacob Buckman, Carles Gelada, Marc G Bellemare
Monotonic Kronecker-Factored Lattice
William Taylor Bakst, Nobuyuki Morioka, Erez Louidor
Adversarially Guided Actor-Critic
Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen
Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction
Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee*, Seunghoon Hong
Dataset Meta-Learning from Kernel Ridge-Regression
Timothy Nguyen, Zhourong Chen, Jaehoon Lee
Dual-Mode ASR: Unify and Improve Streaming ASR with Full-Context Modeling
Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N Sainath, Yonghui Wu, Ruoming Pang
Implicit Gradient Regularization
David Barrett, Benoit Dherin
Deconstructing the Regularization of BatchNorm
Yann Dauphin, Ekin Dogus Cubuk
C-Learning: Learning to Achieve Goals via Recursive Classification
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine
Colorization Transformer
Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner
Control-Aware Representations for Model-based Reinforcement Learning
Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh
Evaluations and Methods for Explanation through Robustness Analysis
Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Kumar Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh
Learning and Evaluating Representations for Deep One-Class Classification
Kihyuk Sohn, Chun-Liang Li, Jinsung Yoon, Minho Jin, Tomas Pfister
No MCMC for Me: Amortized Sampling for Fast and Stable Training of Energy-Based Models
Will Sussman Grathwohl, Jacob Jin Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud
Neural Thompson Sampling
Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu
A Design Space Study for LISTA and Beyond
Tianjian Meng, Xiaohan Chen, Yifan Jiang, Zhangyang Wang
i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning
Kibok Lee, Yian Zhu, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin, Honglak Lee
Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments
Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Charles Blundell, Sergey Levine, Yoshua Bengio, Michael Curtis Mozer
Calibration of Neural Networks using Splines
Kartik Gupta, Amir Rahimi, Thalaiyasingam Ajanthan, Thomas Mensink, Cristian Sminchisescu, Richard Hartley
Extreme Memorization via Scale of Initialization
Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur
Molecule Optimization by Explainable Evolution
Binghong Chen, Tianzhe Wang, Chengtao Li, Hanjun Dai, Le Song
Combining Ensembles and Data Augmentation Can Harm Your Calibration
Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran
Workshops
Science and Engineering of Deep Learning
Speakers and Panelists include: Alex Hanna
Moderator and Advisors include: Emily Denton
Organizers include: Negar Rostamzadeh, Samy Bengio*
Synthetic Data Generation: Quality, Privacy, Bias
Speakers include: Jinsung Yoon, Emily Denton
Program Committee includes: Syed Ashrafulla
Enormous Language Models: Perspectives and Benchmarks
Speakers and Panelists include: Noam Shazeer, Natalie Schluter
Organizers include: Colin Raffel, Adam Roberts, Jascha Sohl-Dickstein, Katherine Lee, William Fedus, Aitor Lewkowycz
The Role of Mathematical Reasoning in General Artificial Intelligence
Speakers and Panelists include: Markus Rabe, Christian Szegedy
Weakly Supervised Learning
Invited Speakers include: Lu Jiang
Learning to Learn
Organizers include: Yevgen Chebotar
Embodied Multimodal Learning (EML)
Invited Speakers include: Sergey Levine
Distributed and Private Machine Learning
Program Committee includes: Peter Kairouz, Ananda Theertha Suresh
S2D-OLAD: From Shallow to Deep, Overcoming Limited and Adverse Data
Invited Speakers include: Alex Hanna, Hugo Larochelle
Organizers include: Vincent Dumoulin
Responsible AI (RAI)
Speakers include: Been Kim
Energy-Based Models: Current Perspectives, Challenges, and Opportunities
Organizers include: Adji Bousso Dieng, Igor Mordatch
A Roadmap to Never-Ending RL
Invited Session Panelists include: Aleksandra Faust
Program Committee includes: Coline Devin, Karol Hausman, Ben Eysenbach, Ofir Nachum, Ryan Julian, Tianhe Yu, Dumitru Erhan, Marc Pickett, Shixiang Gu
2nd Workshop on Practical ML for Developing Countries: Learning Under Limited/Low Resource Scenarios
Program Committee includes: Pablo Samuel Castro
Beyond Static Papers: Rethinking How We Share Scientific Understanding in ML
Speakers include: David Ha, Hugo Larochelle
Organizers include: Sara Hooker
* Indicates work done while at Google