Categories
Misc

BMW Brings Together Art, Artificial Intelligence for Virtual Installation Using NVIDIA StyleGAN

BMW today unveiled a virtual art installation that projects AI-generated artwork onto a virtual rendition of the automaker’s 8 Series Gran Coupe.  Dubbed “The Ultimate AI Masterpiece,” the installation harnessed NVIDIA StyleGAN — a generative model for high-resolution images — to create original artwork projection-mapped onto the virtual vehicle. The project debuts in conjunction with … Continued

BMW today unveiled a virtual art installation that projects AI-generated artwork onto a virtual rendition of the automaker’s 8 Series Gran Coupe. 

Dubbed “The Ultimate AI Masterpiece,” the installation harnessed NVIDIA StyleGAN — a generative model for high-resolution images — to create original artwork projection-mapped onto the virtual vehicle. The project debuts in conjunction with the contemporary art festival Frieze New York, and marks the 50th year of cultural engagement by the BMW Group.

“For 50 years, BMW has supported arts and culture through numerous initiatives as a way to engage and interact with consumers around the world in an authentic way,” said Uwe Dreher, vice president of marketing, BMW of North America. “As we continue these efforts into 2021, and look for new and creative ways to engage audiences, we shift to a virtual setting where we are combining centuries-old art and the latest AI technology to create something completely new and exciting.”

Collaborators Gary Yeh, founder of the art media company ArtDrunk, and Nathan Shipley, director of creative technology at Goodby, Silverstein & Partners, trained NVIDIA StyleGAN on 50,000 images of art across nine centuries as well as 50 contemporary works from artists BMW has worked with in past years. The trained model merges the learnings from classical art along with styles from the contemporary artists. 

“AI is an emerging medium of creative expression. It’s a fascinating space where art meets algorithm,” said Shipley. “Combining the historical works with the curated modern works and projecting the evolving images onto the 8 Series Gran Coupe serves a direct nod to BMW’s history of uniting automobiles, art, and technology.” 

The project uses the BMW car as a canvas to showcase each creator’s style — like that of South Korean charcoal artist Lee Bae. 

“In this case the AI learned from Lee Bae’s work. In a way, it sees those textures,” Shipley said. “And then on its own the AI generates this evolving stream of new textures. They’re informed by his work, but they’re also unique.”

Developed by NVIDIA Research, StyleGAN has been adopted for digital storytelling, art exhibits, manga illustrations and reimagined historical portraits.

For more AI-inspired artwork, visit the AI Art Gallery featured at the recent NVIDIA GPU Technology Conference.

Categories
Misc

Streaming Everything with NVIDIA Rivermax

NVIDIA Rivermax 1.5, the newest release of the IP-based video and data streaming library, includes key features and capabilities enabling performance boosts and quicker integrations.

In 2020, many of us adopted a work-from-home routine, and this new norm has been stressing IT networks. It shouldn’t be a surprise that the sudden boost in remote working drives the need for a more dynamic IT environment, one that can pull in resources on demand.

Over the past few years, we’ve focused on the Media & Entertainment (M&E) market, supporting the global industry as it evolves from proprietary SDI to cost-effective Ethernet/IP infrastructure solutions. NVIDIA technologies are enabling M&E to take the next transformational step toward cloud computing, while meeting compliance with the most stringent SMPTE ST-2110-21 specification requirements.

On the journey to modernize M&E network interconnect, we introduced NVIDIA Rivermax, an optimized, standard-compliant software library API for streaming data. Rivermax software runs on NVIDIA ConnectX-5 or later network adapters, enabling the use of common off-the-shelf (COTS) servers for streaming SD, HD, and up to Ultra-HD video flows. The Rivermax-ConnectX-5 adapter card combination also enables compliance with M&E specifications, such as the SMPTE 2110-21; reduces CPU utilization for video data streaming; and removes bottlenecks for the highest throughput. It can reach 82 Gbps of streamed video with a single CPU core.

As our partners have rolled out new Rivermax-based, full-IP solutions rigorously tested in their labs, we’re excited to share the fruits of these collaborative investments in Rivermax 1.5, the latest release of the streaming library. Rivermax 1.5 includes key features and capabilities enabling performance boosts and quicker integrations. One of these new features allows Rivermax-accelerated applications to stream not only video, audio, and ancillary data but other data stream formats as well, enabling Rivermax accelerations and CPU savings in many new markets and applications:

  • Compressed video
  • Healthcare imaging (DICOM-RTV)
  • Cloud gaming
  • Autonomous car sensor streaming (video/LiDAR/RADAR)
  • And more

Another good piece of news is that Rivermax 1.5 recently passed the JT-NM Tested program (March 16 – 20, 2020), allowing for integration and interoperability with multiple other market vendors.

Rivermax 1.5 release contents

The Rivermax 1.5 release contains the following updates and features:

  • Virtualized Rivermax over vmware ESXi and Linux OpenStack (currently in beta-level support)
  • Rivermax API updates:
    • Replaced TX pause API with a flag to commit API
    • Changed structure of in-buffer attributes
    • Changed function signature of in-query buffer API
  • New 802.1Q VLAN tagging support
  • New SDK code examples:
    • Media sender:
      • Real video content, interlace, 59.94, 29.97
    • Media receiver:
      • GPU-CUDA support for color space conversion (from YCBCR to RGB): Display or playback a video stream on screen or through X11 SSH
      • Interlace video formats
      • 2022-7 Rx SW sample code to get you started quickly on software implementation of 2022-7, which will be offloaded to ConnectX-6 Dx hardware with future releases
  • Generic API (beta version): For streaming any type of data. Get all the goodies of Rivermax, like traffic shaping (accurate packet pacing), high bandwidth for any type of UDP-based data stream with low CPU utilization and supporting both Linux and Windows.
  • Introduce Rivermax for Mellanox ConnectX-6 Dx in beta-level support over Linux OS (with feature parity to ConnectX-5)
  • NVIDIA-Jetson platform software image (as presented at IBC2019)
    • Based on Rivermax 1.5 release
    • Demos running Rivermax on NVIDIA-Jetson platform
    • Includes sender and receiver examples
    • GPU is integrated with the Media_receiver for both CSC and on-screen rendering
    • AnalyzeX (SMPTE ST2110-20 verification software) while running video viewers

Want to discuss Rivermax? Comment below or reach out to your local account/support team.

Here’s to seeing you at the next M&E show!

Categories
Misc

Easiest way to get/set flattened array of trainable weights (and biases)

For example i want to be able to do something like this…

weights = model.get_trainable_weights() weights *= 2 model.set_trainable_weights(weights) 

I’ve googled it and seems like getting trainable weights might be pretty straightforward, but i’m not finding anything on being able to supply a flat array of weights for the model to set.

Right now I’m manually tracking the shapes, calculating which part of the flat array is for this tensor then taking that subset and reshaping it. It’s seems more difficult than it needs to be plus its also pretty expensive computationally taking as long as a tenth of a second just to set weights.

submitted by /u/Yogi_DMT
[visit reddit] [comments]

Categories
Misc

Handy data augmentation toolkit for image classification put in a single efficient TensorFlow op

Handy data augmentation toolkit for image classification put in a single efficient TensorFlow op submitted by /u/lnstadrum
[visit reddit] [comments]
Categories
Offsites

Do Wide and Deep Networks Learn the Same Things?

A common practice to improve a neural network’s performance and tailor it to available computational resources is to adjust the architecture depth and width. Indeed, popular families of neural networks, including EfficientNet, ResNet and Transformers, consist of a set of architectures of flexible depths and widths. However, beyond the effect on accuracy, there is limited understanding of how these fundamental choices of architecture design affect the model, such as the impact on its internal representations.

In “Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth”, we perform a systematic study of the similarity between wide and deep networks from the same architectural family through the lens of their hidden representations and final outputs. In very wide or very deep models, we find a characteristic block structure in their internal representations, and establish a connection between this phenomenon and model overparameterization. Comparisons across models demonstrate that those without the block structure show significant similarity between representations in corresponding layers, but those containing the block structure exhibit highly dissimilar representations. These properties of the internal representations in turn translate to systematically different errors at the class and example levels for wide and deep models when they are evaluated on the same test set.

Comparing Representation Similarity with CKA
We extended prior work on analyzing representations by leveraging our previously developed Centered Kernel Alignment (CKA) technique, which provides a robust, scalable way to determine the similarity between the representations learned by any pair of neural network layers. CKA takes as input the representations (i.e., the activation matrices) from two layers, and outputs a similarity score between 0 (not at all similar) and 1 (identical representations).

We apply CKA to a family of ResNets of varying depths and widths, trained on common benchmark datasets (CIFAR-10, CIFAR-100 and ImageNet), and use representation heatmaps to illustrate the results. The x and y axes of each heatmap index the layers of the model(s) in consideration, going from input to output, and each entry (i, j) is the CKA similarity score between layer i and layer j.

We use CKA to compute the representation similarity for all pairs of layers within a single model (i.e., when network 1 and network 2 are identical), and across models (i.e., when network 1 and network 2 are trained with different random initializations, or have different architectures altogether).

Below is an example of the resulting heatmap when we compare representations of each layer to every other layer within a single ResNet of depth 26 and width multiplier 1. In the design convention used here, the stated depth only refers to the number of convolutional layers in the network, but we analyze all layers present, and the width multiplier applies to the number of filters in each convolution. Notice the checkerboard pattern in the heatmap, which is caused by skip connections (shortcuts between layers) in the architecture.

The Emergence of the Block Structure
What stands out from the representation heatmaps of deeper or wider networks is the emergence of a large set of consecutive layers with highly similar representations, which appears in the heatmaps as a yellow square (i.e., a region with high CKA scores). This phenomenon, which we call the block structure, suggests that the underlying layers may not be as efficient at progressively refining the network’s representations as we expect. Indeed, we show that the task performance becomes stagnant inside the block structure, and that it is possible to prune some underlying layers without affecting the final performance.

Block structure — a large, contiguous set of layers with highly similar representations — emerges with increasing width or depth. Each heatmap panel shows the CKA similarity between all pairs of layers within a single neural network. While its size and position can vary across different training runs, the block structure is a robust phenomenon that arises consistently in larger models.

With additional experiments, we show that the block structure has less to do with the absolute model size, than with the size of the model relative to the size of the training dataset. As we reduce the training dataset size, the block structure starts to appear in shallower and narrower networks:

With increasing network width (towards the right along each row) and decreasing dataset size (down each column), the relative model capacity (with respect to a given task) is effectively inflated, and the block structure begins to appear in smaller models.

Through further analysis, we are also able to demonstrate that the block structure arises from preserving and propagating the dominant principal components of its underlying representations. Refer to our paper for more details.

Comparing Representations Across Models
Going further, we study the implications of depth and width on representations across models of different random initializations and different architectures, and find that the presence of block structure makes a significant difference in this context as well. Despite having different architectures, wide and deep models without the block structure do exhibit representation similarity with each other, with corresponding layers broadly being of the same proportional depth in the model. However, when the block structure is present, its representations are unique to each model. This suggests that despite having similar overall performance, each wide or deep model with the block structure picks up a unique mapping from the input to the output.

For smaller models (e.g., ResNet-38 1×), CKA across different initializations (off the diagonal) closely resembles CKA within a single model (on the diagonal). In contrast, representations within the block structure of wider and deeper models (e.g., ResNet-38 10×, ResNet-164 1×) are highly dissimilar across training runs.

Error Analysis of Wide and Deep Models
Having explored the properties of the learned representations of wide and deep models, we next turn to understanding how they influence the diversity of the output predictions. We train populations of networks of different architectures and determine on which test set examples each architecture configuration tends to make errors.

On both CIFAR-10 and ImageNet datasets, wide and deep models that have the same average accuracy still demonstrate statistically significant differences in example-level predictions. The same observation holds for class-level errors on ImageNet, with wide models exhibiting a small advantage in identifying classes corresponding to scenes, and deep networks being relatively more accurate on consumer goods.

Per-class differences on ImageNet between models with increased width (y-axis) or depth (x-axis). Orange dots reflect differences between two sets of 50 different random initializations of ResNet-83 (1×).

Conclusions
In studying the effects of depth and width on internal representations, we uncover a block structure phenomenon, and demonstrate its connection to model capacity. We also show that wide and deep models exhibit systematic output differences at class and example levels. Check out the paper for full details on these results and additional insights! We’re excited about the many interesting open questions these findings suggest, such as how the block structure arises during training, whether the phenomenon occurs in domains beyond image classification, and ways these insights on internal representations can inform model efficiency and generalization.

Acknowledgements
This is a joint work with Maithra Raghu and Simon Kornblith. We would like to thank Tom Small for the visualizations of the representation heatmap.

Categories
Misc

AI Gone Global: Why 20,000+ Developers from Emerging Markets Signed Up for GTC

Major tech conferences are typically hosted in highly industrialized countries. But the appetite for AI and data science resources spans the globe — with an estimated 3 million developers in emerging markets. Our recent GPU Technology Conference — virtual, free to register, and featuring 24/7 content — for the first time featured a dedicated track on Read article >

The post AI Gone Global: Why 20,000+ Developers from Emerging Markets Signed Up for GTC appeared first on The Official NVIDIA Blog.

Categories
Misc

NVIDIA Merlin Latest Enhancements Streamlines Recommender Workflows with .5 Release

The latest Merlin .5 update includes a data generator for training, multi-GPU dataloader, and initial support for session-based recommenders.

Billions of people in the world are online. Many discrete moments online are spent browsing, shopping, streaming entertainment, or engaging with social media. Each discrete moment, or session, online is an opportunity for recommenders to make informed decisions a bit easier, faster, and more personalized for an individual person.Yet, when considering scale, this translates into recommenders potentially supporting billions of people interacting with trillions of things online.

At GTC Spring 2021, NVIDIA shared how retail, entertainment, on-demand, and social companies are building and utilizing recommenders at scale including early adopters of NVIDIA Merlin. Merlin open source components include NVTabular for ETL, HugeCTR for training, and Triton for inference. The NVIDIA Merlin team continues to ingest feedback from early adopters to streamline recommender workflows for machine learning engineers. The latest Merlin .5 update includes a data generator for training, multi-GPU dataloader, and initial support for session-based recommenders. Also, the update continuously reaffirms NVIDIA’s commitment to democratizing and streamlining recommender workflows. 

Supporting Experimentation and Streamlining Recommender Workflows 

Ongoing experimentation is vital for fine tuning recommender models performance before models are deployed to production. A configurable data generator, using synthetic data, helps machine learning engineers calculate the probability distribution to be uniform or power-law for categorical features, without modifying the configuration file. Merlin HugeCTR’s new data generator considers categorical data and is particularly helpful for benchmarking and research purposes. 

Merlin .5’s inclusion of a multi-GPU dataloader was based on feedback from Merlin early adopters and also helps streamline workflows. Machine learning engineers are able to use the Merlin NVTabular TensorFlow (TF) dataloader for multi-GPU training on a  single node using TF Distributed. Merlin NVTabular utilizes Dask and Dask-cuDF to scale easily to multi-GPU and multi-node as well as provide a high-performance recommender specific ETL pipeline.

Merlin Session-Based Recommenders Support: Just A Beginning 

Data scientists and machine learning engineers at the forefront of e-commerce, news, and social media recommender work have added, or are considering to add, session-based recommenders. While collaborative filtering and content-based filtering are established recommender methods, session-based recommenders are gaining attention due to the potential increased accuracy of predictions when users interests are dynamic and specific to a shorter time frame (i.e., within a session). With Merlin .5, NVTabular provides new preprocessing functionality needed to transform and group data for session based-recommenders.

 Download and Try Merlin’s Latest Update

The latest preprocessing and training enhancements to NVIDIA Merlin reaffirms NVIDIA’s commitment to democratizing and accelerating recommender workflows. As machine learning engineers and data scientists use a hybrid of libraries, packages, tools, and techniques to create effective and impactful recommenders, Merlin components are designed to be easy-to-use and interoperable with existing recommender workflows. 

To discover hands-on how Merlin components streamline recommender workflows, download and try Merlin NVTabular for ETL, HugeCTR for training, and Triton for inference.

Categories
Misc

Creating Tensorflow Dataset for Object Recognition in Keras

Hi,

I was wondering if someone could aid me in solving this problem. I have been following this tutorial, which uses the COCO dataset from tfds.

I am interested in using a different dataset, but I am having trouble adapting the code.

My dataset consists of a number of images, with corresponding bounding box annotations in a .csv. It can be summarized as this: [filename, xmin, ymin, xmax, ymax, class].

The code in this tutorial uses (to my understanding) a tensor dataset with the format [image_array, xmin, ymin, xmax, ymax, class].

How can I load this data in this format? I have been having a great deal of trouble finding any resources. Any help is greatly appreciated! I will mention how I have been approaching this below.

Summary of things applied:

Essentially, I have been able to load everything into a pandas dataframe with 6 columns, consisting of [filename, xmin, ymin, xmax, ymax, class]. However, I feel this to be inefficient, and I cannot get the last step (conversion to a tensor).

I try: d = tf.data.Dataset.from_tensors((df.values))

and get: ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

submitted by /u/Puzzled_Supports
[visit reddit] [comments]

Categories
Misc

How can I do grid to grid transitions with Tensorflow?

This is a complete newbie question. I’ve used Tensorflow before but only with Keras for classifying. I’m not all that knowledgeable about Tensorflow’s capabilities beyond that.

I have a problem where I want to computer the next step in a sequence. The input and output data are both in the form of 2d grids.

eg:

0 1 0 0 0 0 0 1 0 -> 1 1 1 0 1 0 0 0 0 

The problem is a bit more complex than that, but I think that’s the simplest example. The actual domain is hydrodynamics, how waves and currents interact with objects, land, and ships. This can be done with a simulation, but that is extremely computationally expensive. If I can get a “near enough” result in a much shorter time span that is acceptable.

There are many sub problems within this domain, and practically any simulation solution is a compromise.

I have been thinking about this problem seeing this video https://www.youtube.com/watch?v=2Bw5f4vYL98& – but I’m no AI researcher so the paper is beyond my level.

Do you think this is a problem that Tensorflow can be used to tackle?

submitted by /u/fgyoysgaxt
[visit reddit] [comments]

Categories
Offsites

Google at ICLR 2021

The 9th International Conference on Learning Representations (ICLR 2021), a virtual conference focused on deep learning, kicked off this week, offering conference and workshop tracks that present some of the latest research in deep learning and its applications to areas such as computer vision, computational biology, speech recognition, text understanding, and more.

As a Platinum Sponsor of ICLR 2021, Google will have a strong presence with over 100 accepted publications and participation on organizing committees and in workshops. If you have registered for ICLR 2021, we hope you’ll watch our talks and learn about the work at Google that goes into solving interesting problems for billions of people. Learn more about our research being presented in the list below (Googlers in bold).

Officers and Board Members
Includes: Hugo Larochelle, Tara Sainath

Organizing Committee
Includes: Sanmi Koyejo, Chelsea Finn

Area Chairs
Includes: Abhishek Kumar, Aditya Menon, Aleksandra Faust, Alexey Dosovitskiy, Andrew Cotter, Andrew Dai, Augustus Odena, Been Kim, Behnam Neyshabur, Ben Poole, Bo Dai, Bo Li, Branislav Kveton, Ce Liu, Claudio Gentile, Colin Raffel, Danny Tarlow, David Ha, Dengyong Zhou, Dumitru Erhan, Dustin Tran, Felix Hill, George Tucker, Hanie Sedghi, Heinrich Jiang, Hossein Mobahi, Izhak Shafran, Jascha Sohl-Dickstein, Jasper Snoek, Jean-Philippe Vert, Jeffrey Pennington, Justin Gilmer, Kevin Swersky, Marco Cuturi, Mario Lucic, Marlos C. Machado, Mathieu Blondel, Matt Johnson, Matthieu Geist, Mohammad Norouzi, Naman Agarwal, Navdeep Jaitly, Nicolas Le Roux, Niki Parmar, Olivier Bachem, Olivier Pietquin, Philip Long, Quentin Berthet, Razvan Pascanu, Rodolphe Jenatton, Samy Bengio*, Sebastian Nowozin, Silvio Lattanzi, Slav Petrov, Srinadh Bhojanapalli, Suman Ravuri, Tim Salimans, Vitaly Kuznetsov, William Cohen, Yann Dauphin, Yujia Li

Publications
Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes
Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel

An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale (see the blog post)
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
Biao Zhang*, Ankur Bapna, Rico Sennrich, Orhan Firat

Evolving Reinforcement Learning Algorithms (see the blog post)
John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust

Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song*, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

When Do Curricula Work?
Xiaoxia Wu, Ethan Dyer, Behnam Neyshabur

Sharpness-aware Minimization for Efficiently Improving Generalization
Pierre Foret*, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models Zirui Wang*, Yulia Tsvetkov, Orhan Firat, Yuan Cao

Mathematical Reasoning via Self-supervised Skip-tree Training
Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

Long-Tail Learning via Logit Adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Are Neural Rankers Still Outperformed by Gradient Boosted Decision Trees?
Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

LambdaNetworks: Modeling Long-Range Interactions without Attention
Irwan Bello

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare

BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai

Practical Real Time Recurrent Learning with a Sparse Approximation
Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

LEAF: A Learnable Frontend for Audio Classification (see the blog post)
Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Batch Reinforcement Learning Through Continuation Method
Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, Minmin Chen

Scalable Transfer Learning with Expert Models
Joan Puigcerver, Carlos Riquelme Ruiz, Basil Mustafa, Cedric Renggli*, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado*, Pablo Samuel Castro, Marc G Bellemare

Scaling Symbolic Methods Using Gradients for Neural Model Explanation
Subham Sekhar Sahoo, Subhashini Venugopalan, Li Li, Rishabh Singh, Patrick Riley

Primal Wasserstein Imitation Learning (see the blog post)
Robert Dadashi, Leonard Hussenot, Matthieu Geist, Olivier Pietquin

Reset-Free Lifelong Learning with Skill-Space Planning
Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Teaching Temporal Logics to Neural Networks
Christopher Hahn, Frederik Schmitt, Jens U. Kreber, Markus Norman Rabe, Bernd Finkbeiner

Shape-Texture Debiased Neural Network Training
Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie

Rethinking Embedding Coupling in Pre-trained Language Models
Hyung Won Chung, Thibault Fevry*, Henry Tsai, Melvin Johnson, Sebastian Ruder

Overparameterisation and Worst-Case Generalisation: Friend or Foe?
Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Single-Photon Image Classification
Thomas Fischbacher, Luciano Sbaiz

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis*, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

Adaptive Federated Optimization
Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, Hugh Brendan McMahan

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
Biao Zhang*, Ankur Bapna, Rico Sennrich, Orhan Firat

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Benjamin Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov

Open Question Answering over Tables and Text
Wenhu Chen*, Ming-Wei Chang, Eva Schlinger, William Yang Wang, William W. Cohen

Practical Real Time Recurrent Learning with a Sparse Approximation
Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression
Rianne van den Berg, Alexey A. Gritsenko, Mostafa Dehghani, Casper Kaae Sønderby, Tim Salimans

A Universal Representation Transformer Layer for Few-Shot Image Classification
Lu Liu, William L. Hamilton, Guodong Long, Jing Jiang, Hugo Larochelle

Tradeoffs in Data Augmentation: An Empirical Study
Raphael Gontijo-Lopes, Sylvia Smullin, Ekin Dogus Cubuk, Ethan Dyer

Coping with Label Shift via Distributionally Robust Optimisation
Jingzhao Zhang, Aditya Krishna Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

Rethinking Attention with Performers (see the blog post)
Krzysztof Marcin Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Quincy Davis, Afroz Mohiuddin, Lukasz Kaiser, David Benjamin Belanger, Lucy J Colwell, Adrian Weller

Teaching with Commentaries
Aniruddh Raghu*, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton

Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
Vinay Venkatesh Ramasesh, Ethan Dyer, Maithra Raghu

Model-Based Offline Planning
Arthur Argenson, Gabriel Dulac-Arnold

The Geometry of Integration in Text Classification RNNs
Kyle Aitken*, Vinay Venkatesh Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan

On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L Smith, Benoit Dherin, David Barrett, Soham De

Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song*, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers (see the blog post)
Preetum Nakkiran*, Behnam Neyshabur, Hanie Sedghi

Learning Energy-Based Models by Diffusion Recovery Likelihood
Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P Kingma

Latent Skill Planning for Exploration and Transfer
Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti

PseudoSeg: Designing Pseudo Labels for Semantic Segmentation
Yuliang Zou*, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, Tomas Pfister

WaveGrad: Estimating Gradients for Waveform Generation
Nanxin Chen*, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, William Chan

One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks
Atish Agarwala, Abhimanyu Das, Brendan Juba*, Rina Panigrahy, Vatsal Sharan*, Xin Wang, Qiuyi Zhang

Long Range Arena : A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

Explainable Deep One-Class Classification
Philipp Liznerski, Lukas Ruff, Robert A. Vandermeulen, Billy Joe Franks, Marius Kloft, Klaus Robert Muller

Net-DNF: Effective Deep Modeling of Tabular Data
Liran Katzir, Gal Elidan, Ran El-Yaniv

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu

Auxiliary Task Update Decomposition: The Good, the Bad and the Neutral
Lucio M. Dery, Yann Dauphin, David Grangier

Long-Tail Learning via Logit Adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Average-Case Acceleration for Bilinear Games and Normal Matrices
Carles Domingo-Enrich, Fabian Pedregosa, Damien Scieur

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
Anurag Ajay*, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum

Training Independent Subnetworks for Robust Prediction
Marton Havasi*, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Mingbo Dai, Dustin Tran

Benchmarks for Deep Off-Policy Evaluation
Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Thomas Paine

TropEx: An Algorithm for Extracting Linear Terms in Deep Neural Networks
Martin Trimmel, Henning Petzka, Cristian Sminchisescu

Mastering Atari with Discrete World Models (see the blog post)
Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, Jimmy Ba

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit
Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, Jimmy Ba

Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning
Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
Paul Pu Liang*, Manzil Zaheer, Yuan Wang, Amr Ahmed

Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret*, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

HyperGrid Transformers: Towards A Single Model for Multiple Tasks
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms
Maruan Al-Shedivat*, Jennifer Gillenwater, Eric Xing, Afshin Rostamizadeh

BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai

Are Neural Rankers Still Outperformed by Gradient Boosted Decision Trees?
Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
Thao Nguyen, Maithra Raghu, Simon Kornblith

A Unifying View on Implicit Bias in Training Linear Neural Networks
Chulhee Yun*, Shankar Krishnan, Hossein Mobahi

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine

Mathematical Reasoning via Self-Supervised Skip-Tree Training
Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

Lipschitz Recurrent Neural Networks
N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Michael R Zhang*, Thomas Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, ziyu wang, Mohammad Norouzi

The Importance of Pessimism in Fixed-Dataset Policy Optimization
Jacob Buckman, Carles Gelada, Marc G Bellemare

Monotonic Kronecker-Factored Lattice
William Taylor Bakst, Nobuyuki Morioka, Erez Louidor

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

Adversarially Guided Actor-Critic
Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes
Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction
Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee*, Seunghoon Hong

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
Zirui Wang, Yulia Tsvetkov, Orhan Firat, Yuan Cao

Dataset Meta-Learning from Kernel Ridge-Regression
Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Dual-Mode ASR: Unify and Improve Streaming ASR with Full-Context Modeling
Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N Sainath, Yonghui Wu, Ruoming Pang

Implicit Gradient Regularization
David Barrett, Benoit Dherin

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare

Deconstructing the Regularization of BatchNorm
Yann Dauphin, Ekin Dogus Cubuk

C-Learning: Learning to Achieve Goals via Recursive Classification
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Evolving Reinforcement Learning Algorithms
John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust

Colorization Transformer
Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner

Control-Aware Representations for Model-based Reinforcement Learning
Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh

Evaluations and Methods for Explanation through Robustness Analysis
Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Kumar Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh

Learning and Evaluating Representations for Deep One-Class Classification
Kihyuk Sohn, Chun-Liang Li, Jinsung Yoon, Minho Jin, Tomas Pfister

No MCMC for Me: Amortized Sampling for Fast and Stable Training of Energy-Based Models
Will Sussman Grathwohl, Jacob Jin Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud

Neural Thompson Sampling
Weitong ZHANG, Dongruo Zhou, Lihong Li, Quanquan Gu

A Design Space Study for LISTA and Beyond
Tianjian Meng, Xiaohan Chen, Yifan Jiang, Zhangyang Wang

i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning
Kibok Lee, Yian Zhu, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin, Honglak Lee

Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments
Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Charles Blundell, Sergey Levine, Yoshua Bengio, Michael Curtis Mozer

Calibration of Neural Networks using Splines
Kartik Gupta, Amir Rahimi, Thalaiyasingam Ajanthan, Thomas Mensink, Cristian Sminchisescu, Richard Hartley

Extreme Memorization via Scale of Initialization
Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur

Molecule Optimization by Explainable Evolution
Binghong Chen, Tianzhe Wang, Chengtao Li, Hanjun Dai, Le Song

Combining Ensembles and Data Augmentation Can Harm Your Calibration
Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran

Workshops
Science and Engineering of Deep Learning
Speakers and Panelists include: Alex Hanna
Moderator and Advisors include: Emily Denton
Organizers include: Negar Rostemzadeh, Samy Bengio*

Synthetic Data Generation: Quality, Privacy, Bias
Speakers include: Jinsung Yoon, Emily Denton
Program Committee includes: Syed Ashrafulla

Enormous Language Models: Perspectives and Benchmarks
Speakers and Panelists include: Noam Shazeer, Natalie Schluter
Organizers include: Colin Raffel, Adam Roberts, Jascha Sohl-Dickstein, Katherine Lee, William Fedus, Aitor Lewkowycz

The Role of Mathematical Reasoning in General Artificial Intelligence
Speakers and Panelists include: Markus Rabe, Christian Szegedy

Weakly Supervised Learning
Invited Speakers include: Lu Jiang

Learning to Learn
Organizers include: Yevgen Chebotar

Embodied Multimodal Learning (EML)
Invited Speakers includes: Sergey Levine

Distributed and Private Machine Learning
Program Committee includes: Peter Kairouz, Ananda Theertha Suresh

S2D-OLAD: From Shallow to Deep, Overcoming Limited and Adverse Data
Invited Speakers include: Alex Hanna, Hugo Larochelle
Organizers include: Vincent Dumoulin

Responsible AI (RAI)
Speakers include: Been Kim

Energy-Based Models: Current Perspectives, Challenges, and Opportunities
Organizers include: Adji Bousso Dieng, Igor Mordatch

A Roadmap to Never-Ending RL
Invited Session Panelists include: Aleksandra Faust
Program Committee includes: Coline Devin, Karol Hausman, Ben Eysenbach, Ofir Nachum, Ryan Julian, Tianhe Yu, Dumitru Erhan, Marc Pickett, Shixiang Gu

2nd Workshop on Practical ML for Developing Countries: Learning Under Limited/low Resource Scenarios
Program Committee includes: Pablo Samuel Castro

Beyond Static Papers: Rethinking How We Share Scientific Understanding in ML
Speakers include: David Ha, Hugo Larochelle
Organizers include: Sara Hooker

* Indicates work done while at Google