Categories
Offsites

Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize

Transformer models consistently obtain state-of-the-art results in computer vision tasks, including object detection and video classification. In contrast to standard convolutional approaches that process images pixel-by-pixel, Vision Transformers (ViT) treat an image as a sequence of patch tokens (i.e., a smaller part, or “patch”, of an image made up of multiple pixels). This means that at every layer, a ViT model recombines and processes patch tokens based on relations between each pair of tokens, using multi-head self-attention. In doing so, ViT models have the capability to construct a global representation of the entire image.

At the input level, the tokens are formed by uniformly splitting the image into multiple segments, e.g., splitting an image that is 512 by 512 pixels into patches that are 16 by 16 pixels. At the intermediate levels, the outputs from the previous layer become the tokens for the next layer. In the case of videos, video ‘tubelets’ such as 16×16×2 video segments (16×16 images over 2 frames) become tokens. The quality and quantity of the visual tokens determine the overall quality of the Vision Transformer.

The main challenge in many Vision Transformer architectures is that they often require too many tokens to obtain reasonable results. Even with 16×16 patch tokenization, for instance, a single 512×512 image corresponds to 1024 tokens. For videos with multiple frames, that results in tens of thousands of tokens needing to be processed at every layer. Considering that the Transformer computation increases quadratically with the number of tokens, this can often make Transformers intractable for larger images and longer videos. This leads to the question: is it really necessary to process that many tokens at every layer?

In “TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?”, an earlier version of which was presented at NeurIPS 2021, we show that adaptively generating a smaller number of tokens, rather than always relying on tokens formed by uniform splitting, enables Vision Transformers to run much faster and perform better. TokenLearner is a learnable module that takes an image-like tensor (i.e., input) and generates a small set of tokens. This module can be placed at various locations within the model of interest, significantly reducing the number of tokens to be handled in all subsequent layers. The experiments demonstrate that TokenLearner cuts memory and computation by half or more without hurting classification performance, and because of its ability to adapt to inputs, it can even increase accuracy.

The TokenLearner
We implement TokenLearner using a straightforward spatial attention approach. To generate each learned token, we compute a spatial attention map highlighting regions of importance (using convolutional layers or MLPs). Such a spatial attention map is then applied to the input to weight each region differently (and discard unnecessary regions), and the result is spatially pooled to generate the final learned tokens. This is repeated multiple times in parallel, resulting in a few (~10) tokens out of the original input. This can also be viewed as performing a soft selection of the pixels based on the weight values, followed by global average pooling. Note that the functions to compute the attention maps are governed by different sets of learnable parameters and are trained in an end-to-end fashion, which allows the attention functions to be optimized to capture different spatial information in the input. The figure below illustrates the process.

The TokenLearner module learns to generate a spatial attention map for each output token and uses it to pool the input into that token. In practice, multiple spatial attention functions are learned, applied to the input, and used to generate different token vectors in parallel.
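
To make the mechanism concrete, here is a minimal sketch of a TokenLearner-style layer in TensorFlow/Keras. The convolutional attention head, sigmoid normalization, and layer sizes are illustrative assumptions for exposition, not the exact implementation released with the paper.

import tensorflow as tf

class TokenLearner(tf.keras.layers.Layer):
    """Sketch of a TokenLearner-style module: one learned spatial attention
    map per output token, followed by weighted spatial pooling."""

    def __init__(self, num_tokens=8):
        super().__init__()
        self.num_tokens = num_tokens
        # Small convolutional head that predicts one attention map per token.
        # (An MLP over channels, or softmax normalization, would also work.)
        self.attention = tf.keras.Sequential([
            tf.keras.layers.Conv2D(num_tokens, 3, padding="same", activation="gelu"),
            tf.keras.layers.Conv2D(num_tokens, 3, padding="same", activation="sigmoid"),
        ])

    def call(self, inputs):                       # inputs: (B, H, W, C)
        maps = self.attention(inputs)             # (B, H, W, S)
        maps = tf.transpose(maps, [0, 3, 1, 2])   # (B, S, H, W)
        maps = tf.expand_dims(maps, -1)           # (B, S, H, W, 1)
        feats = tf.expand_dims(inputs, 1)         # (B, 1, H, W, C)
        # Soft-select pixels with each attention map, then spatially pool.
        return tf.reduce_mean(maps * feats, axis=[2, 3])   # (B, S, C)

For example, for a ViT-B/16 feature map of shape (batch, 14, 14, 768), TokenLearner(8) would return (batch, 8, 768) learned tokens.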

As a result, instead of processing fixed, uniformly tokenized inputs, TokenLearner enables models to process a smaller number of tokens that are relevant to the specific recognition task. That is, (1) we enable adaptive tokenization so that the tokens can be dynamically selected conditioned on the input, and (2) this effectively reduces the total number of tokens, greatly reducing the computation performed by the network. These dynamically and adaptively generated tokens can be used in standard transformer architectures such as ViT for images and ViViT for videos.

Where to Place TokenLearner
After building the TokenLearner module, we had to determine where to place it. We first tried placing it at different locations within the standard ViT architecture with 224×224 images. TokenLearner generated 8 or 16 tokens, far fewer than the 196 or 576 tokens that standard ViTs use. The below figure shows ImageNet few-shot classification accuracies and FLOPS of the models with TokenLearner inserted at various relative locations within ViT B/16, which is the base model with 12 attention layers operating on 16×16 patch tokens.

Top: ImageNet 5-shot transfer accuracy with JFT 300M pre-training, with respect to the relative TokenLearner locations within ViT B/16. Location 0 means TokenLearner is placed before any Transformer layer. Base is the original ViT B/16. Bottom: Computation, measured in terms of billions of floating point operations (GFLOPS), per relative TokenLearner location.

We found that inserting TokenLearner after the initial quarter of the network (at 1/4) achieves almost identical accuracies to the baseline, while reducing the computation to less than a third of the baseline. In addition, placing TokenLearner later in the network (after 3/4 of the layers) achieves even better performance than not using TokenLearner at all, while still running faster, thanks to its adaptiveness. Because of the large difference between the number of tokens before and after TokenLearner (e.g., 196 before and 8 after), the relative computation of the transformer layers after the TokenLearner module becomes almost negligible.
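
As a back-of-the-envelope check on this claim (a sketch; the 768-dimensional ViT-B hidden size and the token counts above are the only inputs), the quadratic part of each remaining layer's self-attention shrinks by roughly 600×:

# Rough cost model: self-attention does on the order of N^2 * d work per layer
# (QK^T plus the attention-weighted sum of values), with d = 768 for ViT-B.
def attention_cost(num_tokens, dim=768):
    return 2 * num_tokens ** 2 * dim

ratio = attention_cost(196) / attention_cost(8)
print(f"Self-attention per layer is roughly {ratio:.0f}x cheaper after TokenLearner")  # ~600x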

Comparing Against ViTs
We compared the standard ViT models with TokenLearner against those without it while following the same setting on ImageNet few-shot transfer. TokenLearner was placed in the middle of each ViT model at various locations such as at 1/2 and at 3/4. The below figure shows the performance/computation trade-off of the models with and without TokenLearner.

Performance of various versions of ViT models with and without TokenLearner, on ImageNet classification. The models were pre-trained with JFT 300M. The closer a model is to the top-left of each graph the better, meaning that it runs faster and performs better. Observe how TokenLearner models perform better than ViT in terms of both accuracy and computation.

We also inserted TokenLearner within larger ViT models, and compared them against the giant ViT G/14 model. Here, we applied TokenLearner to ViT L/10 and L/8, which are the ViT models with 24 attention layers taking 10×10 (or 8×8) patches as initial tokens. The below figure shows that despite using many fewer parameters and less computation, TokenLearner performs comparably to the giant G/14 model with 48 layers.

Left: Classification accuracy of large-scale TokenLearner models compared to ViT G/14 on ImageNet datasets. Right: Comparison of the number of parameters and FLOPS.

High-Performing Video Models
Video understanding is one of the key challenges in computer vision, so we evaluated TokenLearner on multiple video classification datasets. This was done by adding TokenLearner into Video Vision Transformers (ViViT), which can be thought of as a spatio-temporal version of ViT. TokenLearner learned 8 (or 16) tokens per timestep.
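
A minimal sketch of this per-timestep tokenization, reusing the TokenLearner layer sketched earlier (the shapes and the fold-time-into-batch trick are illustrative assumptions, not the exact ViViT integration):

import tensorflow as tf

def tokenize_video(video_features, token_learner):
    # video_features: (batch, time, height, width, channels) patch features.
    # token_learner: a TokenLearner layer as sketched above, created once and reused.
    b = tf.shape(video_features)[0]
    t = tf.shape(video_features)[1]
    h, w, c = video_features.shape[2], video_features.shape[3], video_features.shape[4]
    num_tokens = token_learner.num_tokens
    frames = tf.reshape(video_features, [-1, h, w, c])   # fold time into the batch axis
    tokens = token_learner(frames)                        # (batch * time, num_tokens, c)
    return tf.reshape(tokens, [b, t, num_tokens, c])      # learned tokens per timestep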

When combined with ViViT, TokenLearner obtains state-of-the-art (SOTA) performance on multiple popular video benchmarks, including Kinetics-400, Kinetics-600, Charades, and AViD, outperforming the previous Transformer models on Kinetics-400 and Kinetics-600 as well as previous CNN models on Charades and AViD.

Models with TokenLearner outperform state-of-the-art on popular video benchmarks (captured from Nov. 2021). Left: popular video classification tasks. Right: comparison to ViViT models.
Visualization of the spatial attention maps in TokenLearner, over time. As the person is moving in the scene, TokenLearner pays attention to different spatial locations to tokenize.

Conclusion
While Vision Transformers serve as powerful models for computer vision, the large number of tokens and their associated computation have been a bottleneck for their application to larger images and longer videos. In this project, we illustrate that retaining such a large number of tokens and fully processing them over the entire set of layers is not necessary. Further, we demonstrate that learning a module that extracts tokens adaptively based on the input image allows us to attain even better performance while saving compute. The proposed TokenLearner was particularly effective in video representation learning tasks, which we confirmed with multiple public datasets. A preprint of our work as well as code are publicly available.

Acknowledgement
We thank our co-authors: AJ Piergiovanni, Mostafa Dehghani, and Anelia Angelova. We also thank the Robotics at Google team members for the motivating discussions.

Categories
Misc

Multi-worker training with Keras

When I’m using the TensorFlow multi-worker strategy, does every node have to have every file, or do I give certain files to each node?

submitted by /u/Mayfieldmobster
[visit reddit] [comments]

Categories
Misc

How to write binary classification with Tensorflow 1.x?

I’m at my wits’ end. Every single tutorial for classification I find for TensorFlow 1.x is a goddamn MNIST tutorial; they always skip the basics, and I need the basics.

I have a test set with 6 numerical features and a label that is binary 1 or 0.

With TensorFlow 2.0 I can easily just use something like this:

model = tf.keras.Sequential([
    tf.keras.layers.Dense(21, activation="tanh"),
    tf.keras.layers.Dense(10, activation="sigmoid"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(
    loss=tf.keras.losses.binary_crossentropy,
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy']
)
history = model.fit(X_train, y_train, epochs=25)

For the life of me, I don’t know how to go about doing this in TensorFlow 1.x.
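
For comparison, a rough TensorFlow 1.x sketch of the same model, written against the low-level placeholder/Session API (layer sizes copied from the Keras model above; X_train and y_train are assumed to be NumPy arrays, with the labels reshaped to a column), might look like this:

import tensorflow as tf  # TensorFlow 1.x

# Placeholders for 6 numerical features and a binary {0, 1} label.
x = tf.placeholder(tf.float32, shape=[None, 6])
y = tf.placeholder(tf.float32, shape=[None, 1])

# Same layer sizes as the Keras model above; the loss applies the final sigmoid.
hidden1 = tf.layers.dense(x, 21, activation=tf.nn.tanh)
hidden2 = tf.layers.dense(hidden1, 10, activation=tf.nn.sigmoid)
logits = tf.layers.dense(hidden2, 1)
probabilities = tf.nn.sigmoid(logits)

loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=y, logits=logits)
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Full-batch updates for simplicity; mini-batching would mirror Keras more closely.
    for epoch in range(25):
        _, epoch_loss = sess.run([train_op, loss],
                                 feed_dict={x: X_train, y: y_train.reshape(-1, 1)})
        print(epoch, epoch_loss)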

submitted by /u/Practicing1s
[visit reddit] [comments]

Categories
Misc

NVIDIA Research Presenting 20 Papers at NeurIPS 2021

At the forefront of AI innovation, NVIDIA continues to push the boundaries of technology in machine learning, self-driving cars, robotics, graphics, and more. NVIDIA researchers will present 20 papers at the thirty-fifth annual conference on Neural Information Processing Systems (NeurIPS) from December 6 to December 14, 2021. 

Here are some of the featured papers:

Alias-Free Generative Adversarial Networks (StyleGAN3)
Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila | Paper  | GitHub | Blog

StyleGAN3, a model developed by NVIDIA Research, advances the state of the art in generative adversarial networks used to synthesize realistic images. It will be presented on Tuesday, December 7 from 12:40 AM – 12:55 AM PST. The breakthrough brings graphics principles in signal processing and image processing to GANs to avoid aliasing: a kind of image corruption often visible when images are rotated, scaled, or translated.

Video 1. Results from the StyleGAN3 model

EditGAN: High-Precision Semantic Image Editing
Huan Ling*, Karsten Kreis*, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler | Paper | GitHub

EditGAN is a novel method for high-quality, high-precision semantic image editing that allows users to edit images by modifying their highly detailed part segmentation masks, e.g., drawing a new mask for the headlight of a car. EditGAN builds on a GAN framework that jointly models images and their semantic segmentations, requiring only a handful of labeled examples, making it a scalable tool for editing. The poster session will be held on Thursday, December 9 from 8:30 AM – 10:00 AM PST.

Video 2. The video showcases EditGAN in an interactive demo tool.

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo | Paper | GitHub

SegFormer is a simple, efficient, yet powerful semantic segmentation framework that unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder that outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of positional codes, which leads to decreased performance when the testing resolution differs from training. 2) SegFormer avoids complex decoders. The poster will be presented on Tuesday, December 7 from 8:30 AM – 10:00 AM PST.



Video 3. The video shows the excellent zero-shot robustness of SegFormer on the Cityscapes-C dataset.

DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer
Wenzheng Chen, Joey Litalien, Jun Gao, Zian Wang, Clement Fuji Tsang, Sameh Khamis, Or Litany, Sanja Fidler | Paper

DIB-R++ is a deferred, image-based renderer that supports photorealistic lighting and material effects by combining rasterization and ray tracing, taking advantage of their respective strengths: speed and realism. The poster session is on Thursday, December 9 from 4:30 PM – 6:00 PM PST.

Image 1. DIB-R++ is a hybrid renderer that combines rasterization and ray tracing together. Given a 3D mesh M, we employ (a) a rasterization-based renderer to obtain diffuse albedo, surface normals and mask maps. In the shading pass (b), we then use these buffers to compute the incident radiance by sampling or by representing lighting and the specular BRDF using a spherical Gaussian basis. Depending on the representation used in (c), we can render with advanced lighting and material effect (d).

In addition to the papers at NeurIPS 2021, researchers and developers can accelerate 3D deep learning research with new Kaolin features:

Kaolin is launching new features to accelerate 3D deep learning research. Updates to the NVIDIA Omniverse Kaolin app will bring robust visualization of massive point clouds. Updates to the Kaolin library will include support for tetrahedral meshes, rays management functionality, and a strong speedup to DIB-R. To learn more about Kaolin, watch the recent GTC session.

Image 2. Results from NVIDIA Kaolin

To view the complete list of NVIDIA Research accepted papers, workshop and tutorials, demos, and to explore job opportunities at NVIDIA, visit the NVIDIA at NeurIPS 2021 website.

Categories
Misc

Inference time on cpu is high

Hi, I tried to load the model on CPU with tf.device. While running inference on 500 images, the CPU usage reaches 100% and the inference time is 0.6 sec. How do I minimize the inference time and also the CPU utilization?

submitted by /u/nanitiru18
[visit reddit] [comments]

Categories
Misc

What are the nvidia NGC container optimizations for mixed precision?

I am trying to increase the training speed of my model by using mixed precision and the NVIDIA GPU tensor cores. For this, I just use Keras mixed precision, but the speed increase is only 10%. Then I found the NVIDIA NGC container, which is optimized for their GPUs, and with mixed precision I can increase the training speed by 60%, although with float32 the speed is lower than native. I would like to have at least the speed increase of the NGC container natively. What do I need to do?

submitted by /u/diepala
[visit reddit] [comments]

Categories
Misc

Building a Foundation for Zero Trust Security with NVIDIA DOCA 1.2

Dive deep into the new features and use cases available for networking, security, storage in the latest release of the DOCA software framework.

Today, NVIDIA released the NVIDIA DOCA 1.2 software framework for NVIDIA BlueField DPUs, the world’s most advanced data processing unit (DPU). Designed to enable the NVIDIA BlueField ecosystem and developer community, DOCA is the key to unlocking the potential of the DPU by offering services to offload, accelerate, and isolate infrastructure application services from the CPU.

DOCA is a software framework that brings together APIs, drivers, libraries, sample code, documentation, services, and prepackaged containers to simplify and speed up application development and deployment on BlueField DPUs on every data center node. Together, DOCA and BlueField create an isolated and secure services domain for networking, security, storage, and infrastructure management that is ideal for enabling a zero-trust strategy.

The DOCA 1.2 release introduces several important features and use cases. 

Protect host services with adaptive cloud security

A modern approach to security based on zero-trust principles is critical to securing today’s data centers, as resources inside the data center can no longer be trusted automatically. App Shield enables detection of attacks on critical services in a system. In many systems, those critical services are responsible for ensuring the integrity and privacy of the execution of many applications.

Figure 1. Shield your host services with adaptive cloud security

DOCA App Shield provides host monitoring enabling cybersecurity vendors to create accelerated intrusion detection system (IDS) solutions to identify an attack on any physical or virtual machine. It can feed data about application status to security information and event management (SIEM) or extended detection and response (XDR) tools and also enhances forensic investigations.

If a host is compromised, attackers normally exploit the security control mechanism breaches to move laterally across data center networks to other servers and devices. App Shield enables security teams to shield their application processes, continuously validate their integrity, and in turn detect malicious activity. 

In the event that an attacker kills the machine security agent’s processes, App Shield can mitigate the attack by isolating the compromised host, preventing the malware from accessing confidential data or spreading to other resources. App Shield is an important advancement in the fight against cybercrime and an effective tool to enable a zero-trust security stance.

BlueField DPUs and the DOCA software framework provide an open foundation for partners and developers to build zero-trust solutions and address the security needs of the modern data center.

Create time-synchronized data centers

Precision timing is a critical capability to enable and accelerate distributed apps from edge to core. DOCA Firefly is a data center timing service that supports extremely precise time synchronization everywhere. With nanosecond-level clock synchronization, you can enable a new broad range of timing-critical and delay-sensitive applications. 

Figure 2. Precision time-synchronized data center service

DOCA Firefly addresses a wide range of use cases, including the following:

  • High-frequency trading
  • Distributed databases
  • Industrial 5G radio access networks (RAN)
  • Scientific research
  • High performance computing (HPC)
  • Omniverse digital twins
  • Gaming
  • AR/VR
  • Autonomous vehicles
  • Security

It enables data consistency, accurate event ordering, and causality analysis, such as ensuring the correct sequencing of stock market transactions and fair bidding during digital auctions. The hardware engines in the BlueField application-specific integrated circuit (ASIC) are capable of time-stamping data packets at full wire speed with breakthrough nanosecond-level accuracy. 

Improving the accuracy of data center timing by orders of magnitude offers many advantages. 

With globally synchronized data centers, you can accelerate distributed applications and data analysis including AI, HPC, professional media production, telco virtual network functions, and precise event monitoring. All the servers in the data center—or across data centers—can be harmonized to provide something that is far bigger than any single compute node.

The benefits of improving data center timing accuracy include a reduction in the amount of compute power and network traffic needed to replicate and validate the data. For example, Firefly synchronization delivers a 3x database performance gain to distributed databases.

DOCA HBN beta

The BlueField DPU is a unique solution for network acceleration and policy enforcement within an endpoint host. At the same time, BlueField provides an administrative and software demarcation between the host operating system and functions running on the DPU. 

With DOCA host-based networking (HBN), top-of-rack (TOR) network configuration can extend down to the DPU, enabling network administrators to own DPU configuration and management while application management can be handled separately by x86 host administrators. This creates an unparalleled opportunity to reimagine how you can build data center networks.

DOCA 1.2 provides a new driver for HBN called Netlink to DOCA (nl2doca) that accelerates and offloads traditional Linux Netlink messages. nl2doca is provided as an acceleration driver integrated into the HBN service container. You can now accelerate L2 and L3 host networking that relies on DPDK, OVS, or kernel routing with Netlink.

NVIDIA is adding support for the open-source Free Range Routing (FRR) project, running on the DPU and leveraging this new nl2doca driver. This support enables the DPU to operate exactly like a TOR switch, with additional benefits. FRR on the DPU enables EVPN networks to move directly into the host, providing layer 2 (VLAN) extension and layer 3 (VRF) tenant isolation.

HBN on the DPU can manage and monitor traffic between VMs or containers on the same node. It can also encrypt or decrypt and then analyze traffic to and from the node, both tasks that no TOR switch can perform. You can build your own Amazon VPC-like solution in your private cloud for containerized, virtual machine, and bare metal workloads.

HBN with BlueField DPUs revolutionizes how you build data center networks. It offers the following benefits:

  • Plug-and-play servers: Leveraging FRR’s BGP unnumbered, servers can be directly connected to the network with no need to coordinate server-to-switch configurations. No need for MLAG, bonding, or NIC teaming.
  • Open, interoperable multi-tenancy: EVPN enables server-to-server or server-to-switch overlays. This provides multi-tenant solutions for bare metal, closed appliances, or any hypervisor solution, regardless of the underlay networking vendor. EVPN provides distributed overlay configuration, while eliminating the need for costly, proprietary, centralized SDN controllers.
  • Secure network management: The BlueField DPU provides an isolated environment for network policy configuration and enforcement. There are no software or dependencies on the host. 
  • Enabling advanced HCI and storage networking: BlueField provides a simple method for HCI and storage partners to solve current network challenges for multi-tenant and hybrid cloud solutions, regardless of the hypervisor.
  • Flexible network offloading: The nl2doca driver provided by HBN enables any netlink capable application to offload and accelerate kernel based networking without the complexities of traditional DPDK libraries. 
  • Simplification of TOR switch requirements: More intelligence is placed on the DPU within the server, reducing the complexity of the TOR switch.

Additional DOCA 1.2 SDK updates:

  • DOCA FLOW – Firewall (Alpha)
  • DOCA FLOW – Gateway (Beta)
  • DOCA FLOW remote APIs
  • DOCA 1.2 includes enhancements and scale for IPsec and TLS

DLI course: Introduction to DOCA for the BlueField DPU

In addition, NVIDIA is introducing a Deep Learning Institute (DLI) course: Introduction to DOCA for the BlueField DPU. The main objective of this course is to provide students, including developers, researchers, and system administrators, with an introduction to DOCA and BlueField DPUs. This enables students to successfully work with DOCA to create accelerated applications and services powered by BlueField DPUs.

Try DOCA today

You can experience DOCA today with the DOCA software, which includes the DOCA SDK and runtime-accelerated libraries for networking, storage, and security. The libraries help you program your data center infrastructure running on the DPU.

The DOCA Early Access program is open now for applications. To receive news and updates about DOCA or to become an early access member/partner, register on the DOCA Early Access page.

For more information, see the following resources:

Categories
Offsites

Google at NeurIPS 2021

This week marks the beginning of the 35th annual Conference on Neural Information Processing Systems (NeurIPS 2021), the biggest machine learning conference of the year. NeurIPS 2021 will be held virtually and includes invited talks, demonstrations and presentations of some of the latest in machine learning research. This year, NeurIPS also announced a new Datasets and Benchmarks track, which will include publications, talks, posters, and discussions related to this research area.

Google will have a strong presence with more than 170 accepted papers, additionally contributing to and learning from the broader academic research community via talks, posters, workshops, and tutorials. You can learn more about our work being presented in the list below (Google affiliations highlighted in bold).

Organizing Committee

Communications Co-Chair: Emily Denton
Program Co-Chair: Yann Dauphin
Workshop Co-Chair: Sanmi Koyejo

Senior Area Chairs: Alekh Agarwal, Amir Globerson, Been Kim, Charles Sutton, Claudio Gentile, Corinna Cortes, Dale Schuurmans, David Duvenaud, Elad Hazan, Hugo Larochelle, Jean-Philippe Vert, Kevin Murphy, Marco Cuturi, Mehryar Mohri, Mohammad Ghavamzadeh, Samory Kpotufe, Sanjiv Kumar, Satyen Kale, Sergey Levine, Tara N. Sainath, Yishay Mansour

Area Chairs: Abhishek Kumar, Abhradeep Guha Thakurta, Alex Kulesza, Alexander A. Alemi, Alexander T. Toshev, Amin Karbasi, Amit Daniely, Ananda Theertha Suresh, Ankit Singh Rawat, Ashok Cutkosky, Badih Ghazi, Balaji Lakshminarayanan, Ben Poole, Bo Dai, Boqing Gong, Chelsea Finn, Chiyuan Zhang, Christian Szegedy, Cordelia Schmid, Craig Boutilier, Cyrus Rashtchian, D. Sculley, Daniel Keysers, David Ha, Denny Zhou, Dilip Krishnan, Dumitru Erhan, Dustin Tran, Ekin Dogus Cubuk, Fabian Pedregosa, George Tucker, Hanie Sedghi, Hanjun Dai, Heinrich Jiang, Hossein Mobahi, Izhak Shafran, Jaehoon Lee, Jascha Sohl-Dickstein, Jasper Snoek, Jeffrey Pennington, Jelani Nelson, Jieming Mao, Justin Gilmer, Karol Hausman, Karthik Sridharan, Kevin Swersky, Maithra Raghu, Mario Lucic, Mathieu Blondel, Matt Kusner, Matthew Johnson, Matthieu Geist, Ming-Hsuan Yang, Mohammad Mahdian, Mohammad Norouzi, Nal Kalchbrenner, Naman Agarwal, Nicholas Carlini, Nicolas Papernot, Olivier Bachem, Olivier Pietquin, Paul Duetting, Praneeth Netrapalli, Pranjal Awasthi, Prateek Jain, Quentin Berthet, Renato Paes Leme, Richard Nock, Rif A. Saurous, Rose Yu, Roy Frostig, Samuel Stern Schoenholz, Sashank J. Reddi, Sercan O. Arik, Sergei Vassilvitskii, Sergey Ioffe, Shay Moran, Silvio Lattanzi, Simon Kornblith, Srinadh Bhojanapalli, Thang Luong, Thomas Steinke, Tim Salimans, Tomas Pfister, Tomer Koren, Uri Stemmer, Vahab Mirrokni, Vikas Sindhwani, Vincent Dumoulin, Virginia Smith, Vladimir Braverman, W. Ronny Huang, Wen Sun, Yang Li, Yasin Abbasi-Yadkori, Yinlam Chow,Yujia Li, Yunhe Wang, Zoltán Szabó

NeurIPS Foundation Board 2021: Michael Mozer, Corinna Cortes, Hugo Larochelle, John C. Platt, Fernando Pereira

Test of Time Award

Online Learning for Latent Dirichlet Allocation
Matthew D. Hoffman, David M. Blei, Francis Bach

Publications

Deep Reinforcement Learning at the Edge of the Statistical Precipice (see blog post)
Outstanding Paper Award Recipient
Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

A Separation Result Between Data-Oblivious and Data-Aware Poisoning Attacks
Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Abhradeep Guha Thakurta

Adversarial Robustness of Streaming Algorithms Through Importance Sampling
Vladimir Braverman, Avinatan Hassidim, Yossi Matias, Mariano Schain, Sandeep Silwal, Samson Zhou

Aligning Silhouette Topology for Self-Adaptive 3D Human Pose Recovery
Mugallodi Rakesh, Jogendra Nath Kundu, Varun Jampani, R. Venkatesh Babu

Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun

Autonomous Reinforcement Learning via Subgoal Curricula
Archit Sharma, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn

Calibration and Consistency of Adversarial Surrogate Losses
Pranjal Awasthi, Natalie S. Frank, Anqi Mao, Mehryar Mohri, Yutao Zhong

Compressive Visual Representations
Kuang-Huei Lee, Anurag Arnab, Sergio Guadarrama, John Canny, Ian Fischer

Counterfactual Invariance to Spurious Correlations in Text Classification
Victor Veitch, Alexander D’Amour, Steve Yadlowsky, Jacob Eisenstein

Deep Learning Through the Lens of Example Difficulty
Robert J.N. Baldock, Hartmut Maennel, Behnam Neyshabur

Deep Neural Networks as Point Estimates for Deep Gaussian Processes
Vincent Dutordoir, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, Nicolas Durrande

Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning
Ligeng Zhu, Hongzhou Lin, Yao Lu, Yujun Lin, Song Han

Discrete-Valued Neural Communication
Dianbo Liu, Alex Lamb, Kenji Kawaguchi, Anirudh Goyal, Chen Sun, Michael Curtis Mozer, Yoshua Bengio

Do Vision Transformers See Like Convolutional Neural Networks?
Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

Dueling Bandits with Team Comparisons
Lee Cohen, Ulrike Schmidt-Kraepelin, Yishay Mansour

End-to-End Multi-Modal Video Temporal Grounding
Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Environment Generation for Zero-Shot Compositional Reinforcement Learning
Izzeddin Gur, Natasha Jaques, Yingjie Miao, Jongwook Choi, Manoj Tiwari, Honglak Lee, Aleksandra Faust

H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Motion
Hongyi Xu, Thiemo Alldieck, Cristian Sminchisescu

Improving Calibration Through the Relationship with Adversarial Robustness
Yao Qin, Xuezhi Wang, Alex Beutel, Ed Chi

Learning Generalized Gumbel-Max Causal Mechanisms
Guy Lorberbom, Daniel D. Johnson, Chris J. Maddison, Daniel Tarlow, Tamir Hazan

MICo: Improved Representations via Sampling-Based State Similarity for Markov Decision Processes
Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

Near-Optimal Lower Bounds For Convex Optimization For All Orders of Smoothness
Ankit Garg, Robin Kothari, Praneeth Netrapalli, Suhail Sherif

Neural Circuit Synthesis from Specification Patterns
Frederik Schmitt, Christopher Hahn, Markus N. Rabe, Bernd Finkbeiner

Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation
Jogendra Nath Kundu, Siddharth Seth, Anirudh Jamkhandi, Pradyumna YM, Varun Jampani, Anirban Chakraborty, R. Venkatesh Babu

Object-Aware Contrastive Learning for Debiased Scene Representation
Sangwoo Mo, Hyunwoo Kang, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin

On Density Estimation with Diffusion Models
Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho

On Margin-Based Cluster Recovery with Oracle Queries
Marco Bressan, Nicolo Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation
Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

Parallelizing Thompson Sampling
Amin Karbasi, Vahab Mirrokni, Mohammad Shadravan

Reverse-Complement Equivariant Networks for DNA Sequences
Vincent Mallet, Jean-Philippe Vert

Revisiting ResNets: Improved Training and Scaling Strategies
Irwan Bello, William Fedus, Xianzhi Du, Ekin Dogus Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

Revisiting the Calibration of Modern Neural Networks
Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Ann Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, Mario Lucic

Scaling Vision with Sparse Mixture of Experts
Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, Neil Houlsby

SE(3)-Equivariant Prediction of Molecular Wavefunctions and Electronic Densities
Oliver Thorsten Unke, Mihail Bogojeski, Michael Gastegger, Mario Geiger, Tess Smidt, Klaus Robert Muller

Stateful ODE-Nets Using Basis Function Expansions
Alejandro Francisco Queiruga, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney

Statistically and Computationally Efficient Linear Meta-Representation Learning
Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh

Streaming Belief Propagation for Community Detection
Yuchen Wu, Jakab Tardos, Mohammad Hossein Bateni, André Linhares, Filipe Miguel Gonçalves de Almeida, Andrea Montanari, Ashkan Norouzi-Fard

Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls
Nick Doudchenko, Khashayar Khosravi, Jean Pouget-Abadie, Sebastien Lahaie, Miles Lubin, Vahab Mirrokni, Jann Spiess, Guido Imbens

The Difficulty of Passive Learning in Deep Reinforcement Learning
George Ostrovski, Pablo Samuel Castro, Will Dabney

The Pareto Frontier of Model Selection for General Contextual Bandits
Teodor Marinov, Julian Zimmert

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

Co-Adaptation of Algorithmic and Implementational Innovations in Inference-Based Deep Reinforcement Learning
Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Gu

Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Sergey Levine, Chelsea Finn

Does Knowledge Distillation Really Work?
Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson

Exponential Graph is Provably Efficient for Decentralized Deep Training
Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Pan Pan, Wotao Yin

Faster Matchings via Learned Duals
Michael Dinitz, Sungjin Im, Thomas Lavastida, Benjamin Moseley, Sergei Vassilvitskii

Improved Transformer for High-Resolution GANs
Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang

Near-Optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems
Prateek Jain, Suhas S. Kowshik, Dheeraj Mysore Nagaraj, Praneeth Netrapalli

Nearly Horizon-Free Offline Reinforcement Learning
Tongzheng Ren, Jialian Li, Bo Dai, Simon S. Du, Sujay Sanghavi

Overparameterization Improves Robustness to Covariate Shift in High Dimensions
Nilesh Tripuraneni, Ben Adlam, Jeffrey Pennington

Pay Attention to MLPs
Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le

PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair
Zimin Chen*, Vincent Josua Hellendoorn*, Pascal Lamblin, Petros Maniatis, Pierre-Antoine Manzagol, Daniel Tarlow, Subhodeep Moitra

Prior-Independent Dynamic Auctions for a Value-Maximizing Buyer
Yuan Deng, Hanrui Zhang

Remember What You Want to Forget: Algorithms for Machine Unlearning
Ayush Sekhari, Jayadev Acharya, Gautam Kamath, Ananda Theertha Suresh

Reverse Engineering Learned Optimizers Reveals Known and Novel Mechanisms
Niru Maheswaranathan*, David Sussillo*, Luke Metz, Ruoxi Sun, Jascha Sohl-Dickstein

Revisiting 3D Object Detection From an Egocentric Perspective
Boyang Deng, Charles R. Qi, Mahyar Najibi, Thomas Funkhouser, Yin Zhou, Dragomir Anguelov

Robust Auction Design in the Auto-Bidding World
Santiago Balseiro, Yuan Deng, Jieming Mao, Vahab Mirrokni, Song Zuo

Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data
Qi Zhu, Natalia Ponomareva, Jiawei Han, Bryan Perozzi

Understanding How Encoder-Decoder Architectures Attend
Kyle Aitken, Vinay V. Ramasesh, Yuan Cao, Niru Maheswaranathan

Understanding the Effect of Stochasticity in Policy Optimization
Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

Accurately Solving Rod Dynamics with Graph Learning
Han Shao, Tassilo Kugelstadt, Torsten Hädrich, Wojtek Palubicki, Jan Bender, Sören Pirk, Dominik L. Michels

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, W. Ronny Huang, Tom Goldstein

Learnability of Linear Thresholds from Label Proportions
Rishi Saket

MLP-Mixer: An All-MLP Architecture for Vision
Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Neural Additive Models: Interpretable Machine Learning with Neural Nets
Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, Geoffrey Hinton

Neural Production Systems
Anirudh Goyal, Aniket Didolkar, Nan Rosemary Ke, Charles Blundell, Philippe Beaudoin, Nicolas Heess, Michael Mozer, Yoshua Bengio

Physics-Aware Downsampling with Deep Learning for Scalable Flood Modeling
Niv Giladi, Zvika Ben-Haim, Sella Nevo, Yossi Matias, Daniel Soudry

Shape from Blur: Recovering Textured 3D Shape and Motion of Fast Moving Objects
Denys Rozumnyi, Martin R. Oswald, Vittorio Ferrari, Marc Pollefeys

What Matters for Adversarial Imitation Learning?
Manu Orsini, Anton Raichuk, Léonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz

A Convergence Analysis of Gradient Descent on Graph Neural Networks
Pranjal Awasthi, Abhimanyu Das, Sreenivas Gollapudi

A Geometric Analysis of Neural Collapse with Unconstrained Features
Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu

Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

Controlled Text Generation as Continuous Optimization with Multiple Constraints
Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov

Coupled Gradient Estimators for Discrete Latent Variables
Zhe Dong, Andriy Mnih, George Tucker

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-Training Ensembles
Jiefeng Chen*, Frederick Liu, Besim Avci, Xi Wu, Yingyu Liang, Somesh Jha

Neural Active Learning with Performance Guarantees
Zhilei Wang, Pranjal Awasthi, Christoph Dann, Ayush Sekhari, Claudio Gentile

Optimal Sketching for Trace Estimation
Shuli Jiang, Hai Pham, David Woodruff, Qiuyi (Richard) Zhang

Representing Long-Range Context for Graph Neural Networks with Global Attention
Zhanghao Wu, Paras Jain, Matthew A. Wright, Azalia Mirhoseini, Joseph E. Gonzalez, Ion Stoica

Scaling Up Exact Neural Network Compression by ReLU Stability
Thiago Serra, Xin Yu, Abhinav Kumar, Srikumar Ramalingam

Soft Calibration Objectives for Neural Networks
Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael Curtis Mozer, Rebecca Roelofs

Sub-Linear Memory: How to Make Performers SLiM
Valerii Likhosherstov, Krzysztof Choromanski, Jared Davis, Xingyou Song, Adrian Weller

A New Theoretical Framework for Fast and Accurate Online Decision-Making
Nicolò Cesa-Bianchi, Tommaso Cesari, Yishay Mansour, Vianney Perchet

Bridging the Gap Between Practice and PAC-Bayes Theory in Few-Shot Meta-Learning
Nan Ding, Xi Chen, Tomer Levinboim, Sebastian Goodman, Radu Soricut

Differentially Private Multi-Armed Bandits in the Shuffle Model
Jay Tenenbaum, Haim Kaplan, Yishay Mansour, Uri Stemmer

Efficient and Local Parallel Random Walks
Michael Kapralov, Silvio Lattanzi, Navid Nouri, Jakab Tardos

Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss
Michael Louis Iuzzolino, Michael Curtis Mozer, Samy Bengio*

It Has Potential: Gradient-Driven Denoisers for Convergent Solutions to Inverse Problems
Regev Cohen, Yochai Blau, Daniel Freedman, Ehud Rivlin

Learning to Combine Per-Example Solutions for Neural Program Synthesis
Disha Shrivastava, Hugo Larochelle, Daniel Tarlow

LLC: Accurate, Multi-purpose Learnt Low-Dimensional Binary Codes
Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi

There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning (see blog post)
Nathan Grinsztajn, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models
Ibrahim Alabdulmohsin, Mario Lucic

Adaptive Sampling for Minimax Fair Classification
Shubhanshu Shekhar, Greg Fields, Mohammad Ghavamzadeh, Tara Javidi

Asynchronous Stochastic Optimization Robust to Arbitrary Delays
Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

Boosting with Multiple Sources
Corinna Cortes, Mehryar Mohri, Dmitry Storcheus, Ananda Theertha Suresh

Breaking the Centralized Barrier for Cross-Device Federated Learning
Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Canonical Capsules: Self-Supervised Capsules in Canonical Pose
Weiwei Sun, Andrea Tagliasacchi, Boyang Deng, Sara Sabour, Soroosh Yazdani, Geoffrey Hinton, Kwang Moo Yi

Contextual Recommendations and Low-Regret Cutting-Plane Algorithms
Sreenivas Gollapudi, Guru Guruganesh, Kostas Kollias, Pasi Manurangsi, Renato Paes Leme, Jon Schneider

Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch

Deep Learning on a Data Diet: Finding Important Examples Early in Training
Mansheej Paul, Surya Ganguli, Gintare Karolina Dziugaite

Deep Learning with Label Differential Privacy
Badih Ghazi, Noah Golowich*, Ravi Kumar, Pasin Manurangsi, Chiyuan Zhang

Efficient Training of Retrieval Models Using Negative Cache
Erik Lindgren, Sashank J. Reddi, Ruiqi Guo, Sanjiv Kumar

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing
Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

Federated Reconstruction: Partially Local Federated Learning
Karan Singhal, Hakim Sidahmed, Zachary Garrett, Shanshan Wu, Keith Rush, Sushant Prakash

Framing RNN as a Kernel Method: A Neural ODE Approach
Adeline Fermanian, Pierre Marion, Jean-Philippe Vert, Gérard Biau

Learning Semantic Representations to Verify Hardware Designs
Shobha Vasudevan, Wenjie Jiang, David Bieber, Rishabh Singh, Hamid Shojaei, C. Richard Ho, Charles Sutton

Learning with User-Level Privacy
Daniel Asher Nathan Levy*, Ziteng Sun*, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

Logarithmic Regret from Sublinear Hints
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Margin-Independent Online Multiclass Learning via Convex Geometry
Guru Guruganesh, Allen Liu, Jon Schneider, Joshua Ruizhi Wang

Multiclass Boosting and the Cost of Weak Learning
Nataly Brukhim, Elad Hazan, Shay Moran, Indraneel Mukherjee, Robert E. Schapire

Neural-PIL: Neural Pre-integrated Lighting for Reflectance Decomposition
Mark Boss, Varun Jampani, Raphael Braun, Ce Liu*, Jonathan T. Barron, Hendrik Lensch

Never Go Full Batch (in Stochastic Convex Optimization)
Idan Amir, Yair Carmon, Tomer Koren, Roi Livni

On Large-Cohort Training for Federated Learning
Zachary Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith

On the Sample Complexity of Privately Learning Axis-Aligned Rectangles
Menachem Sadigurschi, Uri Stemmer

Online Control of Unknown Time-Varying Dynamical Systems
Edgar Minasyan, Paula Gradu, Max Simchowitz, Elad Hazan

Online Knapsack with Frequency Predictions
Sungjin Im, Ravi Kumar, Mahshid Montazer Qaem, Manish Purohit

Optimal Rates for Random Order Online Optimization
Uri Sherman, Tomer Koren, Yishay Mansour

Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure
Aviv Rosenberg, Yishay Mansour

Practical Large-Scale Linear Programming Using Primal-Dual Hybrid Gradient
David Applegate, Mateo Díaz*, Oliver Hinder, Haihao Lu*, Miles Lubin, Brendan O’Donoghue, Warren Schudy

Private and Non-Private Uniformity Testing for Ranking Data
Robert Istvan Busa-Fekete, Dimitris Fotakis, Manolis Zampetakis

Privately Learning Subspaces
Vikrant Singhal, Thomas Steinke

Provable Representation Learning for Imitation with Contrastive Fourier Features
Ofir Nachum, Mengjiao Yang

Safe Reinforcement Learning with Natural Language Constraints
Tsung-Yen Yang, Michael Hu, Yinlam Chow, Peter J. Ramadge, Karthik Narasimhan

Searching for Efficient Transformers for Language Modeling
David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression
Steve Yadlowsky, Taedong Yun, Cory McLean, Alexander D’Amour

Streaming Linear System Identification with Reverse Experience Replay
Prateek Jain, Suhas S. Kowshik, Dheeraj Mysore Nagaraj, Praneeth Netrapalli

The Skellam Mechanism for Differentially Private Federated Learning
Naman Agarwal, Peter Kairouz, Ziyu Liu*

TokenLearner: Adaptive Space-Time Tokenization for Videos
Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

Towards Best-of-All-Worlds Online Learning with Feedback Graphs
Liad Erez, Tomer Koren

Training Over-Parameterized Models with Non-decomposable Objectives
Harikrishna Narasimhan, Aditya Krishna Menon

Twice Regularized MDPs and the Equivalence Between Robustness and Regularization
Esther Derman, Matthieu Geist, Shie Mannor

Unsupervised Learning of Compositional Energy Concepts
Yilun Du, Shuang Li, Yash Sharma, Joshua B. Tenenbaum, Igor Mordatch

User-Level Differentially Private Learning via Correlated Sampling
Badih Ghazi, Ravi Kumar, Pasin Manurangsi

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction
Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Ce Liu*, Deva Ramanan

A Minimalist Approach to Offline Reinforcement Learning
Scott Fujimoto, Shixiang Gu

A Unified View of cGANs With and Without Classifiers
Si-An Chen, Chun-Liang Li, Hsuan-Tien Lin

CoAtNet: Marrying Convolution and Attention for All Data Sizes (see blog post)
Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan

Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren*, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai

Contrastively Disentangled Sequential Variational Autoencoder
Junwen Bai, Weiran Wang, Carla P. Gomes

Controlling Neural Networks with Rule Representations
Sungyong Seo, Sercan O. Arik, Jinsung Yoon, Xiang Zhang, Kihyuk Sohn, Tomas Pfister

Dataset Distillation with Infinitely Wide Convolutional Networks
Timothy Nguyen*, Roman Novak, Lechao Xiao, Jaehoon Lee

Deep Synoptic Monte-Carlo Planning in Reconnaissance Blind Chess
Gregory Clark

Differentially Private Learning with Adaptive Clipping
Galen Andrew, Om Thakkar, Swaroop Ramaswamy, Hugh Brendan McMahan

Differentially Private Model Personalization
Prateek Jain, Keith Rush, Adam Smith, Shuang Song, Abhradeep Thakurta

Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations
Pranjal Awasthi, Alex Tang, Aravindan Vijayaraghavan

Efficiently Identifying Task Groupings for Multi-Task Learning
Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

Generalized Shape Metrics on Neural Representations
Alex H. Williams, Erin Kunz, Simon Kornblith, Scott Linderman

High-Probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails
Ashok Cutkosky, Harsh Mehta

Identity Testing for Mallows Model
Róbert Busa-Fekete, Dimitris Fotakis, Balázs Szörényi, Manolis Zampetakis

Learnable Fourier Features for Multi-dimensional Spatial Positional Encoding
Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio*

Learning to Select Exogenous Events for Marked Temporal Point Process
Ping Zhang, Rishabh K. Iyer, Ashish V. Tendulkar, Gaurav Aggarwal, Abir De

Meta-learning to Improve Pre-training
Aniruddh Raghu, Jonathan Peter Lorraine, Simon Kornblith, Matthew B.A. McDermott, David Duvenaud

Pointwise Bounds for Distribution Estimation Under Communication Constraints
Wei-Ning Chen, Peter Kairouz, Ayfer Özgür

REMIPS: Physically Consistent 3D Reconstruction of Multiple Interacting People Under Weak Supervision
Mihai Fieraru, Mihai Zanfir, Teodor Alexandru Szente, Eduard Gabriel Bazavan, Vlad Olaru, Cristian Sminchisescu

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification
Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov

Revealing and Protecting Labels in Distributed Training
Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, Françoise Beaufays

Robust Predictable Control
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Robust Visual Reasoning via Language Guided Neural Module Networks
Arjun Reddy Akula, Varun Jampani, Soravit Changpinyo, Song-Chun Zhu

Towards Understanding Retrosynthesis by Energy-Based Models
Ruoxi Sun, Hanjun Dai, Li Li, Steven Kearnes, Bo Dai

Exploring the Limits of Out-of-Distribution Detection
Stanislav Fort, Jie Ren, Balaji Lakshminarayanan

Minimax Regret for Stochastic Shortest Path
Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg

No Regrets for Learning the Prior in Bandits
Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvari

Structured Denoising Diffusion Models in Discrete State-Spaces
Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, Rianne van den Berg

The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning (see blog post)
Yujin Tang, David Ha

On the Existence of The Adversarial Bayes Classifier
Pranjal Awasthi, Natalie Frank, Mehryar Mohri

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning
Christopher Dann, Teodor Vanislavov Marinov, Mehryar Mohri, Julian Zimmert

A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning
Christopher Dann, Mehryar Mohri, Tong Zhang, Julian Zimmert

Datasets & Benchmarks Accepted Papers

Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
Bernard Koch, Emily Denton, Alex Hanna, Jacob G. Foster
Datasets & Benchmarks Best Paper

Constructing a Visual Dataset to Study the Effects of Spatial Apartheid in South Africa
Raesetje Sefala, Timnit Gebru, Luzango Mfupe, Nyalleng Moorosi

AI and the Everything in the Whole Wide World Benchmark
Inioluwa Deborah Raji, Emily M. Bender, Amandalynne Paullada, Emily Denton, Alex Hanna

A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches
Vincent Dumoulin, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, Hugo Larochelle

The Neural MMO Platform for Massively Multi-agent Research
Joseph Suarez, Yilun Du, Clare Zhu, Igor Mordatch, Phillip Isola

Systematic Evaluation of Causal Discovery in Visual Model-Based Reinforcement Learning
Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal

STEP: Segmenting and Tracking Every Pixel
Mark Weber, Jun Xie, Maxwell Collins, Yukun Zhu, Paul Voigtlaender, Hartwig Adam, Bradley Green, Andreas Geiger, Bastian Leibe, Daniel Cremers, Aljosa Osep, Laura Leal-Taixe, Liang-Chieh Chen

Artsheets for Art Datasets
Ramya Srinivasan, Emily Denton, Jordan Famularo, Negar Rostamzadeh, Fernando Diaz, Beth Coleman

SynthBio: A Case in Human–AI Collaborative Curation of Text Datasets
Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, Sebastian Gehrmann

Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks
Neil Band, Tim G. J. Rudner, Qixuan Feng, Angelos Filos, Zachary Nado, Michael W. Dusenberry, Ghassen Jerfel, Dustin Tran, Yarin Gal

Brax – A Differentiable Physics Engine for Large Scale Rigid Body Simulation (see blog post)
C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, Olivier Bachem

MLPerf Tiny Benchmark
Colby Banbury, Vijay Janapa Reddi, Peter Torelli, Jeremy Holleman, Nat Jeffries, Csaba Kiraly, Pietro Montino, David Kanter, Sebastian Ahmed, Danilo Pau, Urmish Thakker, Antonio Torrini, Peter Warden, Jay Cordaro, Giuseppe Di Guglielmo, Javier Duarte, Stephen Gibellini, Videet Parekh, Honson Tran, Nhan Tran, Niu Wenxu, Xu Xuesong

Automatic Construction of Evaluation Suites for Natural Language Generation Datasets
Simon Mille, Kaustubh D. Dhole, Saad Mahamood, Laura Perez-Beltrachini, Varun Gangal, Mihir Kale, Emiel van Miltenburg, Sebastian Gehrmann

An Empirical Investigation of Representation Learning for Imitation
Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah

Multilingual Spoken Words Corpus
Mark Mazumder, Sharad Chitlangia, Colby Banbury, Yiping Kang, Juan Manuel Ciro, Keith Achorn, Daniel Galvez, Mark Sabini, Peter Mattson, David Kanter, Greg Diamos, Pete Warden, Josh Meyer, Vijay Janapa Reddi

Workshops

4th Robot Learning Workshop: Self-Supervised and Lifelong Learning
Sponsor: Google
Organizers include Alex Bewley, Vincent Vanhoucke

Differentiable Programming Workshop
Sponsor: Google

Machine Learning for Creativity and Design
Sponsor: Google
Organizers include: Daphne Ippolito, David Ha

LatinX in AI (LXAI) Research @ NeurIPS 2021
Sponsor: Google
Sponsorship Level: Platinum
Workshop Chairs include: Andres Munoz Medina
Mentorship Roundtables include: Jonathan Huang, Pablo Samuel Castro

Algorithmic Fairness Through the Lens of Causality and Robustness
Organizers include: Jessica Schrouff, Awa Dieng

ImageNet: Past, Present, and Future
Organizers include: Lucas Beyer, Xiaohua Zhai
Speakers include: Emily Denton, Vittorio Ferrari, Alex Hanna, Alex Kolesnikov, Rebecca Roelofs

Optimal Transport and Machine Learning
Organizers include: Marco Cuturi

Safe and Robust Control of Uncertain Systems
Speakers include: Aleksandra Faust

CtrlGen: Controllable Generative Modeling in Language and Vision
Speakers include: Sebastian Gehrmann

Deep Reinforcement Learning
Organizers include: Chelsea Finn
Speakers include: Karol Hausman, Dale Schuurmans

Distribution Shifts: Connecting Methods and Applications (DistShift)
Speakers include: Chelsea Finn

ML For Systems
Organizers include: Anna Goldie, Martin Maas, Azade Nazi, Azalia Mirhoseini, Milad Hashemi, Kevin Swersky

Learning in Presence of Strategic Behavior
Organizers include: Yishay Mansour

Bayesian Deep Learning
Organizers include: Zoubin Ghahramani, Kevin Murphy

Advances in Programming Languages and Neurosymbolic Systems (AIPLANS)
Organizers include: Disha Shrivastava, Vaibhav Tulsyan, Danny Tarlow

Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning?
Organizers include: Shixiang Shane Gu, Pablo Samuel Castro, Marc G. Bellemare

The Symbiosis of Deep Learning and Differential Equations
Organizers include: Lily Hu

Out-of-Distribution Generalization and Adaptation in Natural and Artificial Intelligence
Speakers include: Chelsea Finn

Cooperative AI
Organizers include: Natasha Jaques

Offline Reinforcement Learning
Organizers include: Rishabh Agarwal, George Tucker
Speakers include: Minmin Chen

2nd Workshop on Self-Supervised Learning: Theory and Practice
Organizers include: Kristina Toutanova

Data Centric AI
Organizers include: Lora Aroyo

Math AI for Education (MATHAI4ED): Bridging the Gap Between Research and Smart Education
Organizers include: Yuhuai (Tony) Wu

Tutorials

Beyond Fairness in Machine Learning
Organizers include: Emily Denton

Competitions

Evaluating Approximate Inference in Bayesian Deep Learning
Organizers include: Matthew D. Hoffman, Sharad Vikram

HEAR 2021 NeurIPS Challenge: Holistic Evaluation of Audio Representations
Organizers include: Jesse Engel

Machine Learning for Combinatorial Optimization
Organizers include: Pawel Lichocki, Miles Lubin



*Work done while at Google.  

Currently at Google.  

Categories
Misc

Deep Learning Detects Earthquakes at Millimeter-Scale

Researchers create a neural network that automatically detects tectonic fault deformation, crucial to understanding and possibly predicting earthquake behavior.

Researchers at Los Alamos National Laboratory in New Mexico are working toward earthquake detection with a new machine learning algorithm capable of global monitoring. The study uses Interferometric Synthetic Aperture Radar (InSAR) satellite data to detect slow-slip earthquakes. The work will help scientists gain a deeper understanding of the interplay between slow and fast earthquakes, which could be key to making future predictions of quake events.

“Applying machine learning to InSAR data gives us a new way to understand the physics behind tectonic faults and earthquakes,” Bertrand Rouet-Leduc, a geophysicist in Los Alamos’ Geophysics group, said in a press release. “That’s crucial to understanding the full spectrum of earthquake behavior.”

Discovered a couple of decades ago, slow earthquakes remain a bit of a mystery. They occur at the boundary between plates and can last from days to months without detection due to their slow and quiet nature.

They typically happen in areas where faults are locked due to frictional resistance, and scientists believe they may precede major fast quakes. Japan’s 9.0 magnitude earthquake in 2011, which also caused a tsunami and the Fukushima nuclear disaster, followed two slow earthquakes along the Japan Trench.

Scientists can track earthquake behavior with InSAR satellite data. The radar waves penetrate clouds and work effectively at night, making it possible to track ground deformation continuously. By comparing radar images acquired over time, researchers can detect ground-surface movement.
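
To make the scale of those measurements concrete, here is a minimal back-of-the-envelope sketch, not taken from the study, of how an interferometric phase difference maps to line-of-sight displacement. It assumes a C-band radar wavelength of roughly 5.6 cm (as used by sensors such as Sentinel-1) and ignores atmospheric corrections and sign conventions.

```python
# Back-of-the-envelope illustration only (not from the study): converting an
# interferometric phase difference between two radar acquisitions into
# line-of-sight ground displacement, d = (wavelength / (4 * pi)) * delta_phase.
# Assumes a C-band wavelength (~5.6 cm, e.g., Sentinel-1); atmospheric effects
# and sign conventions are ignored.
import math

WAVELENGTH_M = 0.056  # assumed C-band radar wavelength in meters

def los_displacement(delta_phase_rad: float) -> float:
    """Line-of-sight displacement in meters for a given phase change in radians."""
    return (WAVELENGTH_M / (4 * math.pi)) * delta_phase_rad

# A phase change of ~0.45 rad corresponds to roughly 2 mm of motion,
# the scale of the slow-slip signals discussed in this article.
print(f"{los_displacement(0.45) * 1000:.2f} mm")
```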

But these movements are subtle, and existing approaches can only resolve ground deformation down to a few centimeters. Ongoing monitoring of global fault systems also produces massive data streams, far too much to interpret manually.

The researchers created deep learning models that address both limitations. The team trained convolutional neural networks on several million synthetic InSAR time series to automatically detect and extract ground deformation.

Using the cuDNN-accelerated TensorFlow deep learning framework distributed across multiple NVIDIA GPUs, the new methodology operates without prior knowledge of a fault’s location or slip behavior.
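
The authors’ models and training setup are not reproduced here, but the following minimal sketch illustrates the general recipe described above: a small 1D convolutional network trained on synthetic displacement time series, with TensorFlow’s MirroredStrategy standing in for the multi-GPU distribution. The synthetic data generator, architecture, and hyperparameters are placeholders, not the study’s configuration.

```python
# Minimal sketch only: a small 1D CNN that separates synthetic displacement time
# series containing a slow deformation trend from pure noise, trained with
# TensorFlow's MirroredStrategy to spread batches across all visible GPUs.
# The data generator, architecture, and hyperparameters are placeholders.
import numpy as np
import tensorflow as tf

SERIES_LEN = 128  # hypothetical number of acquisitions per pixel time series

def make_synthetic(n=10_000):
    """Random-walk 'noise' series, half of which get a small linear trend added."""
    noise = np.cumsum(np.random.normal(0.0, 0.3, size=(n, SERIES_LEN)), axis=1)
    labels = np.random.randint(0, 2, size=n)
    trend = np.linspace(0.0, 2.0, SERIES_LEN) * labels[:, None]
    return (noise + trend).astype("float32")[..., None], labels.astype("float32")

x, y = make_synthetic()

strategy = tf.distribute.MirroredStrategy()  # uses the available (cuDNN-backed) GPUs
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(SERIES_LEN, 1)),
        tf.keras.layers.Conv1D(32, 7, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(64, 5, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(x, y, batch_size=256, epochs=3, validation_split=0.1)
```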

Figure 1. Application to real data shows the North Anatolian Fault 2013 slow earthquake.

To test their approach, they applied the algorithm to a time series built from images of the North Anatolian Fault in Turkey, a major plate-boundary fault that has ruptured several times in the past century.

With a finer temporal resolution, the algorithm identified previously undetected slippage events, showing that slow earthquakes happen much more often than expected. It also spotted movement as small as two millimeters, a signal subtle enough that experts would have overlooked it.

“The use of deep learning unlocks the detection on faults of deformation events an order of magnitude smaller than previously achieved manually. Observing many more slow slip events may, in turn, unveil their interaction with regular, dynamic earthquakes, including the potential nucleation of earthquakes with slow deformation,” Rouet-Leduc said.

The team is currently working on a follow-up study, testing the model on the San Andreas Fault, which extends roughly 750 miles through California. According to Rouet-Leduc, the model will soon be available on GitHub.


Read the published research in Nature Communications. >>
Read the press release. >>

Categories
Misc

Navigating the Global Supply Chain with Networking Digital Twins

Supply chain shortages are impacting many industries, with semiconductors feeling the crunch in particular. With networking digital twins, you don’t have to wait on the hardware. Get started with infrastructure simulation in NVIDIA Air to stage deployments, test out tools, and enable hardware-free training.

What do Ethernet switches, sports cars, household appliances, and toilet paper have in common? If you read this blog’s title and have lived through the past year and a half, you probably know the answer. These are all products whose availability has been impacted by materials shortages stemming from the global pandemic.

In some instances, the supply issues are more of an inconvenience: waiting a few extra months to get that new Corvette won’t be the end of the world. For other products (think toilet paper or a replacement freezer), the supply crunch was and is a big deal.

It is easy to see the impact on consumers, but enterprises feel the pain of long lead times too. Consider Ethernet switches: they form the networking fabric that ties the data center together. A switch shortage means more than “rack A is unable to talk to rack B.” It means decreased aggregate throughput and increased load on existing infrastructure, leading to more downtime and unplanned outages; in other words, significant adverse impacts on business outcomes.

That all sounds bad, but there is no need to panic. NVIDIA can help you mitigate these challenges and transform your operations with a data center digital twin from NVIDIA Air.

So, what is a digital twin, and how is it related to the data center? A digital twin is a software-simulated replica of a real-world thing, system, or process. It is always on, constantly reacting to and reflecting any change in the status of its physical sibling. A data center digital twin applies this concept to data center infrastructure. To model the data center as a data center, and not just a collection of disparate pizza boxes, the digital twin must fully simulate the network.

NVIDIA Air is unmatched in providing that capability. The modeling tool in Air lets you create logical instances of every switch and cable, connected to logical server instances. In addition to modeling the hardware, NVIDIA Air spins up fully functional virtual appliances with pre-built network and server OS images. This is the key ingredient of the digital twin: with an appliance model, the simulation is application-granular.
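
As a rough illustration of what that logical inventory looks like, and explicitly not the NVIDIA Air interface itself, the sketch below describes a tiny leaf-spine fabric as plain Python objects: every switch, server, and cable becomes an addressable entity, which is what makes a full-fabric simulation possible. All node names, interface names, and OS images here are hypothetical.

```python
# Illustrative sketch only, not the NVIDIA Air API: the kind of logical inventory
# a data center digital twin captures, with every switch, server, and cable as an
# addressable object. All names, roles, and OS images below are hypothetical.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    role: str      # "spine", "leaf", or "server"
    os_image: str  # e.g., a virtual NOS image such as "cumulus-vx" or "sonic-vx"

@dataclass
class Cable:
    a: tuple  # (node_name, interface)
    b: tuple

# A tiny two-spine / two-leaf fabric with one server per leaf.
nodes = (
    [Node(f"spine{i}", "spine", "sonic-vx") for i in (1, 2)]
    + [Node(f"leaf{i}", "leaf", "cumulus-vx") for i in (1, 2)]
    + [Node(f"server{i}", "server", "ubuntu") for i in (1, 2)]
)

cables = [
    Cable(("leaf1", "swp51"), ("spine1", "swp1")),
    Cable(("leaf1", "swp52"), ("spine2", "swp1")),
    Cable(("leaf2", "swp51"), ("spine1", "swp2")),
    Cable(("leaf2", "swp52"), ("spine2", "swp2")),
    Cable(("server1", "eth1"), ("leaf1", "swp1")),
    Cable(("server2", "eth1"), ("leaf2", "swp1")),
]

print(f"{len(nodes)} nodes and {len(cables)} cables in the simulated fabric")
```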

Benefits

NVIDIA Air enables data center digital twins, but how does that solve supply chain issues? Focusing in particular on the benefits tied to hardware, it enables:

  • Hardware-free POCs: Want exposure to the Cumulus Linux or SONiC NOSes? Ordinarily, you would have to acquire the gear to try out the functionality. With NVIDIA Air, you have access to Cumulus VX and SONiC VX, the virtual appliances mentioned above. Because Cumulus and SONiC are built from the ground up on standards-based technologies, you get the full experience without the hardware.
  • Staging production deployments: Already decided on NVIDIA Ethernet switches? There is no reason to sit on your hands until the pallet of switches arrives. With a digital twin, you can completely map out your data center fabric. You can test your deployment and provisioning scripts and know that they will work seamlessly after the systems have been racked, stacked, and cabled (a minimal sketch of one such pre-staging check follows this list). This can reduce your bring-up time by up to 95%.
  • Testing out new network and application tools: Need to roll out a new networking tool on your Spectrum Ethernet switches? Typically, you would need a pre-production prototype environment. With a digital twin, you deploy the tool to the twin, validate the impact on your network with NetQ, tweak settings if necessary, and make the rollout to production worry-free.
  • Hardware-free training: Your organization has decided to bring someone new onto your networking infrastructure team. They are eager to learn, but there is no hardware set aside for training purposes. Without a digital twin, you and the trainee would be stuck waiting on a new switch order or reading a long and tedious user manual. With the digital twin, you have an always-on sandbox, perfect for skill-building and exploration.
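
Here is that pre-staging check, sketched as a small, hypothetical Python routine: it diffs a planned cabling list against the cable inventory exported from the digital twin so that mismatches surface before any hardware arrives. The function names and data shapes are illustrative and not part of any NVIDIA product API.

```python
# Hypothetical pre-staging check, sketched in plain Python (not an NVIDIA API):
# diff the cabling you plan to rack against the cable list exported from the
# digital twin so mismatches surface before any hardware arrives.
def normalize(link):
    """Treat (A, B) and (B, A) as the same physical cable."""
    return tuple(sorted(link))

def diff_cabling(planned, twin):
    planned_set = {normalize(link) for link in planned}
    twin_set = {normalize(link) for link in twin}
    return planned_set - twin_set, twin_set - planned_set

planned_cables = [(("leaf1", "swp51"), ("spine1", "swp1")),
                  (("leaf1", "swp52"), ("spine2", "swp1"))]
twin_cables = [(("spine1", "swp1"), ("leaf1", "swp51")),
               (("leaf1", "swp52"), ("spine2", "swp2"))]  # note the port mismatch

not_modeled, not_planned = diff_cabling(planned_cables, twin_cables)
print("Planned but missing from the twin:", not_modeled)
print("In the twin but not planned:", not_planned)
```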

One caveat: data center digital twins will not expedite the date that the RTX 3090 comes back in stock at your favorite retailer, but they will help with the crunch around your networking procurement.

Digital twins with NVIDIA Air: a view of your physical network’s digital twin.

The best part: if you are curious to learn more, you can do so right now. NVIDIA Air brings the public cloud experience to on-premises networking, making it simple and quick to jump right in. Navigate to NVIDIA Air in your browser and get started immediately.