Categories
Misc

Building a Foundation for Zero Trust Security with NVIDIA DOCA 1.2

Dive deep into the new features and use cases available for networking, security, and storage in the latest release of the DOCA software framework.

Today, NVIDIA released the NVIDIA DOCA 1.2 software framework for NVIDIA BlueField, the world’s most advanced data processing unit (DPU). Designed to enable the NVIDIA BlueField ecosystem and developer community, DOCA is the key to unlocking the potential of the DPU by offering services to offload, accelerate, and isolate infrastructure applications and services from the CPU.

DOCA is a software framework that brings together APIs, drivers, libraries, sample code, documentation, services, and prepackaged containers to simplify and speed up application development and deployment on BlueField DPUs on every data center node. Together, DOCA and BlueField create an isolated and secure services domain for networking, security, storage, and infrastructure management that is ideal for enabling a zero-trust strategy.

The DOCA 1.2 release introduces several important features and use cases. 

Protect host services with adaptive cloud security

A modern approach to security based on zero-trust principles is critical to securing today’s data centers, as resources inside the data center can no longer be trusted automatically. App Shield enables detection of attacks on critical services in a system. In many systems, those critical services are responsible for ensuring the integrity and privacy of the execution of many applications.

Figure 1. Shield your host services with adaptive cloud security

DOCA App Shield provides host monitoring, enabling cybersecurity vendors to create accelerated intrusion detection system (IDS) solutions that identify an attack on any physical or virtual machine. It can feed data about application status to security information and event management (SIEM) or extended detection and response (XDR) tools, and it also enhances forensic investigations.

If a host is compromised, attackers typically exploit breaches in security control mechanisms to move laterally across data center networks to other servers and devices. App Shield enables security teams to shield their application processes, continuously validate their integrity, and in turn detect malicious activity.

If an attacker kills the machine’s security agent processes, App Shield can mitigate the attack by isolating the compromised host, preventing the malware from accessing confidential data or spreading to other resources. App Shield is an important advancement in the fight against cybercrime and an effective tool for enabling a zero-trust security stance.

BlueField DPUs and the DOCA software framework provide an open foundation for partners and developers to build zero-trust solutions and address the security needs of the modern data center.

Create time-synchronized data centers

Precision timing is a critical capability for enabling and accelerating distributed apps from edge to core. DOCA Firefly is a data center timing service that supports extremely precise time synchronization everywhere. With nanosecond-level clock synchronization, you can enable a broad new range of timing-critical and delay-sensitive applications.

Figure 2. Precision time-synchronized data center service

DOCA Firefly addresses a wide range of use cases, including the following:

  • High-frequency trading
  • Distributed databases
  • Industrial 5G radio access networks (RAN)
  • Scientific research
  • High performance computing (HPC)
  • Omniverse digital twins
  • Gaming
  • AR/VR
  • Autonomous vehicles
  • Security

Firefly enables data consistency, accurate event ordering, and causality analysis, such as ensuring the correct sequencing of stock market transactions and fair bidding during digital auctions. The hardware engines in the BlueField application-specific integrated circuit (ASIC) can time-stamp data packets at full wire speed with breakthrough nanosecond-level accuracy.
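
To make the event-ordering benefit concrete, here is a minimal Python sketch (illustrative only, with made-up timestamps) of how nanosecond-synchronized hardware timestamps let you merge events from multiple servers into one globally consistent order:

```python
from dataclasses import dataclass

@dataclass
class Event:
    node: str   # which server emitted the event
    ts_ns: int  # hardware timestamp, nanoseconds since the epoch

# Hypothetical events captured on two time-synchronized servers.
events = [
    Event("server-a", 1_700_000_000_000_000_250),
    Event("server-b", 1_700_000_000_000_000_120),
    Event("server-a", 1_700_000_000_000_000_900),
]

# With clocks synchronized to nanosecond accuracy, a simple sort by
# timestamp yields a globally consistent event order; with millisecond
# clock drift, the relative order of nearby events would be meaningless.
for e in sorted(events, key=lambda e: e.ts_ns):
    print(f"{e.ts_ns} {e.node}")
```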

Improving the accuracy of data center timing by orders of magnitude offers many advantages. 

With globally synchronized data centers, you can accelerate distributed applications and data analysis including AI, HPC, professional media production, telco virtual network functions, and precise event monitoring. All the servers in the data center—or across data centers—can be harmonized to provide something that is far bigger than any single compute node.

The benefits of improving data center timing accuracy include a reduction in the amount of compute power and network traffic needed to replicate and validate the data. For example, Firefly synchronization delivers a 3x database performance gain to distributed databases.

DOCA HBN beta

The BlueField DPU is a unique solution for network acceleration and policy enforcement within an endpoint host. At the same time, BlueField provides an administrative and software demarcation between the host operating system and functions running on the DPU. 

With DOCA host-based networking (HBN), top-of-rack (TOR) network configuration can extend down to the DPU, enabling network administrators to own DPU configuration and management while application management can be handled separately by x86 host administrators. This creates an unparalleled opportunity to reimagine how you can build data center networks.

DOCA 1.2 provides a new driver for HBN called Netlink to DOCA (nl2doca) that accelerates and offloads traditional Linux Netlink messages. nl2doca is an acceleration driver integrated into the HBN service container. You can now accelerate L2 and L3 host networking that relies on DPDK, OVS, or kernel routing with Netlink.
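
As a rough illustration of the kind of traffic nl2doca handles, the sketch below uses the open-source pyroute2 library to watch kernel Netlink route updates from Python. This is ordinary host-side Netlink code, not DOCA API code; it simply shows the message stream that a Netlink-capable application produces and that nl2doca can offload to the DPU:

```python
# A minimal sketch (not DOCA code): watch kernel routing updates over
# Netlink with pyroute2. Requires: pip install pyroute2
from pyroute2 import IPRoute

with IPRoute() as ipr:
    ipr.bind()  # subscribe to Netlink broadcast groups
    while True:
        for msg in ipr.get():  # blocks until the kernel sends events
            if msg["event"] in ("RTM_NEWROUTE", "RTM_DELROUTE"):
                dst = msg.get_attr("RTA_DST") or "default"
                print(msg["event"], dst)
```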

NVIDIA is adding support for the open-source Free Range Routing (FRR) project, running on the DPU and leveraging this new nl2doca driver. This support enables the DPU to operate like a TOR switch, with additional benefits. FRR on the DPU enables EVPN networks to move directly into the host, providing layer 2 (VLAN) extension and layer 3 (VRF) tenant isolation.

HBN on the DPU can manage and monitor traffic between VMs or containers on the same node. It can also encrypt or decrypt traffic to and from the node and analyze it, tasks that no TOR switch can perform. You can build your own Amazon VPC-like solution in your private cloud for containerized, virtual machine, and bare metal workloads.

HBN with BlueField DPUs revolutionizes how you build data center networks. It offers the following benefits:

  • Plug-and-play servers: Leveraging FRR’s BGP unnumbered, servers can be directly connected to the network with no need to coordinate server-to-switch configurations. No need for MLAG, bonding, or NIC teaming.
  • Open, interoperable multi-tenancy: EVPN enables server-to-server or server-to-switch overlays. This provides multi-tenant solutions for bare metal, closed appliances, or any hypervisor solution, regardless of the underlay networking vendor. EVPN provides distributed overlay configuration, while eliminating the need for costly, proprietary, centralized SDN controllers.
  • Secure network management: The BlueField DPU provides an isolated environment for network policy configuration and enforcement, with no software or dependencies on the host.
  • Enabling advanced HCI and storage networking: BlueField provides a simple method for HCI and storage partners to solve current network challenges for multi-tenant and hybrid cloud solutions, regardless of the hypervisor.
  • Flexible network offloading: The nl2doca driver provided by HBN enables any Netlink-capable application to offload and accelerate kernel-based networking without the complexities of traditional DPDK libraries.
  • Simplification of TOR switch requirements: More intelligence is placed on the DPU within the server, reducing the complexity of the TOR switch.

Additional DOCA 1.2 SDK updates:

  • DOCA FLOW – Firewall (Alpha)
  • DOCA FLOW – Gateway (Beta)
  • DOCA FLOW remote APIs
  • Enhancements and increased scale for IPsec and TLS

DLI course: Introduction to DOCA for the BlueField DPU

In addition, NVIDIA is introducing a Deep Learning Institute (DLI) course: Introduction to DOCA for the BlueField DPU. The main objective of this course is to provide students, including developers, researchers, and system administrators, with an introduction to DOCA and BlueField DPUs. This enables students to successfully work with DOCA to create accelerated applications and services powered by BlueField DPUs.

Try DOCA today

You can experience DOCA today with the DOCA software package, which includes the DOCA SDK and runtime-accelerated libraries for networking, storage, and security. The libraries help you program your data center infrastructure running on the DPU.

The DOCA Early Access program is open now for applications. To receive news and updates about DOCA or to become an early access member/partner, register on the DOCA Early Access page.


Categories
Offsites

Google at NeurIPS 2021

This week marks the beginning of the 35th annual Conference on Neural Information Processing Systems (NeurIPS 2021), the biggest machine learning conference of the year. NeurIPS 2021 will be held virtually and includes invited talks, demonstrations and presentations of some of the latest in machine learning research. This year, NeurIPS also announced a new Datasets and Benchmarks track, which will include publications, talks, posters, and discussions related to this research area.

Google will have a strong presence with more than 170 accepted papers, additionally contributing to and learning from the broader academic research community via talks, posters, workshops, and tutorials. You can learn more about our work being presented in the list below (Google affiliations highlighted in bold).

Organizing Committee

Communications Co-Chair: Emily Denton
Program Co-Chair: Yann Dauphin
Workshop Co-Chair: Sanmi Koyejo

Senior Area Chairs: Alekh Agarwal, Amir Globerson, Been Kim, Charles Sutton, Claudio Gentile, Corinna Cortes, Dale Schuurmans, David Duvenaud, Elad Hazan, Hugo Larochelle, Jean-Philippe Vert, Kevin Murphy, Marco Cuturi, Mehryar Mohri, Mohammad Ghavamzadeh, Samory Kpotufe, Sanjiv Kumar, Satyen Kale, Sergey Levine, Tara N. Sainath, Yishay Mansour

Area Chairs: Abhishek Kumar, Abhradeep Guha Thakurta, Alex Kulesza, Alexander A. Alemi, Alexander T. Toshev, Amin Karbasi, Amit Daniely, Ananda Theertha Suresh, Ankit Singh Rawat, Ashok Cutkosky, Badih Ghazi, Balaji Lakshminarayanan, Ben Poole, Bo Dai, Boqing Gong, Chelsea Finn, Chiyuan Zhang, Christian Szegedy, Cordelia Schmid, Craig Boutilier, Cyrus Rashtchian, D. Sculley, Daniel Keysers, David Ha, Denny Zhou, Dilip Krishnan, Dumitru Erhan, Dustin Tran, Ekin Dogus Cubuk, Fabian Pedregosa, George Tucker, Hanie Sedghi, Hanjun Dai, Heinrich Jiang, Hossein Mobahi, Izhak Shafran, Jaehoon Lee, Jascha Sohl-Dickstein, Jasper Snoek, Jeffrey Pennington, Jelani Nelson, Jieming Mao, Justin Gilmer, Karol Hausman, Karthik Sridharan, Kevin Swersky, Maithra Raghu, Mario Lucic, Mathieu Blondel, Matt Kusner, Matthew Johnson, Matthieu Geist, Ming-Hsuan Yang, Mohammad Mahdian, Mohammad Norouzi, Nal Kalchbrenner, Naman Agarwal, Nicholas Carlini, Nicolas Papernot, Olivier Bachem, Olivier Pietquin, Paul Duetting, Praneeth Netrapalli, Pranjal Awasthi, Prateek Jain, Quentin Berthet, Renato Paes Leme, Richard Nock, Rif A. Saurous, Rose Yu, Roy Frostig, Samuel Stern Schoenholz, Sashank J. Reddi, Sercan O. Arik, Sergei Vassilvitskii, Sergey Ioffe, Shay Moran, Silvio Lattanzi, Simon Kornblith, Srinadh Bhojanapalli, Thang Luong, Thomas Steinke, Tim Salimans, Tomas Pfister, Tomer Koren, Uri Stemmer, Vahab Mirrokni, Vikas Sindhwani, Vincent Dumoulin, Virginia Smith, Vladimir Braverman, W. Ronny Huang, Wen Sun, Yang Li, Yasin Abbasi-Yadkori, Yinlam Chow, Yujia Li, Yunhe Wang, Zoltán Szabó

NeurIPS Foundation Board 2021: Michael Mozer, Corinna Cortes, Hugo Larochelle, John C. Platt, Fernando Pereira

Test of Time Award

Online Learning for Latent Dirichlet Allocation
Matthew D. Hoffman, David M. Blei, Francis Bach

Publications

Deep Reinforcement Learning at the Edge of the Statistical Precipice (see blog post)
Outstanding Paper Award Recipient
Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

A Separation Result Between Data-Oblivious and Data-Aware Poisoning Attacks
Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Abhradeep Guha Thakurta

Adversarial Robustness of Streaming Algorithms Through Importance Sampling
Vladimir Braverman, Avinatan Hassidim, Yossi Matias, Mariano Schain, Sandeep Silwal, Samson Zhou

Aligning Silhouette Topology for Self-Adaptive 3D Human Pose Recovery
Mugallodi Rakesh, Jogendra Nath Kundu, Varun Jampani, R. Venkatesh Babu

Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun

Autonomous Reinforcement Learning via Subgoal Curricula
Archit Sharma, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn

Calibration and Consistency of Adversarial Surrogate Losses
Pranjal Awasthi, Natalie S. Frank, Anqi Mao, Mehryar Mohri, Yutao Zhong

Compressive Visual Representations
Kuang-Huei Lee, Anurag Arnab, Sergio Guadarrama, John Canny, Ian Fischer

Counterfactual Invariance to Spurious Correlations in Text Classification
Victor Veitch, Alexander D’Amour, Steve Yadlowsky, Jacob Eisenstein

Deep Learning Through the Lens of Example Difficulty
Robert J.N. Baldock, Hartmut Maennel, Behnam Neyshabur

Deep Neural Networks as Point Estimates for Deep Gaussian Processes
Vincent Dutordoir, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, Nicolas Durrande

Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning
Ligeng Zhu, Hongzhou Lin, Yao Lu, Yujun Lin, Song Han

Discrete-Valued Neural Communication
Dianbo Liu, Alex Lamb, Kenji Kawaguchi, Anirudh Goyal, Chen Sun, Michael Curtis Mozer, Yoshua Bengio

Do Vision Transformers See Like Convolutional Neural Networks?
Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

Dueling Bandits with Team Comparisons
Lee Cohen, Ulrike Schmidt-Kraepelin, Yishay Mansour

End-to-End Multi-Modal Video Temporal Grounding
Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Environment Generation for Zero-Shot Compositional Reinforcement Learning
Izzeddin Gur, Natasha Jaques, Yingjie Miao, Jongwook Choi, Manoj Tiwari, Honglak Lee, Aleksandra Faust

H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Motion
Hongyi Xu, Thiemo Alldieck, Cristian Sminchisescu

Improving Calibration Through the Relationship with Adversarial Robustness
Yao Qin, Xuezhi Wang, Alex Beutel, Ed Chi

Learning Generalized Gumbel-Max Causal Mechanisms
Guy Lorberbom, Daniel D. Johnson, Chris J. Maddison, Daniel Tarlow, Tamir Hazan

MICo: Improved Representations via Sampling-Based State Similarity for Markov Decision Processes
Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

Near-Optimal Lower Bounds For Convex Optimization For All Orders of Smoothness
Ankit Garg, Robin Kothari, Praneeth Netrapalli, Suhail Sherif

Neural Circuit Synthesis from Specification Patterns
Frederik Schmitt, Christopher Hahn, Markus N. Rabe, Bernd Finkbeiner

Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation
Jogendra Nath Kundu, Siddharth Seth, Anirudh Jamkhandi, Pradyumna YM, Varun Jampani, Anirban Chakraborty, R. Venkatesh Babu

Object-Aware Contrastive Learning for Debiased Scene Representation
Sangwoo Mo, Hyunwoo Kang, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin

On Density Estimation with Diffusion Models
Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho

On Margin-Based Cluster Recovery with Oracle Queries
Marco Bressan, Nicolo Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation
Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

Parallelizing Thompson Sampling
Amin Karbasi, Vahab Mirrokni, Mohammad Shadravan

Reverse-Complement Equivariant Networks for DNA Sequences
Vincent Mallet, Jean-Philippe Vert

Revisiting ResNets: Improved Training and Scaling Strategies
Irwan Bello, William Fedus, Xianzhi Du, Ekin Dogus Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

Revisiting the Calibration of Modern Neural Networks
Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Ann Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, Mario Lucic

Scaling Vision with Sparse Mixture of Experts
Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, Neil Houlsby

SE(3)-Equivariant Prediction of Molecular Wavefunctions and Electronic Densities
Oliver Thorsten Unke, Mihail Bogojeski, Michael Gastegger, Mario Geiger, Tess Smidt, Klaus-Robert Müller

Stateful ODE-Nets Using Basis Function Expansions
Alejandro Francisco Queiruga, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney

Statistically and Computationally Efficient Linear Meta-Representation Learning
Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh

Streaming Belief Propagation for Community Detection
Yuchen Wu, Jakab Tardos, Mohammad Hossein Bateni, André Linhares, Filipe Miguel Gonçalves de Almeida, Andrea Montanari, Ashkan Norouzi-Fard

Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls
Nick Doudchenko, Khashayar Khosravi, Jean Pouget-Abadie, Sebastien Lahaie, Miles Lubin, Vahab Mirrokni, Jann Spiess, Guido Imbens

The Difficulty of Passive Learning in Deep Reinforcement Learning
George Ostrovski, Pablo Samuel Castro, Will Dabney

The Pareto Frontier of Model Selection for General Contextual Bandits
Teodor Marinov, Julian Zimmert

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

Co-Adaptation of Algorithmic and Implementational Innovations in Inference-Based Deep Reinforcement Learning
Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Gu

Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Sergey Levine, Chelsea Finn

Does Knowledge Distillation Really Work?
Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson

Exponential Graph is Provably Efficient for Decentralized Deep Training
Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Pan Pan, Wotao Yin

Faster Matchings via Learned Duals
Michael Dinitz, Sungjin Im, Thomas Lavastida, Benjamin Moseley, Sergei Vassilvitskii

Improved Transformer for High-Resolution GANs
Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang

Near-Optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems
Prateek Jain, Suhas S. Kowshik, Dheeraj Mysore Nagaraj, Praneeth Netrapalli

Nearly Horizon-Free Offline Reinforcement Learning
Tongzheng Ren, Jialian Li, Bo Dai, Simon S. Du, Sujay Sanghavi

Overparameterization Improves Robustness to Covariate Shift in High Dimensions
Nilesh Tripuraneni, Ben Adlam, Jeffrey Pennington

Pay Attention to MLPs
Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le

PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair
Zimin Chen*, Vincent Josua Hellendoorn*, Pascal Lamblin, Petros Maniatis, Pierre-Antoine Manzagol, Daniel Tarlow, Subhodeep Moitra

Prior-Independent Dynamic Auctions for a Value-Maximizing Buyer
Yuan Deng, Hanrui Zhang

Remember What You Want to Forget: Algorithms for Machine Unlearning
Ayush Sekhari, Jayadev Acharya, Gautam Kamath, Ananda Theertha Suresh

Reverse Engineering Learned Optimizers Reveals Known and Novel Mechanisms
Niru Maheswaranathan*, David Sussillo*, Luke Metz, Ruoxi Sun, Jascha Sohl-Dickstein

Revisiting 3D Object Detection From an Egocentric Perspective
Boyang Deng, Charles R. Qi, Mahyar Najibi, Thomas Funkhouser, Yin Zhou, Dragomir Anguelov

Robust Auction Design in the Auto-Bidding World
Santiago Balseiro, Yuan Deng, Jieming Mao, Vahab Mirrokni, Song Zuo

Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data
Qi Zhu, Natalia Ponomareva, Jiawei Han, Bryan Perozzi

Understanding How Encoder-Decoder Architectures Attend
Kyle Aitken, Vinay V. Ramasesh, Yuan Cao, Niru Maheswaranathan

Understanding the Effect of Stochasticity in Policy Optimization
Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

Accurately Solving Rod Dynamics with Graph Learning
Han Shao, Tassilo Kugelstadt, Torsten Hädrich, Wojtek Palubicki, Jan Bender, Sören Pirk, Dominik L. Michels

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, W. Ronny Huang, Tom Goldstein

Learnability of Linear Thresholds from Label Proportions
Rishi Saket

MLP-Mixer: An All-MLP Architecture for Vision
Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Neural Additive Models: Interpretable Machine Learning with Neural Nets
Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, Geoffrey Hinton

Neural Production Systems
Anirudh Goyal, Aniket Didolkar, Nan Rosemary Ke, Charles Blundell, Philippe Beaudoin, Nicolas Heess, Michael Mozer, Yoshua Bengio

Physics-Aware Downsampling with Deep Learning for Scalable Flood Modeling
Niv Giladi, Zvika Ben-Haim, Sella Nevo, Yossi Matias, Daniel Soudry

Shape from Blur: Recovering Textured 3D Shape and Motion of Fast Moving Objects
Denys Rozumnyi, Martin R. Oswald, Vittorio Ferrari, Marc Pollefeys

What Matters for Adversarial Imitation Learning?
Manu Orsini, Anton Raichuk, Léonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz

A Convergence Analysis of Gradient Descent on Graph Neural Networks
Pranjal Awasthi, Abhimanyu Das, Sreenivas Gollapudi

A Geometric Analysis of Neural Collapse with Unconstrained Features
Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu

Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

Controlled Text Generation as Continuous Optimization with Multiple Constraints
Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov

Coupled Gradient Estimators for Discrete Latent Variables
Zhe Dong, Andriy Mnih, George Tucker

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-Training Ensembles
Jiefeng Chen*, Frederick Liu, Besim Avci, Xi Wu, Yingyu Liang, Somesh Jha

Neural Active Learning with Performance Guarantees
Zhilei Wang, Pranjal Awasthi, Christoph Dann, Ayush Sekhari, Claudio Gentile

Optimal Sketching for Trace Estimation
Shuli Jiang, Hai Pham, David Woodruff, Qiuyi (Richard) Zhang

Representing Long-Range Context for Graph Neural Networks with Global Attention
Zhanghao Wu, Paras Jain, Matthew A. Wright, Azalia Mirhoseini, Joseph E. Gonzalez, Ion Stoica

Scaling Up Exact Neural Network Compression by ReLU Stability
Thiago Serra, Xin Yu, Abhinav Kumar, Srikumar Ramalingam

Soft Calibration Objectives for Neural Networks
Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael Curtis Mozer, Rebecca Roelofs

Sub-Linear Memory: How to Make Performers SLiM
Valerii Likhosherstov, Krzysztof Choromanski, Jared Davis, Xingyou Song, Adrian Weller

A New Theoretical Framework for Fast and Accurate Online Decision-Making
Nicolò Cesa-Bianchi, Tommaso Cesari, Yishay Mansour, Vianney Perchet

Bridging the Gap Between Practice and PAC-Bayes Theory in Few-Shot Meta-Learning
Nan Ding, Xi Chen, Tomer Levinboim, Sebastian Goodman, Radu Soricut

Differentially Private Multi-Armed Bandits in the Shuffle Model
Jay Tenenbaum, Haim Kaplan, Yishay Mansour, Uri Stemmer

Efficient and Local Parallel Random Walks
Michael Kapralov, Silvio Lattanzi, Navid Nouri, Jakab Tardos

Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss
Michael Louis Iuzzolino, Michael Curtis Mozer, Samy Bengio*

It Has Potential: Gradient-Driven Denoisers for Convergent Solutions to Inverse Problems
Regev Cohen, Yochai Blau, Daniel Freedman, Ehud Rivlin

Learning to Combine Per-Example Solutions for Neural Program Synthesis
Disha Shrivastava, Hugo Larochelle, Daniel Tarlow

LLC: Accurate, Multi-purpose Learnt Low-Dimensional Binary Codes
Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi

There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning (see blog post)

Nathan Grinsztajn, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models
Ibrahim Alabdulmohsin, Mario Lucic

Adaptive Sampling for Minimax Fair Classification
Shubhanshu Shekhar, Greg Fields, Mohammad Ghavamzadeh, Tara Javidi

Asynchronous Stochastic Optimization Robust to Arbitrary Delays
Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

Boosting with Multiple Sources
Corinna Cortes, Mehryar Mohri, Dmitry Storcheus, Ananda Theertha Suresh

Breaking the Centralized Barrier for Cross-Device Federated Learning
Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Canonical Capsules: Self-Supervised Capsules in Canonical Pose
Weiwei Sun, Andrea Tagliasacchi, Boyang Deng, Sara Sabour, Soroosh Yazdani, Geoffrey Hinton, Kwang Moo Yi

Contextual Recommendations and Low-Regret Cutting-Plane Algorithms
Sreenivas Gollapudi, Guru Guruganesh, Kostas Kollias, Pasi Manurangsi, Renato Paes Leme, Jon Schneider

Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch

Deep Learning on a Data Diet: Finding Important Examples Early in Training
Mansheej Paul, Surya Ganguli, Gintare Karolina Dziugaite

Deep Learning with Label Differential Privacy
Badih Ghazi, Noah Golowich*, Ravi Kumar, Pasin Manurangsi, Chiyuan Zhang

Efficient Training of Retrieval Models Using Negative Cache
Erik Lindgren, Sashank J. Reddi, Ruiqi Guo, Sanjiv Kumar

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing
Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

Federated Reconstruction: Partially Local Federated Learning
Karan Singhal, Hakim Sidahmed, Zachary Garrett, Shanshan Wu, Keith Rush, Sushant Prakash

Framing RNN as a Kernel Method: A Neural ODE Approach
Adeline Fermanian, Pierre Marion, Jean-Philippe Vert, Gérard Biau

Learning Semantic Representations to Verify Hardware Designs
Shobha Vasudevan, Wenjie Jiang, David Bieber, Rishabh Singh, Hamid Shojaei, C. Richard Ho, Charles Sutton

Learning with User-Level Privacy
Daniel Asher Nathan Levy*, Ziteng Sun*, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

Logarithmic Regret from Sublinear Hints
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Margin-Independent Online Multiclass Learning via Convex Geometry
Guru Guruganesh, Allen Liu, Jon Schneider, Joshua Ruizhi Wang

Multiclass Boosting and the Cost of Weak Learning
Nataly Brukhim, Elad Hazan, Shay Moran, Indraneel Mukherjee, Robert E. Schapire

Neural-PIL: Neural Pre-integrated Lighting for Reflectance Decomposition
Mark Boss, Varun Jampani, Raphael Braun, Ce Liu*, Jonathan T. Barron, Hendrik Lensch

Never Go Full Batch (in Stochastic Convex Optimization)
Idan Amir, Yair Carmon, Tomer Koren, Roi Livni

On Large-Cohort Training for Federated Learning
Zachary Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith

On the Sample Complexity of Privately Learning Axis-Aligned Rectangles
Menachem Sadigurschi, Uri Stemmer

Online Control of Unknown Time-Varying Dynamical Systems
Edgar Minasyan, Paula Gradu, Max Simchowitz, Elad Hazan

Online Knapsack with Frequency Predictions
Sungjin Im, Ravi Kumar, Mahshid Montazer Qaem, Manish Purohit

Optimal Rates for Random Order Online Optimization
Uri Sherman, Tomer Koren, Yishay Mansour

Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure
Aviv Rosenberg, Yishay Mansour

Practical Large-Scale Linear Programming Using Primal-Dual Hybrid Gradient
David Applegate, Mateo Díaz*, Oliver Hinder, Haihao Lu*, Miles Lubin, Brendan O’Donoghue, Warren Schudy

Private and Non-Private Uniformity Testing for Ranking Data
Robert Istvan Busa-Fekete, Dimitris Fotakis, Manolis Zampetakis

Privately Learning Subspaces
Vikrant Singhal, Thomas Steinke

Provable Representation Learning for Imitation with Contrastive Fourier Features
Ofir Nachum, Mengjiao Yang

Safe Reinforcement Learning with Natural Language Constraints
Tsung-Yen Yang, Michael Hu, Yinlam Chow, Peter J. Ramadge, Karthik Narasimhan

Searching for Efficient Transformers for Language Modeling
David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression
Steve Yadlowsky, Taedong Yun, Cory McLean, Alexander D’Amour

Streaming Linear System Identification with Reverse Experience Replay
Prateek Jain, Suhas S. Kowshik, Dheeraj Mysore Nagaraj, Praneeth Netrapalli

The Skellam Mechanism for Differentially Private Federated Learning
Naman Agarwal, Peter Kairouz, Ziyu Liu*

TokenLearner: Adaptive Space-Time Tokenization for Videos
Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

Towards Best-of-All-Worlds Online Learning with Feedback Graphs
Liad Erez, Tomer Koren

Training Over-Parameterized Models with Non-decomposable Objectives
Harikrishna Narasimhan, Aditya Krishna Menon

Twice Regularized MDPs and the Equivalence Between Robustness and Regularization
Esther Derman, Matthieu Geist, Shie Mannor

Unsupervised Learning of Compositional Energy Concepts
Yilun Du, Shuang Li, Yash Sharma, Joshua B. Tenenbaum, Igor Mordatch

User-Level Differentially Private Learning via Correlated Sampling
Badih Ghazi, Ravi Kumar, Pasin Manurangsi

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction
Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Ce Liu*, Deva Ramanan

A Minimalist Approach to Offline Reinforcement Learning
Scott Fujimoto, Shixiang Gu

A Unified View of cGANs With and Without Classifiers
Si-An Chen, Chun-Liang Li, Hsuan-Tien Lin

CoAtNet: Marrying Convolution and Attention for All Data Sizes (see blog post)
Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan

Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren*, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai

Contrastively Disentangled Sequential Variational Autoencoder
Junwen Bai, Weiran Wang, Carla P. Gomes

Controlling Neural Networks with Rule Representations
Sungyong Seo, Sercan O. Arik, Jinsung Yoon, Xiang Zhang, Kihyuk Sohn, Tomas Pfister

Dataset Distillation with Infinitely Wide Convolutional Networks
Timothy Nguyen*, Roman Novak, Lechao Xiao, Jaehoon Lee

Deep Synoptic Monte-Carlo Planning in Reconnaissance Blind Chess
Gregory Clark

Differentially Private Learning with Adaptive Clipping
Galen Andrew, Om Thakkar, Swaroop Ramaswamy, Hugh Brendan McMahan

Differentially Private Model Personalization
Prateek Jain, Keith Rush, Adam Smith, Shuang Song, Abhradeep Thakurta

Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations
Pranjal Awasthi, Alex Tang, Aravindan Vijayaraghavan

Efficiently Identifying Task Groupings for Multi-Task Learning
Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

Generalized Shape Metrics on Neural Representations
Alex H. Williams, Erin Kunz, Simon Kornblith, Scott Linderman

High-Probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails
Ashok Cutkosky, Harsh Mehta

Identity Testing for Mallows Model
Róbert Busa-Fekete, Dimitris Fotakis, Balázs Szörényi, Manolis Zampetakis

Learnable Fourier Features for Multi-dimensional Spatial Positional Encoding
Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio*

Learning to Select Exogenous Events for Marked Temporal Point Process
Ping Zhang, Rishabh K. Iyer, Ashish V. Tendulkar, Gaurav Aggarwal, Abir De

Meta-learning to Improve Pre-training
Aniruddh Raghu, Jonathan Peter Lorraine, Simon Kornblith, Matthew B.A. McDermott, David Duvenaud

Pointwise Bounds for Distribution Estimation Under Communication Constraints
Wei-Ning Chen, Peter Kairouz, Ayfer Özgür

REMIPS: Physically Consistent 3D Reconstruction of Multiple Interacting People Under Weak Supervision
Mihai Fieraru, Mihai Zanfir, Teodor Alexandru Szente, Eduard Gabriel Bazavan, Vlad Olaru, Cristian Sminchisescu

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification
Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov

Revealing and Protecting Labels in Distributed Training
Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, Françoise Beaufays

Robust Predictable Control
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Robust Visual Reasoning via Language Guided Neural Module Networks
Arjun Reddy Akula, Varun Jampani, Soravit Changpinyo, Song-Chun Zhu

Towards Understanding Retrosynthesis by Energy-Based Models
Ruoxi Sun, Hanjun Dai, Li Li, Steven Kearnes, Bo Dai

Exploring the Limits of Out-of-Distribution Detection
Stanislav Fort, Jie Ren, Balaji Lakshminarayanan

Minimax Regret for Stochastic Shortest Path
Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg

No Regrets for Learning the Prior in Bandits
Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvari

Structured Denoising Diffusion Models in Discrete State-Spaces
Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, Rianne van den Berg

The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning (see blog post)
Yujin Tang, David Ha

On the Existence of The Adversarial Bayes Classifier
Pranjal Awasthi, Natalie Frank, Mehryar Mohri

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning
Christopher Dann, Teodor Vanislavov Marinov, Mehryar Mohri, Julian Zimmert

A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning
Christopher Dann, Mehryar Mohri, Tong Zhang, Julian Zimmert

Datasets & Benchmarks Accepted Papers

Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
Bernard Koch, Emily Denton, Alex Hanna, Jacob G. Foster
Datasets & Benchmarks Best Paper

Constructing a Visual Dataset to Study the Effects of Spatial Apartheid in South Africa
Raesetje Sefala, Timnit Gebru, Luzango Mfupe, Nyalleng Moorosi

AI and the Everything in the Whole Wide World Benchmark
Inioluwa Deborah Raji, Emily M. Bender, Amandalynne Paullada, Emily Denton, Alex Hanna

A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches
Vincent Dumoulin, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, Hugo Larochelle

The Neural MMO Platform for Massively Multi-agent Research
Joseph Suarez, Yilun Du, Clare Zhu, Igor Mordatch, Phillip Isola

Systematic Evaluation of Causal Discovery in Visual Model-Based Reinforcement Learning
Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal

STEP: Segmenting and Tracking Every Pixel
Mark Weber, Jun Xie, Maxwell Collins, Yukun Zhu, Paul Voigtlaender, Hartwig Adam, Bradley Green, Andreas Geiger, Bastian Leibe, Daniel Cremers, Aljosa Osep, Laura Leal-Taixe, Liang-Chieh Chen

Artsheets for Art Datasets
Ramya Srinivasan, Emily Denton, Jordan Famularo, Negar Rostamzadeh, Fernando Diaz, Beth Coleman

SynthBio: A Case in Human–AI Collaborative Curation of Text Datasets
Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, Sebastian Gehrmann

Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks
Neil Band, Tim G. J. Rudner, Qixuan Feng, Angelos Filos, Zachary Nado, Michael W. Dusenberry, Ghassen Jerfel, Dustin Tran, Yarin Gal

Brax – A Differentiable Physics Engine for Large Scale Rigid Body Simulation (see blog post)
C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, Olivier Bachem

MLPerf Tiny Benchmark
Colby Banbury, Vijay Janapa Reddi, Peter Torelli, Jeremy Holleman, Nat Jeffries, Csaba Kiraly, Pietro Montino, David Kanter, Sebastian Ahmed, Danilo Pau, Urmish Thakker, Antonio Torrini, Peter Warden, Jay Cordaro, Giuseppe Di Guglielmo, Javier Duarte, Stephen Gibellini, Videet Parekh, Honson Tran, Nhan Tran, Niu Wenxu, Xu Xuesong

Automatic Construction of Evaluation Suites for Natural Language Generation Datasets
Simon Mille, Kaustubh D. Dhole, Saad Mahamood, Laura Perez-Beltrachini, Varun Gangal, Mihir Kale, Emiel van Miltenburg, Sebastian Gehrmann

An Empirical Investigation of Representation Learning for Imitation
Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah

Multilingual Spoken Words Corpus
Mark Mazumder, Sharad Chitlangia, Colby Banbury, Yiping Kang, Juan Manuel Ciro, Keith Achorn, Daniel Galvez, Mark Sabini, Peter Mattson, David Kanter, Greg Diamos, Pete Warden, Josh Meyer, Vijay Janapa Reddi

Workshops

4th Robot Learning Workshop: Self-Supervised and Lifelong Learning
Sponsor: Google
Organizers include: Alex Bewley, Vincent Vanhoucke

Differentiable Programming Workshop
Sponsor: Google

Machine Learning for Creativity and Design
Sponsor: Google
Organizers include: Daphne Ippolito, David Ha

LatinX in AI (LXAI) Research @ NeurIPS 2021
Sponsor: Google
Sponsorship Level: Platinum
Workshop Chairs include: Andres Munoz Medina
Mentorship Roundtables include: Jonathan Huang, Pablo Samuel Castro

Algorithmic Fairness Through the Lens of Causality and Robustness
Organizers include: Jessica Schrouff, Awa Dieng

ImageNet: Past, Present, and Future
Organizers include: Lucas Beyer, Xiaohua Zhai
Speakers include: Emily Denton, Vittorio Ferrari, Alex Hanna, Alex Kolesnikov, Rebecca Roelofs

Optimal Transport and Machine Learning
Organizers include: Marco Cuturi

Safe and Robust Control of Uncertain Systems
Speakers include: Aleksandra Faust

CtrlGen: Controllable Generative Modeling in Language and Vision
Speakers include: Sebastian Gehrmann

Deep Reinforcement Learning
Organizers include: Chelsea Finn
Speakers include: Karol Hausman, Dale Schuurmans

Distribution Shifts: Connecting Methods and Applications (DistShift)
Speakers include: Chelsea Finn

ML For Systems
Organizers include: Anna Goldie, Martin Maas, Azade Nazi, Azalia Mirhoseini, Milad Hashemi, Kevin Swersky

Learning in Presence of Strategic Behavior
Organizers include: Yishay Mansour

Bayesian Deep Learning
Organizers include: Zoubin Ghahramani, Kevin Murphy

Advances in Programming Languages and Neurosymbolic Systems (AIPLANS)
Organizers include: Disha Shrivastava, Vaibhav Tulsyan, Danny Tarlow

Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning?
Organizers include: Shixiang Shane Gu, Pablo Samuel Castro, Marc G. Bellemare

The Symbiosis of Deep Learning and Differential Equations
Organizers include: Lily Hu

Out-of-Distribution Generalization and Adaptation in Natural and Artificial Intelligence
Speakers include: Chelsea Finn

Cooperative AI
Organizers include: Natasha Jaques

Offline Reinforcement Learning
Organizers include: Rishabh Agarwal, George Tucker
Speakers include: Minmin Chen

2nd Workshop on Self-Supervised Learning: Theory and Practice
Organizers include: Kristina Toutanova

Data Centric AI
Organizers include: Lora Aroyo

Math AI for Education (MATHAI4ED): Bridging the Gap Between Research and Smart Education
Organizers include: Yuhuai (Tony) Wu

Tutorials

Beyond Fairness in Machine Learning
Organizers include: Emily Denton

Competitions

Evaluating Approximate Inference in Bayesian Deep Learning
Organizers include: Matthew D. Hoffman, Sharad Vikram

HEAR 2021 NeurIPS Challenge: Holistic Evaluation of Audio Representations
Organizers include: Jesse Engel

Machine Learning for Combinatorial Optimization
Organizers include: Pawel Lichocki, Miles Lubin



*Work done while at Google.  

Currently at Google.  

Categories
Misc

Deep Learning Detects Earthquakes at Millimeter-Scale

Researchers create a neural network that automatically detects tectonic fault deformation, crucial to understanding and possibly predicting earthquake behavior.

Researchers at Los Alamos National Laboratory in New Mexico are working toward earthquake detection with a new machine learning algorithm capable of global monitoring. The study uses Interferometric Synthetic Aperture Radar (InSAR) satellite data to detect slow-slip earthquakes. The work will help scientists gain a deeper understanding of the interplay between slow and fast earthquakes, which could be key to making future predictions of quake events.

“Applying machine learning to InSAR data gives us a new way to understand the physics behind tectonic faults and earthquakes,” Bertrand Rouet-Leduc, a geophysicist in Los Alamos’ Geophysics group, said in a press release. “That’s crucial to understanding the full spectrum of earthquake behavior.”

Discovered a couple of decades ago, slow earthquakes remain a bit of a mystery. They occur at the boundary between plates and can last from days to months without detection due to their slow and quiet nature.

They typically happen in areas where faults are locked due to frictional resistance, and scientists believe they may precede major fast quakes. Japan’s 9.0 magnitude earthquake in 2011, which also caused a tsunami and the Fukushima nuclear disaster, followed two slow earthquakes along the Japan Trench.

Scientists can track earthquake behavior with InSAR satellite data. The radar waves have the benefit of penetrating clouds and also work effectively at night, making it possible to track ground deformation continuously. By comparing radar images over time, researchers can detect ground surface movement.
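
The underlying measurement is a phase comparison: in repeat-pass InSAR, a phase difference between two acquisitions maps to a line-of-sight displacement of wavelength × Δφ / (4π). A small Python sketch of that arithmetic follows (the C-band wavelength here is an assumed typical value, not a figure from the study):

```python
import math

# Repeat-pass InSAR relation: phase difference dphi (radians) between two
# acquisitions corresponds to a line-of-sight displacement of
# d = wavelength * dphi / (4 * pi).
C_BAND_WAVELENGTH_M = 0.056  # assumed ~5.6 cm C-band radar wavelength

def los_displacement_m(dphi_rad: float,
                       wavelength_m: float = C_BAND_WAVELENGTH_M) -> float:
    return wavelength_m * dphi_rad / (4 * math.pi)

# One full phase cycle (2*pi) corresponds to half a wavelength (~2.8 cm),
# so millimeter-scale motion is a small fraction of a single fringe.
print(f"{los_displacement_m(2 * math.pi) * 100:.1f} cm per fringe")
```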

But these movements are small, and existing approaches limit ground deformation measurements to a few centimeters. Ongoing monitoring of global fault systems also creates massive data streams that are too much to interpret manually.

The researchers created deep learning models addressing both of these limitations. The team trained convolutional neural networks on several million time series of synthetic InSAR data to automatically detect and extract ground deformation.

Using the cuDNN-accelerated TensorFlow deep learning framework distributed over multiple NVIDIA GPUs, the new methodology operates without prior knowledge of a fault’s location or slip behavior.
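
The paper’s actual architecture and data are not reproduced here, but the following sketch shows the general pattern described: a small convolutional network trained on synthetic time series, distributed across the available GPUs with TensorFlow’s MirroredStrategy. All layer sizes and data below are placeholders, not the study’s model:

```python
import numpy as np
import tensorflow as tf

# Illustrative only: a tiny 1D CNN on synthetic time series, mirroring
# the GPU-distributed, cuDNN-backed training setup the article describes.
strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv1D(32, 5, activation="relu", input_shape=(128, 1)),
        tf.keras.layers.Conv1D(32, 5, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # deformation present?
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

# Synthetic stand-in data: noisy series, half with a slow ramp added to
# mimic a slow-slip deformation signal.
x = np.random.randn(1024, 128, 1).astype("float32")
y = np.random.randint(0, 2, size=(1024,))
x[y == 1, :, 0] += np.linspace(0, 2, 128, dtype="float32")

model.fit(x, y, batch_size=64, epochs=2)
```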

Figure 1. Application to real data shows the North Anatolian Fault 2013 slow earthquake.

To test their approach, they applied the algorithm to a time series built from images of the North Anatolian fault in Turkey. As a major plate boundary fault, the area has ruptured several times in the past century.

With a finer temporal resolution, the algorithm identified previously undetected slippage events, showing that slow earthquakes happen much more often than expected. It also spotted movement as small as two millimeters, motion subtle enough that experts would have overlooked it.

“The use of deep learning unlocks the detection on faults of deformation events an order of magnitude smaller than previously achieved manually. Observing many more slow slip events may, in turn, unveil their interaction with regular, dynamic earthquakes, including the potential nucleation of earthquakes with slow deformation,” Rouet-Leduc said.

The team is currently working on a follow-up study, testing a model on the San Andreas Fault that extends roughly 750 miles through California. According to Rouet-Leduc, the model will soon be available on GitHub.


Read the published research in Nature Communications.
Read the press release.

Categories
Misc

Navigating the Global Supply Chain with Networking Digital Twins

Supply chain shortages are impacting many industries, with semiconductors feeling the crunch in particular. With networking digital twins, you don’t have to wait on the hardware. Get started with infrastructure simulation in NVIDIA Air to stage deployments, test out tools, and enable hardware-free training.

What do Ethernet switches, sports cars, household appliances, and toilet paper have in common? If you read this blog’s title and have lived through the past year and a half, you probably know the answer. These are all products whose availability has been impacted by materials shortages stemming from the global pandemic.

In some instances, the supply issues are more of an inconvenience: waiting a few extra months to get that new Corvette won’t be the end of the world. For other products (think toilet paper or a replacement freezer), the supply crunch was and is a big deal.

It is easy to see the impact on consumers, but enterprises feel the pain of long lead times too. Consider Ethernet switches, which build the networking fabric that ties the data center together. Ethernet switch shortages mean more than “rack A is unable to talk to rack B.” They mean decreased aggregate throughput and increased load on existing infrastructure, leading to more downtime, unplanned outages, and significant adverse impacts to business outcomes.

That all sounds bad, but there is no need to panic. NVIDIA can help you mitigate these challenges and transform your operations with a data center digital twin from NVIDIA Air.

So, what is a digital twin, and how is it related to the data center? A digital twin is a software-simulated replica of a real-world thing, system, or process. It is always on, constantly reacting to and reflecting any changes in the status of its physical sibling. A data center digital twin applies the digital twin concept to data center infrastructure. To model the data center as a data center, and not just a bunch of disparate pizza boxes, it is imperative that the digital twin fully simulates the network.

NVIDIA Air is unmatched in providing that capability. The modeling tool in Air enables you to create logical instances of every switch and cable, connecting to logical server instances. In addition to modeling the hardware, NVIDIA Air spins up fully functional virtual appliances with prebuilt network and server OS images. This is the key ingredient of the digital twin: with an appliance model, the simulation is application-granular.

Benefits

NVIDIA Air enables data center digital twins, but how does that solve supply chain issues? Focusing on those benefits tied to hardware, in particular, it enables:

  • Hardware-free POCs: Want exposure to the Cumulus Linux or SONiC NOSes? Ordinarily, you would have to acquire the gear to try out the functionality. With NVIDIA Air, you have access to Cumulus VX and SONiC VX, the virtual appliances mentioned above. Because Cumulus and SONiC are built from the ground up on standards-based technologies, you get the full experience without the hardware.
  • Staging production deployments: Already decided on NVIDIA Ethernet switches? There is no reason to sit on your hands until the pallet of switches arrives. With a digital twin, you can completely map out your data center fabric. You can test your deployment and provisioning scripts and know that they will work seamlessly after the systems have been racked, stacked, and cabled. This can reduce your bring-up time by up to 95%.
  • Testing out new network and application tools: Need to roll out a new networking tool on your Spectrum Ethernet switches? Typically, you would need a prototype pre-production environment. With a digital twin, you deploy the application to the digital twin, validate the impact on your network with NetQ, tweak some settings if necessary, and make deployment to production worry-free.
  • Hardware-free training: Your organization has decided to bring on someone new to join your networking infrastructure team. They are eager to learn, but there is no hardware set aside for training purposes. Without a digital twin, you and the trainee would be stuck waiting on a new switch order or reading a long and tedious user manual. With the digital twin, you have an always-on sandbox, perfect for skill-building and exploration.

One caveat: data center digital twins will not expedite the date that the RTX 3090 comes back in stock at your favorite retailer, but they will help with the crunch around your networking procurement.

Figure 1. View a digital twin of your physical network with NVIDIA Air

The best part: if you are curious to learn more, you can do so right now. NVIDIA Air brings the public cloud experience to on-premises networking, making it simple and quick to jump right in. Navigate to NVIDIA Air in your browser and get started immediately.

Categories
Misc

Model for detecting deer

Hello! I’m a long time developer but new to AI-based image processing. The end goal is to process images from cameras and alert when deer (and eventually other wildlife) is detected.

The first step is finding a decent model that can (say) detect deer vs. birds vs. other animals, then running that somewhere. The default CameraTraps model here allows detecting “animal” vs. “person” vs. “vehicle”:

https://github.com/microsoft/CameraTraps/blob/master/megadetector.md

Would I need to train it further to differentiate between types of animals, or am I missing something with the default model? Or a more general question, how can you see what a frozen model is set up to detect? (I just learned what a frozen model was yesterday)

Appreciate any pointers or if there’s another sub that would be more suited to getting this project setup, happy to post there instead 🙂

submitted by /u/brianhogg

Categories
Misc

TensorFlow workshop

submitted by /u/alphapeeler
Categories
Misc

Transforming the Future of Mobility at ITS America with NVIDIA Metropolis Partners

Explore NVIDIA Metropolis partners showcasing new technologies to improve city mobility at ITS America 2021.

The Intelligent Transportation Society (ITS) of America annual conference brings together a community of intelligent transportation professionals to network, educate others about emerging technologies, and demonstrate innovative products driving the future of efficient and safe transportation.

As cities and DOT teams struggle with constrained roadway infrastructure and the need to build safer roads, events like this offer solutions and a peek into the future. The NVIDIA Metropolis video analytics platform is increasingly being used by cities, DOTs, tollways, and developers to help measure, automate, and vastly improve the efficiency and safety of roadways around the world.

The following NVIDIA Metropolis partners are participating at ITS-America and showcasing how they help cities improve livability and safety.

Miovision: Arguably one of the first to build superhuman levels of computer vision into intersections, Miovision will explain how their technology is transforming traffic intersections, giving cities and towns more effective tools to manage traffic congestion, improve traffic safety, and reduce the impact of traffic on greenhouse gas emissions. Check out Miovision at booth #1619.

NoTraffic: NoTraffic’s real-time, plug-and-play autonomous traffic management platform uses AI and cloud computing to reinvent how cities run their transport networks. The NoTraffic platform is an end-to-end hardware and software solution installed at intersections, transforming roadways to optimize traffic flows and reduce accidents. Check out NoTraffic at booth #1001.

Ouster: Cities are using Ouster digital lidar solutions capable of capturing the environment in minute detail and detecting vehicles, vulnerable road users, and traffic incidents in real time to improve safety and traffic efficiency. Ouster lidar’s 3D spatial awareness and 24/7 performance combine the high-resolution imagery of cameras with the all-weather reliability of radar. Check out Ouster and a live demo at booth #2012.

Parsons: Parsons is a leading technology firm driving the future of smart infrastructure. Parsons develops advanced traffic management systems that cities use to improve safety, mobility, and livability. Check out Parsons at booth #1818.

Velodyne Lidar: Velodyne’s lidar-based Intelligent Infrastructure Solution (IIS) is a complete end-to-end Smart City solution. IIS creates a real-time 3D map of roads and intersections, providing precise traffic and pedestrian safety analytics, road user classification, and smart signal actuation. The solution is deployed in the US and Canada and across EMEA and APAC. Learn more about Velodyne’s on-the-ground deployments at their panel talk.

Register for ITS America, happening December 7-10 in Charlotte, NC.

Figure 1. ITS America 2021 promo.
Categories
Misc

Creating Custom, Production-Ready AI Models Faster with NVIDIA TAO

Learn about the latest updates to NVIDIA TAO, an AI-model-adaptation framework, and NVIDIA TAO toolkit, a CLI and Jupyter notebook-based version of TAO.

All AI applications are powered by models. Models can help spot defects in parts, detect the early onset of disease, translate languages, and much more. But building custom models for a specific use requires mountains of data and an army of data scientists. 

NVIDIA TAO, an AI-model-adaptation framework, simplifies and accelerates the creation of AI models. By fine-tuning state-of-the-art, pretrained models, you can create custom, production-ready computer vision and conversational AI models. This can be done in hours rather than months, eliminating the need for large training data or AI expertise.

The latest version of the TAO toolkit is now available for download. The TAO toolkit, a CLI and Jupyter notebook-based version of TAO, brings together several new capabilities to help you speed up your model creation process. 

Key highlights 

We are also taking TAO to the next level and making it much easier to create custom, production-ready models. A graphical user interface version of TAO, currently under development, epitomizes zero-code model development: it gives you the ability to train, adapt, and optimize computer vision and conversational AI models without writing a single line of code.



Early access is slated for early 2022. Sign up today!

Categories
Offsites

Evaluating Syntactic Abilities of Language Models

In recent years, pre-trained language models, such as BERT and GPT-3, have seen widespread use in natural language processing (NLP). By training on large volumes of text, language models acquire broad knowledge about the world, achieving strong performance on various NLP benchmarks. These models, however, are often opaque in that it may not be clear why they perform so well, which limits further hypothesis-driven improvement of the models. Hence, a new line of scientific inquiry has arisen: what linguistic knowledge is contained in these models?

While there are many types of linguistic knowledge that one may want to investigate, a topic that provides a strong basis for analysis is the subject–verb agreement grammar rule in English, which requires that the grammatical number of a verb agree with that of the subject. For example, the sentence “The dogs run.” is grammatical because “dogs” and “run” are both plural, but “The dogs runs.” is ungrammatical because “runs” is a singular verb.

One framework for assessing the linguistic knowledge of a language model is targeted syntactic evaluation (TSE), in which minimally different pairs of sentences, one grammatical and one ungrammatical, are shown to a model, and the model must determine which one is grammatical. TSE can be used to test knowledge of the English subject–verb agreement rule by having the model judge between two versions of the same sentence: one where a particular verb is written in its singular form, and the other in which the verb is written in its plural form.

With the above context, in “Frequency Effects on Syntactic Rule-Learning in Transformers”, published at EMNLP 2021, we investigated how a BERT model’s ability to correctly apply the English subject–verb agreement rule is affected by the number of times the words are seen by the model during pre-training. To test specific conditions, we pre-trained BERT models from scratch using carefully controlled datasets. We found that BERT achieves good performance on subject–verb pairs that do not appear together in the pre-training data, which indicates that it does learn to apply subject–verb agreement. However, the model tends to predict the incorrect form when it is much more frequent than the correct form, indicating that BERT does not treat grammatical agreement as a rule that must be followed. These results help us to better understand the strengths and limitations of pre-trained language models.

Prior Work
Previous work used TSE to measure English subject–verb agreement ability in a BERT model. In this setup, BERT performs a fill-in-the-blank task (e.g., “the dog _ across the park”) by assigning probabilities to both the singular and plural forms of a given verb (e.g., “runs” and “run”). If the model has correctly learned to apply the subject–verb agreement rule, then it should consistently assign higher probabilities to the verb forms that make the sentences grammatically correct.
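To make this setup concrete, here is a rough sketch (ours, not the original study’s code) of how such a pair can be scored with a pretrained BERT model through the Hugging Face transformers API, assuming each candidate verb form is a single wordpiece:

import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def verb_logprob(masked_sentence: str, verb: str) -> float:
    """Log-probability the model assigns to `verb` at the [MASK] position."""
    inputs = tokenizer(masked_sentence, return_tensors="pt")
    # Locate the [MASK] token in the input sequence.
    mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits
    log_probs = logits[0, mask_index].log_softmax(dim=-1)
    return log_probs[tokenizer.convert_tokens_to_ids(verb)].item()

sentence = "the dog [MASK] across the park"
print(verb_logprob(sentence, "runs"))  # singular verb form
print(verb_logprob(sentence, "run"))   # plural verb form

A model that has learned the agreement rule should assign the grammatical form (“runs”) a higher log-probability than the ungrammatical one (“run”).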

This previous work evaluated BERT using both natural sentences (drawn from Wikipedia) and nonce sentences, which are artificially constructed to be grammatically valid but semantically nonsensical, such as Noam Chomsky’s famous example “colorless green ideas sleep furiously”. Nonce sentences are useful when testing syntactic abilities because the model cannot just fall back on superficial corpus statistics: for example, while “dogs run” is much more common than “dogs runs”, “dogs publish” and “dogs publishes” will both be very rare, so a model is not likely to have simply memorized the fact that one of them is more likely than the other.

BERT achieves an accuracy of more than 80% on nonce sentences (far better than the random-chance baseline of 50%), which was taken as evidence that the model had learned to apply the subject–verb agreement rule. In our paper, we went beyond this previous work by pre-training BERT models under specific data conditions, allowing us to dig deeper into these results to see how certain patterns in the pre-training data affect performance.

Unseen Subject–Verb Pairs
We first looked at how well the model performs on subject–verb pairs that were seen during pre-training, versus examples in which the subject and verb were never seen together in the same sentence:

BERT’s error rate on natural and nonce evaluation sentences, stratified by whether a particular subject–verb (SV) pair was seen in the same sentence during training or not. BERT’s performance on unseen SV pairs is far better than simple heuristics such as picking the more frequent verb or picking the more frequent SV pair.

BERT’s error rate increases slightly for unseen subject–verb (SV) pairs, for both natural and nonce evaluation sentences, but it is still much better than naïve heuristics, such as picking the verb form that occurred more often in the pre-training data or picking the verb form that occurred more frequently with the subject noun. This tells us that BERT is not just reflecting back the things that it sees during pre-training: making decisions based on more than just raw frequencies and generalizing to novel subject–verb pairs are indications that the model has learned to apply some underlying rule concerning subject–verb agreement.

Frequency of Verbs
Next, we went beyond just seen versus unseen, and examined how the frequency of a word affects BERT’s ability to use it correctly with the subject–verb agreement rule. For this study, we chose a set of 60 verbs, and then created several versions of the pre-training data, each engineered to contain the 60 verbs at a specific frequency, ensuring that the singular and plural forms appeared the same number of times. We then trained BERT models from these different datasets and evaluated them on the subject–verb agreement task:

BERT’s ability to follow the subject–verb agreement rule depends on the frequency of verbs in the training set.

These results indicate that although BERT is able to model the subject–verb agreement rule, it needs to see a verb about 100 times before it can reliably use it with the rule.

Relative Frequency Between Verb Forms
Finally, we wanted to understand how the relative frequencies of the singular and plural forms of a verb affect BERT’s predictions. For example, if one form of the verb (e.g., “combat”) appeared in the pre-training data much more frequently than the other verb form (e.g., “combats”), then BERT might be more likely to assign a high probability to the more frequent form, even when it is grammatically incorrect. To evaluate this, we again used the same 60 verbs, but this time we created manipulated versions of the pre-training data where the frequency ratio between verb forms varied from 1:1 to 100:1. The figure below shows BERT’s performance for these varying levels of frequency imbalance:

As the frequency ratio between verb forms in training data becomes more imbalanced, BERT’s ability to use those verbs grammatically decreases.

These results show that BERT achieves good accuracy at predicting the correct verb form when the two forms are seen the same number of times during pre-training, but the results become worse as the imbalance between the frequencies increases. This implies that even though BERT has learned how to apply subject–verb agreement, it does not necessarily use it as a “rule”, instead preferring to predict high-frequency words regardless of whether they violate the subject–verb agreement constraint.

Conclusions
Using TSE to evaluate the performance of BERT reveals its linguistic abilities on syntactic tasks. Moreover, studying its syntactic ability in relation to how often words appear in the training dataset reveals the ways that BERT handles competing priorities — it knows that subjects and verbs should agree and that high frequency words are more likely, but doesn’t understand that agreement is a rule that must be followed and that the frequency is only a preference. We hope this work provides new insight into how language models reflect properties of the datasets on which they are trained.

Acknowledgements
It was a privilege to collaborate with Tal Linzen and Ellie Pavlick on this project.

Categories
Misc

Optimizing T5 and GPT-2 for Real-Time Inference with NVIDIA TensorRT

TensorRT 8.2 optimizes HuggingFace T5 and GPT-2 models. With TensorRT-accelerated GPT-2 and T5, you can generate excellent human-like texts and build real-time translation, summarization, and other online NLP applications within strict latency requirements.

The transformer architecture has wholly transformed (pun intended) the domain of natural language processing (NLP). Over the recent years, many novel network architectures have been built on the transformer building blocks: BERT, GPT, and T5, to name a few. With increasing variety, the size of these models has also rapidly increased.

While larger neural language models generally yield better results, deploying them in production poses serious challenges, especially for online applications where even a few tens of milliseconds of extra latency can significantly degrade the user experience.

With the latest TensorRT 8.2, we optimized T5 and GPT-2 models for real-time inference. You can turn the T5 or GPT-2 models into a TensorRT engine, and then use this engine as a plug-in replacement for the original PyTorch model in the inference workflow. This optimization leads to a 3–6x reduction in latency compared to PyTorch GPU inference, and a 9–21x reduction compared to PyTorch CPU inference.

In this post, we give you a detailed walkthrough of how to achieve the same latency reduction, using our newly published example scripts and notebooks based on Hugging Face transformers for the tasks of open-ended text generation with GPT-2, and translation and summarization with T5.

Introduction to T5 and GPT-2

In this section, we briefly explain the T5 and GPT-2 models.

T5 for answering questions, summarization, translation, and classification

T5 or Text-To-Text Transfer Transformer is a recent architecture created by Google. It reframes all natural language processing (NLP) tasks into a unified text-to-text format where the input and output are always text strings. T5’s architecture enables applying the same model, loss function, and hyperparameters to any NLP task such as machine translation, document summarization, question answering, and classification tasks such as sentiment analysis.

The T5 model was inspired by the fact that transfer learning has produced state-of-the-art results in NLP. The principle behind transfer learning is that a model pretrained on abundantly available unlabeled data with self-supervised tasks can be fine-tuned for specific tasks on smaller task-specific labeled datasets. Such models have proven to produce better results than models trained from scratch on task-specific datasets.

Building on this concept of transfer learning, Google proposed the T5 model in Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, which also introduced the Colossal Clean Crawled Corpus (C4) dataset. Pretrained on this dataset, the T5 model achieves state-of-the-art results on many downstream NLP tasks. Published pretrained T5 models range up to 3B and 11B parameters.

GPT-2 for generating excellent human-like texts

Generative Pre-Trained Transformer 2 (GPT-2) is an auto-regressive unsupervised language model originally proposed by OpenAI. It is built from transformer decoder blocks and trained on very large text corpora to predict the next word in a paragraph. It generates excellent human-like texts. Larger GPT-2 models, with the largest reaching 1.5B parameters, generally write better, more coherent texts.
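
As a quick, minimal sketch (using the smallest public gpt2 checkpoint from the Hugging Face hub; this is illustrative and separate from the TensorRT workflow below), generating text with a pretrained GPT-2 takes only a few lines:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The transformer architecture", return_tensors="pt")
# Sample up to 50 tokens; top-k sampling yields varied, human-like continuations.
outputs = model.generate(inputs.input_ids, max_length=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))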

Deploying T5 and GPT-2 with TensorRT

With TensorRT 8.2, we optimize the T5 and GPT-2 models by building and using a TensorRT engine as a drop-in replacement for the original PyTorch model. We walk you through the Hugging Face transformers-based scripts and Jupyter notebooks and highlight the important bits. For more information, see the example scripts and notebooks for a detailed step-by-step execution guide.

Setting up

The most convenient way to get started is by using a Docker container, which provides an isolated, self-contained, and reproducible environment for the experiments.

Build and launch a TensorRT container:

git clone -b master https://github.com/nvidia/TensorRT TensorRT
cd TensorRT
git checkout release/8.2
git submodule update --init --recursive

./docker/build.sh --file docker/ubuntu-18.04.Dockerfile --tag tensorrt-ubuntu18.04-cuda11.4
./docker/launch.sh --tag tensorrt-ubuntu18.04-cuda11.4 --gpus all --jupyter 8888

These commands start the Docker container and JupyterLab. Open the JupyterLab interface in your web browser:

http://<host_name>:8888/lab/

In JupyterLab, to open a terminal window, choose File, New, Terminal. Compile and install the TensorRT OSS package:

cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out
make -j$(nproc)

Now you are ready to proceed with experimenting with the models. In the following sequence, we demonstrate the steps for the T5 model. The following code blocks are not meant to be copy-paste runnable but rather walk you through the process. For reproduction purposes, see the notebooks on the GitHub repository.

At a high level, optimizing a Hugging Face T5 and GPT-2 model with TensorRT for deployment is a three-step process:

  1. Download models from the HuggingFace model zoo.
  2. Convert the model to an optimized TensorRT execution engine.
  3. Carry out inference with the TensorRT engine.

Use the generated engine as a plug-in replacement for the original PyTorch model in the HuggingFace inference workflow.

Download models from the HuggingFace model zoo

First, download the original Hugging Face PyTorch T5 model from HuggingFace model hub, together with its associated tokenizer.

from transformers import T5ForConditionalGeneration, T5Tokenizer, T5Config

T5_VARIANT = 't5-small'

t5_model = T5ForConditionalGeneration.from_pretrained(T5_VARIANT)
tokenizer = T5Tokenizer.from_pretrained(T5_VARIANT)
config = T5Config.from_pretrained(T5_VARIANT)

You can then employ this model for various NLP tasks, for example, translating from English to German:

inputs = tokenizer("translate English to German: That is good.", return_tensors="pt")

# Generate sequence for an input
outputs = t5_model.to('cuda:0').generate(inputs.input_ids.to('cuda:0'))
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

TensorRT 8.2 supports GPT-2 up to the “xl” version (1.5B parameters) and T5 up to 11B parameters, which are publicly available on the HuggingFace model zoo. Larger models can also be supported subject to GPU memory availability.

Convert the model to an optimized TensorRT execution engine

Before converting the model to a TensorRT engine, you convert the PyTorch model to an intermediate universal format. ONNX is an open format for machine learning and deep learning models. It enables you to convert deep learning and machine-learning models from different frameworks such as TensorFlow, PyTorch, MATLAB, Caffe, and Keras to a single unified format.
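
The conversion utilities shown next wrap PyTorch’s standard ONNX exporter. As a simplified sketch of what such an export involves (a toy module stands in for the real encoder, and all names here are illustrative):

import torch

# Toy stand-in for a real network; the demo utilities wrap this same
# torch.onnx.export call with model-specific inputs, names, and dynamic axes.
model = torch.nn.Embedding(num_embeddings=32128, embedding_dim=512)
dummy_input = torch.ones(1, 16, dtype=torch.long)  # example input_ids

torch.onnx.export(
    model,
    dummy_input,                # sample input used to trace the graph
    "model.onnx",               # output file
    input_names=["input_ids"],
    output_names=["hidden_states"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"}},
)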

Converting to ONNX

For the T5 model, convert the encoder and decoder separately using a utility function.

encoder_onnx_model_fpath = T5_VARIANT + "-encoder.onnx"
decoder_onnx_model_fpath = T5_VARIANT + "-decoder-with-lm-head.onnx"

t5_encoder = T5EncoderTorchFile(t5_model.to('cpu'), metadata)
t5_decoder = T5DecoderTorchFile(t5_model.to('cpu'), metadata)

onnx_t5_encoder = t5_encoder.as_onnx_model(
    os.path.join(onnx_model_path, encoder_onnx_model_fpath), force_overwrite=False
)
onnx_t5_decoder = t5_decoder.as_onnx_model(
    os.path.join(onnx_model_path, decoder_onnx_model_fpath), force_overwrite=False
)

Converting to TensorRT

Now you are ready to parse the T5 ONNX encoder and decoder and convert them to optimized TensorRT engines. As TensorRT carries out many optimizations, such as fusing operations, eliminating transpose operations, and kernel auto-tuning to find the best performing kernel on a target GPU architecture, this conversion process might take a while.

t5_trt_encoder_engine = T5EncoderONNXFile(
                os.path.join(onnx_model_path, encoder_onnx_model_fpath), metadata
            ).as_trt_engine(os.path.join(tensorrt_model_path, encoder_onnx_model_fpath) + ".engine")

t5_trt_decoder_engine = T5DecoderONNXFile(
                os.path.join(onnx_model_path, decoder_onnx_model_fpath), metadata
            ).as_trt_engine(os.path.join(tensorrt_model_path, decoder_onnx_model_fpath) + ".engine")

Carry out inference with the TensorRT engine

Finally, you now have an optimized TensorRT engine for the T5 model, ready to carry out inference.

t5_trt_encoder = T5TRTEncoder(
                t5_trt_encoder_engine, metadata, tfm_config
            )
t5_trt_decoder = T5TRTDecoder(
                t5_trt_decoder_engine, metadata, tfm_config
            )

# Generate output: run the encoder, then greedy-search decode with the TensorRT engines
encoder_last_hidden_state = t5_trt_encoder(input_ids=input_ids)

outputs = t5_trt_decoder.greedy_search(
            input_ids=decoder_input_ids,
            encoder_hidden_states=encoder_last_hidden_state,
            stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length)])
        )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Similarly, for the GPT-2 model, you can follow the same process to generate a TensorRT engine. The optimized TensorRT engines can be used as a plug-in replacement for the original PyTorch models in the HuggingFace inference workflow.
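
For reference, the GPT-2 flow might look like the following sketch. The class names mirror the T5 utilities shown earlier and are assumptions based on that pattern, so verify the exact API against the GPT-2 notebook in the repository. Because GPT-2 is decoder-only, there is a single network to export and convert:

# Assumed GPT-2 counterparts of the T5 utility classes; metadata, config, and
# path variables are set up as in the T5 example.
gpt2_onnx_model_fpath = GPT2_VARIANT + ".onnx"

onnx_gpt2 = GPT2TorchFile(gpt2_model.to('cpu'), metadata).as_onnx_model(
    os.path.join(onnx_model_path, gpt2_onnx_model_fpath), force_overwrite=False
)

gpt2_trt_engine = GPT2ONNXFile(
    os.path.join(onnx_model_path, gpt2_onnx_model_fpath), metadata
).as_trt_engine(os.path.join(tensorrt_model_path, gpt2_onnx_model_fpath) + ".engine")

gpt2_trt = GPT2TRTDecoder(gpt2_trt_engine, metadata, tfm_config)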

TensorRT transformer optimization specifics

Transformer-based models are a stack of either transformer encoder or decoder blocks. Encoder (decoder) blocks have the same architecture and number of parameters. T5 consists of stacks of transformer encoders and decoders, while GPT-2 is composed of only transformer decoder blocks (Figure 1).

T5 is composed of stacks of transformer encoders and decoders.
Figure 1a. T5 architecture
GPT-2 is composed of only transformer decoder blocks.
Figure 1b. GPT-2 architecture

Each transformer block contains a self-attention block, which uses fully connected layers to project the input into three different subspaces, termed query (Q), key (K), and value (V). Q is multiplied with the transposed K to compute scaled dot-product attention scores, which are normalized with a softmax and combined with V to produce the final output (Figure 2).

Self-attention block operations from input through three separate FC layers, to transpose, MUL, element scaling, and Softmax, to the attention layer output.
Figure 2. Self-attention block
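
For reference, a plain PyTorch sketch of standard scaled dot-product self-attention (illustrative only, not TensorRT code) makes these operations explicit:

import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.
    x: (batch, seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v       # three FC projections
    scores = q @ k.transpose(-2, -1)          # Q·Kᵀ dot products
    scores = scores / (k.size(-1) ** 0.5)     # element scaling
    attn = F.softmax(scores, dim=-1)          # normalized attention weights
    return attn @ v                           # combine with V

x = torch.randn(1, 8, 64)
w_q, w_k, w_v = (torch.randn(64, 64) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)        # shape (1, 8, 64)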

TensorRT optimizes the self-attention block by pointwise layer fusion:

  • Reduction is fused with power ops (for LayerNorm and residual-add layer).
  • Scale is fused with softmax.
  • GEMM is fused with ReLU/GELU activations.

Additionally, TensorRT also optimizes the network for inference:

  • Eliminating transpose ops.
  • Fusing the three KQV projections into a single GEMM.
  • When FP16 mode is specified, controlling layer-wise precisions to preserve accuracy while running the most compute-intensive ops in FP16.

TensorRT vs. PyTorch CPU and GPU benchmarks

With the optimizations carried out by TensorRT, we’re seeing up to 3–6x speedup over PyTorch GPU inference and up to 9–21x speedup over PyTorch CPU inference.

Figure 3 shows the inference results for the T5-3B model at batch size 1 for translating a short phrase from English to German. The TensorRT engine on an A100 GPU provides a 21x reduction in latency compared to PyTorch running on a dual-socket Intel Platinum 8380 CPU.

T5-3B model inference takes 31 ms with TensorRT on an A100 GPU and 656 ms with PyTorch on 2x Intel Platinum 8380 CPU.
Figure 3. T5-3B model inference comparison. TensorRT on an A100 GPU provides 21x lower latency compared to PyTorch CPU inference.

CPU: Intel Platinum 8380, 2 sockets.
GPU: NVIDIA A100 PCI Express 80GB. Software: PyTorch 1.9, TensorRT 8.2.0 EA.
Task: “Translate English to German: that is good.”

Conclusion

In this post, we walked you through converting the Hugging Face PyTorch T5 and GPT-2 models to an optimized TensorRT engine for inference. The TensorRT inference engine is used as a drop-in replacement for the original HuggingFace T5 and GPT-2 PyTorch models and provides up to 21x CPU inference speedup. To achieve this speedup for your model, get started today with TensorRT 8.2.