Improving Machine Learning Security Skills at a DEF CON Competition

Machine learning (ML) security is a new discipline focused on the security of machine learning systems and the data they are built upon. It exists at the intersection of the information security and data science domains. 

While the state of the art moves forward, there is no clear onboarding and learning path for securing and testing machine learning systems. How, then, should interested practitioners begin developing machine learning security skills? You could read related articles on arXiv, but how about practical steps?

Competitions offer one promising opportunity. NVIDIA recently helped run an innovative ML security competition at the DEF CON 30 hacking and security conference. Hosted by AI Village, the competition drew more than 3,000 participants. It aimed to introduce attendees to each other and to the field of ML security. The competition proved to be a valuable opportunity for participants to develop and improve their machine learning security skills.

NVIDIA AI Red Team and AI Village

To proactively test and assess the security of NVIDIA machine learning offerings, the NVIDIA AI Red Team has been expanding. While this team consists of experienced security and data professionals, they recognized a need to develop ML security talent across the industry. With more exposure and education, data and security practitioners are likely to improve the security of their deployed machine learning systems.

AI Village is a community of data scientists and hackers working to educate on the topic of artificial intelligence (AI) in security and privacy. The community holds events at DEF CON each year. 

The NVIDIA AI Red Team and AI Village joined together at DEF CON 30 to engage the information security community with a machine learning security competition. The topic was potentially new to many attendees. Members of the AI Village created challenges designed to teach and test elements of ML security knowledge. In addition to NVIDIA, these members represented AWS Security, Orang Labs, and NetSec Explained.

AI Village Capture the Flag Competition

Capture the Flag (CTF) competitions include multiple challenges. Competitors play through the challenges and collect flags for those that are completed successfully. These flags are assigned various point values based on the level of challenge. Competitors win by collecting the most points. 

With this familiar format in mind, the AI Village and NVIDIA AI Red Team built The AI Village CTF @ DEFCON. Organizers partnered with Kaggle to use a platform familiar to the machine learning community. Similar to information security CTFs, Kaggle competitions provide a format for ML researchers to compete on discrete problems. 

Partnering with Kaggle provided the competition with a flexible and scalable platform that paired compute and data hosting with documentation and scoring. Although the challenge servers are no longer active, you can view the challenge descriptions.

Competitors reported onboarding and moving through the challenges with ease, with minimal additional infrastructure required from the AI Village. Furthermore, Kaggle has a large audience of skilled data scientists and machine learning engineers who were excited to explore the security domain. Kaggle also generously offered ongoing support and $25,000 in prizes. We could not have asked for a better partner for this event.

During the month-long competition, more than 3,000 competitors hacked their way through 22 challenges. Participation far exceeded expectations and spanned more than 70 countries, from first-time Kagglers to Grandmasters. The event succeeded in bringing the traditional information security and machine learning communities together to tackle a range of challenges from this new domain of ML security.

Competitors applied publicly available tools and techniques in innovative ways, including open source research, masking, and dimensionality reduction. In the process, they often reimplemented attacks from the academic literature.

Because one challenge remained unsolved, there was always a chance for someone to rise to the top of the leaderboard. For the final two weeks of the competition, the Kaggle Discussion Board and AI Village Discord were abuzz with theories and explorations of the remaining unsolved challenge. The organizers were checking hourly for a buzzer-beating leaderboard shift. Check out the challenge solutions.

Figure 1. Player scores over the month-long AI Village Capture the Flag Competition 

Inference Challenge

In the Inference Challenge, participants had to execute a membership inference attack to identify training samples, with only API access to an image classifier. A successful attack identified images that showed characters of the flag.

Some competitors chose to randomly generate the images by permuting pixel values, effectively brute-forcing the problem. Other competitors assumed that the training data may have included a standard dataset and used EMNIST as their source data, leveraging open source data. Others made use of the Adversarial Robustness Toolbox, producing output similar to what is shown in Figure 2. 

Figure 2. Example output from the Inference Challenge: monochromatic pixels spelling D3FC0N

Whatever the method used, a successful challenger would be rewarded with the flag, spelling D3FC0N. This leetspeak encoding of the DEF CON conference name was used in several places on the conference website.
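One common way to frame a membership inference attack with API-only access is a confidence threshold: a classifier is often more confident on samples it was trained on. The Python sketch below illustrates that idea only. The `query_model` callable, the stub scores, and the 0.95 threshold are hypothetical stand-ins, not the challenge's API or any competitor's solution.

```python
from typing import Callable, Dict, List, Sequence


def likely_training_members(
    candidates: Sequence[str],
    query_model: Callable[[str], List[float]],
    threshold: float = 0.95,
) -> List[str]:
    """Return the candidates whose top class probability meets the threshold."""
    members = []
    for sample in candidates:
        probabilities = query_model(sample)  # one black-box API call per sample
        if max(probabilities) >= threshold:
            members.append(sample)
    return members


# Hypothetical stand-in for a remote classifier API (assumption, for illustration).
_FAKE_SCORES: Dict[str, List[float]] = {
    "img_a": [0.99, 0.01],  # very confident: flagged as a likely member
    "img_b": [0.60, 0.40],  # uncertain: not flagged
    "img_c": [0.03, 0.97],  # very confident: flagged as a likely member
}


def fake_query_model(sample_id: str) -> List[float]:
    return _FAKE_SCORES[sample_id]


flagged = likely_training_members(list(_FAKE_SCORES), fake_query_model)
print(flagged)
```

In practice the threshold would need calibrating against samples known to be outside the training set, since a confident prediction alone does not prove membership.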

Crop2 Challenge

Research and out-of-the-box thinking often help to solve CTF challenges. For instance, the one unsolved challenge in the competition was Crop2. In Crop2, participants were given a poisoned cropping model and had to create the poisoned sample (within some error bounds). They had one training data example to work with (Figure 3). 

A multicolored 3x3 grid of circles in squares.
Figure 3. A sample training image provided to competitors

This is a difficult problem without an efficient, standard algorithmic solution. When you think about all of the pixels in an image and all of the possible pixel values across three color channels, the search space explodes to over 800 billion options. Instead, competitors could combine reverse engineering, open source research, and assumptions to reduce the number of combinations.

After the competition ended, organizers gave hints to help competitors solve the Crop2 Challenge. Some of the key hints included using open source research to determine that pixel colors likely were generated by matplotlib default colormaps. This greatly reduces the search space into the hundreds of thousands. 

By making these informed assumptions, one competitor was eventually able to reach the Crop2 Challenge solution. One trait of great hackers is tenacity: still working tirelessly after the competition ended, this competitor diligently worked through the provided hints. The competitor reported that a hint “helped me realize that we only needed to use nine colors. Mate, I’d been fiddling around with 16 million. This made the search space manageable.” 

Competitor notebooks

Check out some of our favorite notebooks from competitors:

  1. Chris Deotte – From a member of the Kaggle Grandmasters of NVIDIA (KGMoN), these solutions are very well organized and documented. We recommend Secret Sloth in particular.
  2. Eric Bouteillon – Watch the flag appear character-by-character in Excuse Me. Also notice the different solve techniques for the MATH challenges. Have you heard of silhouette score?
  3. John MacGillivray – John deduced that the Hotterdog model was based on MobileNet, enabling an offline attack. Great tradecraft.
  4. Fournierp – A comprehensive writeup about model inversion for the Inference Challenge, written from scratch. You can also check out MIFace in the Adversarial Robustness Toolbox.
  5. Eoin O – Learn how you could have solved the Crop2 Challenge. More than 3,000 competitors tried to solve it for the greater part of a month. The day after the competition ended, organizers released several hints. Within a few hours, it was solved. It was great to see all of the competitors collaborating in Discord and the Kaggle Discussion Board after the competition ended.


The AI Village CTF @ DEF CON 30 competition showed that there is a significant appetite in both the security and data professions to improve machine learning security skills. As ML systems are deployed in increasingly security-critical contexts, it will become imperative to train professionals and develop tools and methods for security development, deployment, and testing. 

NVIDIA will continue driving innovation with a robust and secure ecosystem for AI, from embedded devices and laptops to supercomputers and the cloud. As part of this effort, our AI Red Team will empower ML security research and testing internally and establish security practices across the industry. We will host competitions and workshops and release research and security tools in the future. If you’re interested in participating, contact us at


Explainer: What Is Denoising?

Denoising is an advanced technique used to decrease grainy spots and discoloration in images while minimizing the loss of quality.


Designing an Optimal AI Inference Pipeline for Autonomous Driving

Self-driving cars must be able to detect objects quickly and accurately to ensure the safety of their drivers and other drivers on the road. Due to this need for real-time processing in autonomous driving (AD) and visual inspection use cases, multiple AI models with preprocessing and postprocessing logic are combined in a pipeline and used for machine learning (ML) inference.

Every step of the pipeline must be fast to ensure a low-latency workflow. Latency is the time it takes to get the inference response. Faster processing of AD data will enable more efficient analysis and use of the information, creating a safer driving environment. A delay in any single step can slow down the entire pipeline.

To achieve a low-latency inference workflow, electric vehicle manufacturer NIO integrated NVIDIA Triton Inference Server into their AD inference pipeline. NVIDIA Triton Inference Server is open source, multiframework inference serving software.

This post explains how NIO orchestrated its pipeline of image preprocessing and postprocessing and AI models with NVIDIA Triton on GPUs. It also shows how NIO reduced network transmission to successfully speed up their AI inference workflow for AD use cases.

Join the NVIDIA Triton and NVIDIA TensorRT community and stay current on the latest product updates, bug fixes, content, best practices, and more.

Faster AI inference for real-time response

NIO designs, develops, jointly manufactures, and sells premium smart electric vehicles, driving innovations in next-generation technologies in autonomous driving, digital technologies, electric powertrains, and batteries. NIO Autonomous Driving Development Platform (NADP) is an R&D platform dedicated to the core autonomous driving service of NIO.

NIO chose NVIDIA Triton Inference Server for several key technical and operational reasons, including:

  • NVIDIA Triton supports DAG-based orchestration of numerous models, along with preprocessing or postprocessing modules
  • Cloud-native deployment of NVIDIA Triton enabled multi-GPU, multi-node scaling in a lightweight way
  • High-quality documentation and learning resources helped ease migration to NVIDIA Triton
  • NVIDIA Triton’s stability and robust functionality are necessary for AD use cases 

NIO’s AI inference workflow for autonomous driving

Hundreds of AI models are used to mine data from autonomous vehicles. In a use case like autonomous driving, the inference workflow consists of multiple AI models with preprocessing and postprocessing logic stitched together in a pipeline. 

NIO moved the preprocessing and postprocessing of the pipeline from the client side, which runs on CPUs, to NVIDIA Triton running on GPUs. NVIDIA Triton’s business logic scripting (BLS) functionality was used to orchestrate the pipeline to run optimally for AD use.

By moving the preprocessing from CPU to GPU and leveraging efficient pipeline orchestration, NIO achieved 6x latency reduction in some core pipelines, improving the overall throughput by up to 5x. 

Before and after workflow pipelines are shown in Figure 1.

Figure 1. Comparison of NIO’s AI inference workflows before the introduction of NVIDIA Triton Inference Server (left) and after (right)

Model pipeline orchestration benefits of NVIDIA Triton

This section examines each of the benefits NIO realized by integrating NVIDIA Triton.

GPU-accelerated preprocessing

Preprocessing tasks such as decoding, resizing, and transposing were accelerated on the GPU by NVIDIA Triton using nvJPEG and NVIDIA DALI. This significantly offloaded the computing workload from the client CPU and reduced preprocessing latency.

Upgrading models without the need for client application modification

By moving the preprocessing and postprocessing of the model to NVIDIA Triton, each time the model is upgraded, the client side does not require any modification. This essentially speeds up the rollout of the model, helping it reach production faster.

Using a single GPU node to reduce network data transfer overhead

A unified preprocessing step lets the input be shared with multiple backend recognition models. The process uses GPU shared memory on the server side, so it incurs no data transfer overhead.

Figure 2 shows that the pipeline can connect up to nine models using the NVIDIA Triton business logic scripting functionality.

Figure 2. Model pipeline orchestration with the NVIDIA Triton business logic scripting

For an input image of 2K resolution, the size of each uncompressed frame is 1920 x 1080 x 3 x 8 bits = 47 Mb. Assuming a full frame rate of 60 fps, the amount of data input per second is 1920 x 1080 x 3 x 8 x 60 bits = 2,847 Mb. In the previous workflow, each image is sent sequentially to the nine models over the network, so the data transferred per second is 1920 x 1080 x 3 x 8 x 60 x 9 bits = 25 Gb = 3 GB. These figures use binary prefixes (1 Mb = 2^20 bits).

In the new workflow, the nine models are orchestrated with the NVIDIA Triton business logic scripting. That means the models can access the image in the GPU shared memory and the images do not have to be sent over the network. Assuming a PCIe bandwidth of 160 Gb/s (20 GB/s), the 3 GB generated each second would take about 150 ms to transfer over PCIe, so the new workflow theoretically saves that time.

Assuming an available network bandwidth of 16 Gb/s (2 GB/s), the same data would take about 1,500 ms to transfer over the network, which the new workflow also theoretically saves. All of this speeds up the workflow.
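The arithmetic above can be reproduced with a short Python sketch. The frame size, frame rate, model count, and bandwidths come from the text. Using binary prefixes is an assumption, chosen because it matches the rounded figures quoted, and the variable names are illustrative.

```python
# Frame geometry, frame rate, and model count as stated in the text.
WIDTH, HEIGHT, CHANNELS, BITS_PER_CHANNEL = 1920, 1080, 3, 8
FPS = 60
NUM_MODELS = 9

# Assumption: the quoted "Mb" and "Gb" values use binary prefixes.
MEBI = 2 ** 20
GIBI = 2 ** 30

bits_per_frame = WIDTH * HEIGHT * CHANNELS * BITS_PER_CHANNEL
bits_per_second = bits_per_frame * FPS
bits_fanned_out = bits_per_second * NUM_MODELS  # each frame sent to every model

frame_mb = bits_per_frame / MEBI       # about 47 megabits per frame
stream_mb = bits_per_second / MEBI     # about 2,847 megabits per second
fanout_gb = bits_fanned_out / GIBI     # about 25 gigabits per second
fanout_gbytes = fanout_gb / 8          # about 3.1 gigabytes per second


def transfer_ms(gigabytes: float, bandwidth_gbytes_per_s: float) -> float:
    """Time in milliseconds to move `gigabytes` at the given bandwidth."""
    return gigabytes / bandwidth_gbytes_per_s * 1000


pcie_ms = transfer_ms(fanout_gbytes, 20)     # text rounds 3 GB at 20 GB/s to 150 ms
network_ms = transfer_ms(fanout_gbytes, 2)   # text rounds 3 GB at 2 GB/s to 1,500 ms

print(f"{frame_mb:.1f} Mb/frame, {stream_mb:.0f} Mb/s, {fanout_gbytes:.2f} GB/s fan-out")
print(f"PCIe: {pcie_ms:.0f} ms, network: {network_ms:.0f} ms")
```

Without rounding to 3 GB, the transfer times come out slightly above the 150 ms and 1,500 ms quoted, which does not change the order-of-magnitude comparison.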

Network transfer savings using image compression 

For accurate model prediction, the previous workflow required the full 1920 x 1080 x 3 x 8-bit input image to be transmitted through the network. After introducing server-side preprocessing, the original image can be reduced to a compressed three-channel 720p image (1280 x 720 x 3) within the allowed range of accuracy loss.

As a result, transmitting the compressed image takes only a few hundred KB, and the server resizes it back to 1920 x 1080 x 3 x 8 bits with minimal accuracy loss. This leads to additional network transfer savings, speeding up the workflow.

Ease of integration in NADP inference platform

NIO’s current inference platform based on NVIDIA Triton is a key component of their Autonomous Driving Development Platform (NADP), used in their autonomous driving solution.

As the NIO platform is built on Kubernetes (K8s), it was imperative for NVIDIA Triton to integrate well with Kubernetes. The components of the workflow are implemented as Kubernetes resources, both native and custom (CRDs), around NVIDIA Triton.

Figure 3. NIO’s machine learning workflow in Kubernetes

Continuous Integration/Continuous Delivery (CI/CD)

Argo is the engine used to orchestrate the workflow in Kubernetes. It helps with CI/CD for all the components involved in development, quantification, access, cloud deployment, stress testing, and launch. NVIDIA Triton helps with CI/CD by triggering the next step in the workflow whenever the models are loaded.

In addition, use of the NVIDIA Triton Docker container helps with consistent functionality across development, test, and deployment environments.

Integrating the Jupyter environment into the NVIDIA Triton image was seamless. Jupyter provides a convenient development environment for debugging in case of a complex problem that requires online debugging or offline reproduction.

Ease of deployment with Istio

NVIDIA Triton natively supports gRPC protocol for communication with applications. However, as the Kubernetes native service cannot offer effective request-level load balancing for gRPC, NVIDIA Triton is integrated with the Istio service mesh. Istio is used to load balance traffic to NVIDIA Triton Inference Server and monitor the health of the service through liveness/readiness probes of NVIDIA Triton.

Ease of use with Apollo configuration management

Apollo Configuration Center is used for model name-based service discovery. Users can access the models without knowing the specific domain name where the model is deployed. Combined with the NVIDIA Triton model repository, users can directly trigger the deployment of models.

Metrics with Prometheus and Grafana

NVIDIA Triton provides a complete set of model service metrics based on model dimensions. For example, NVIDIA Triton can distinguish between inference request queueing time and GPU computation time, enabling fine-grained diagnosis and analysis of online model service performance without entering the debug mode. 

Because NVIDIA Triton supports the mainstream cloud-native monitoring tools Prometheus and Grafana, users can easily configure dashboards and alerts for each dimension, providing the metrics needed for high service availability.
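As a rough illustration of separating queueing time from compute time, the Python sketch below parses Prometheus text-format output and derives per-request averages for each model. The metric names are the cumulative counters that Triton exposes on its metrics endpoint. The sample values, the `detector` model name, and the helper function are made up for illustration, and dividing by the success count gives only a lifetime average.

```python
import re
from collections import defaultdict
from typing import Dict

# Fabricated sample in Prometheus text format (values are illustrative only).
SAMPLE_METRICS = """\
nv_inference_request_success{model="detector",version="1"} 200
nv_inference_queue_duration_us{model="detector",version="1"} 50000
nv_inference_compute_infer_duration_us{model="detector",version="1"} 400000
"""

# Matches: metric_name{model="...",other labels} value
LINE = re.compile(r'^(\w+)\{model="([^"]+)"[^}]*\}\s+([0-9.eE+-]+)$')


def per_request_times_us(metrics_text: str) -> Dict[str, Dict[str, float]]:
    """Average queue and compute time per successful request, by model."""
    totals: Dict[str, Dict[str, float]] = defaultdict(dict)
    for line in metrics_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, model, value = match.groups()
            totals[model][name] = float(value)

    averages = {}
    for model, values in totals.items():
        requests = values.get("nv_inference_request_success", 0.0)
        if requests:  # skip models with no successful requests yet
            averages[model] = {
                "queue_us": values.get("nv_inference_queue_duration_us", 0.0) / requests,
                "compute_us": values.get("nv_inference_compute_infer_duration_us", 0.0) / requests,
            }
    return averages


averages = per_request_times_us(SAMPLE_METRICS)
print(averages)
```

A large queue time relative to compute time suggests requests are waiting for a free model instance, while the reverse points at the model itself. In a Prometheus and Grafana setup, the same ratios would normally be computed over a time window in the dashboard query rather than in client code.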

Key takeaways

NIO’s optimized workflow that integrates NVIDIA Triton Inference Server resulted in a 6x latency reduction in some core pipelines. This improved overall throughput by up to 5x.

By moving the preprocessing logic to GPU using the NVIDIA Triton pipeline orchestration functionality, NIO achieved:

  • Faster image processing
  • Freed CPU capacity
  • Reduced network transfer overhead
  • Higher inference throughput

NIO achieved AI inference workflow acceleration using NVIDIA Triton Inference Server. NVIDIA Triton was also easy to integrate in a robust Kubernetes-based scalable solution.


Making a Traversable Wormhole with a Quantum Computer

Wormholes — wrinkles in the fabric of spacetime that connect two disparate locations — may seem like the stuff of science fiction. But whether or not they exist in reality, studying these hypothetical objects could be the key to making concrete the tantalizing link between information and matter that has bedeviled physicists for decades.

Surprisingly, a quantum computer is an ideal platform to investigate this connection. The trick is to use a correspondence called AdS/CFT, which establishes an equivalence between a theory that describes gravity and spacetime (and wormholes) in a fictional world with a special geometry (AdS) to a quantum theory that does not contain gravity at all (CFT).

In “Traversable wormhole dynamics on a quantum processor”, published in Nature today, we report on a collaboration with researchers at Caltech, Harvard, MIT, and Fermilab to simulate the CFT on the Google Sycamore processor. By studying this quantum theory on the processor, we are able to leverage the AdS/CFT correspondence to probe the dynamics of a quantum system equivalent to a wormhole in a model of gravity. The Google Sycamore processor is among the first to have the fidelity needed to carry out this experiment.

Background: It from Qubit

The AdS/CFT correspondence was discovered at the end of a series of inquiries arising from the question: What’s the maximum amount of information that can fit in a single region of space? If one asked an engineer how much information could possibly be stored in a datacenter, the answer would likely be that it depends on the number and type of memory chips inside it. But surprisingly, what is inside the datacenter is ultimately irrelevant. If one were to cram more and more memory chips with denser and denser electronics into the datacenter, it would eventually collapse into a black hole and disappear behind an event horizon.

When physicists such as Jacob Bekenstein and Stephen Hawking tried to compute the information content of a black hole, they found to their surprise that it is given by the area of the event horizon, not by the volume of the black hole. It looks as if the information inside the black hole was written on the event horizon. Specifically, a black hole with an event horizon that can be tiled with A tiny units of area (each unit, called a “Planck area,” is 2.6121 × 10⁻⁷⁰ m²) has at most A/4 bits of information. This limit is known as the Bekenstein-Hawking bound.
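To get a feel for the scale of this bound, the Python sketch below applies the A/4 rule as stated above to a black hole of one solar mass. The Planck area is taken from the text. The physical constants are standard rounded values, and the Schwarzschild radius formula for a non-rotating black hole is an assumption added here for illustration.

```python
import math

PLANCK_AREA_M2 = 2.6121e-70   # from the text
G = 6.674e-11                 # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8                   # speed of light, m/s (rounded)
SOLAR_MASS_KG = 1.989e30      # rounded


def horizon_area_m2(mass_kg: float) -> float:
    """Event horizon area of a non-rotating black hole (assumption)."""
    radius = 2 * G * mass_kg / C ** 2   # Schwarzschild radius, roughly 3 km for the Sun
    return 4 * math.pi * radius ** 2


def max_bits(area_m2: float) -> float:
    """Apply the A/4 rule, with A counted in Planck areas."""
    planck_tiles = area_m2 / PLANCK_AREA_M2
    return planck_tiles / 4


bits = max_bits(horizon_area_m2(SOLAR_MASS_KG))
print(f"{bits:.1e} bits")
```

The result is on the order of 10⁷⁷ bits. Because the bound scales with area, doubling the mass doubles the radius and quadruples the limit.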

This discovery that the maximum amount of information that could fit in a region was proportional not to its volume, but to the surface area of the region’s boundary hinted at an intriguing relationship between quantum information and the three-dimensional spatial world of our everyday experience. This relationship has been epitomized by the phrase “It from qubit,” describing how matter (“it”) emerges from quantum information (“qubit”).

While formalizing such a relationship is difficult for ordinary spacetime, recent research has led to remarkable progress with a hypothetical universe with hyperbolic geometry known as “anti-de Sitter space” in which the theory of quantum gravity is more naturally constructed. In anti-de Sitter space, the description of a volume of space with gravity acting in it can be thought of as encoded on the boundary enclosing the volume: every object inside the space has a corresponding description on the boundary and vice versa. This correspondence of information is called the holographic principle, which is a general principle inspired by Bekenstein and Hawking’s observations.

Schematic representation of anti-de Sitter space (interior of cylinder) and its dual representation as quantum information on the boundary (surface of cylinder).

The AdS/CFT correspondence allows physicists to connect objects in space with specific ensembles of interacting qubits on the surface. That is, each region of the boundary encodes (in quantum information) the content of a region in spacetime such that matter at any given location can be “constructed” from the quantum information. This allows quantum processors to work directly with qubits while providing insights into spacetime physics. By carefully defining the parameters of the quantum computer to emulate a given model, we can look at black holes, or even go further and look at two black holes connected to each other — a configuration known as a wormhole, or an Einstein-Rosen bridge.

Experiment: Quantum Gravity in the Lab

Implementing these ideas on a Sycamore processor, we have constructed a quantum system that is dual to a traversable wormhole. Translated from the language of quantum information to spacetime physics via the holographic principle, the experiment let a particle fall into one side of a wormhole and observed it emerging on the other side.

Traversable wormholes were recently shown to be possible by Daniel Jafferis, Ping Gao and Aron Wall. While wormholes have long been a staple of science fiction, there are many possible spacetime geometries in which the formation of a wormhole is possible, but a naïvely constructed one would collapse on a particle traveling through it. The authors showed that a shockwave — i.e., a deformation of spacetime that propagates at the speed of light — of negative energy would solve this problem, propping open the wormhole long enough to enable traversability. The presence of negative energy in a traversable wormhole is similar to negative energy in the Casimir effect, where vacuum energy pushes together closely spaced plates. In both cases, quantum mechanics permits the energy density at a given location in space to be either positive or negative. On the other hand, if the wormhole experienced a shockwave of positive energy, no information would be allowed to pass through.

The simplest application of the holographic principle to create a wormhole requires many, many qubits — in fact, to approach the pencil-and-paper solutions given by theoretical physicists, one would need an arbitrarily large number of qubits. As the number of qubits is reduced, additional corrections are required that are still poorly understood today. New ideas were needed to build a traversable wormhole on a quantum computer with a limited number of qubits.

One of us (Zlokapa) adopted ideas from deep learning to design a small quantum system that preserved key aspects of gravitational physics. Neural networks are trained via backpropagation, a method that optimizes parameters by directly computing the gradient through the layers of the network. To improve the performance of a neural network and prevent it from overfitting to the training dataset, machine learning (ML) practitioners employ a host of techniques. One of these, sparsification, attempts to restrict the detail of information in the network by setting as many weights as possible to zero.

Similarly, to create the wormhole, we started with a large quantum system and treated it like a neural network. Backpropagation updated the parameters of the system in order to maintain gravitational properties while sparsification reduced the size of the system. We applied ML to learn a system that preserved only one key gravitational signature: the importance of using a negative energy shockwave. The training dataset compared dynamics of a particle traversing a wormhole propped open with negative energy and collapsed with positive energy. By ensuring the learned system preserved this asymmetry, we obtained a sparse model consistent with wormhole dynamics.
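To make the idea of sparsification concrete, the Python sketch below uses magnitude pruning, one simple and widely used way of zeroing weights. This is a swapped-in illustration and not the learning procedure used in the experiment. The function name and example weights are hypothetical.

```python
from typing import List


def prune_smallest(weights: List[float], keep: int) -> List[float]:
    """Zero every weight except the `keep` largest by absolute value."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Rank positions from largest to smallest magnitude.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
sparse = prune_smallest(weights, keep=3)
print(sparse)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

In a training loop, a step like this would typically alternate with gradient updates so that the surviving weights can compensate for the ones removed.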

Learning procedure to produce a sparse quantum system that captures gravitational dynamics. A single coupling consists of all six possible connections between a given group of four fermions.

Working with Jafferis and a handful of collaborators from Caltech, Fermilab, and Harvard, we subjected the new quantum system to numerous tests to determine if it showed gravitational behavior beyond signatures induced by different energy shockwaves. For example, while quantum mechanical effects can transmit information across a quantum system in a diverse set of ways, information that travels in spacetime — including through a wormhole — must be causally consistent. This and other signatures were verified on classical computers, confirming that the dynamics of the quantum system were consistent with a gravitational interpretation as viewed through the dictionary of the holographic principle.

Implementing the traversable wormhole as an experiment on a quantum processor is an extraordinarily delicate process. The microscopic mechanism of information transfer across qubits is highly chaotic: imagine an ink drop swirling in water. As a particle falls into a wormhole, its information gets smeared over the entire quantum system in the holographic picture. For the negative energy shockwave to work, the scrambling of information must follow a particular pattern known as perfect size winding. After the particle hits the negative energy shockwave, the chaotic patterns effectively proceed in reverse: when the particle emerges from the wormhole, it is as if the ink drop has come back together by exactly undoing its original turbulent spread. If, at any point in time, a small error occurs, the chaotic dynamics will not undo themselves, and the particle will not make it through the wormhole.

Left: Quantum circuit describing a traversable wormhole. A maximally entangled pair of qubits (“EPR pair”) are used as an entanglement probe to send a qubit through the wormhole. The qubit is swapped into the left side of the wormhole at time –t0; the energy shockwave is applied at time 0; and the right side of the wormhole is measured at time t1. Right: Photograph of the Google Sycamore quantum processor.

On the Sycamore quantum processor, we measured how much quantum information passed from one side of the system to the other when applying a negative versus a positive energy shockwave. We observed a slight asymmetry between the two energies, showing the key signature of a traversable wormhole. Due to the protocol’s sensitivity to noise, the Sycamore processor’s low error rates were critical to measuring the signal; with even 1.5x the amount of noise, the signal would have been entirely obscured.

Looking Forward

As quantum devices continue to improve, lower error rates and larger chips will allow deeper probes of gravitational phenomena. Unlike experiments such as LIGO that record data about gravity in the world around us, quantum computers provide a tool to explore theories of quantum gravity. We hope that quantum computers will help develop our understanding of future theories of quantum gravity beyond current models.

Gravity is only one example of the unique ability of quantum computers to probe complex physical theories: quantum processors can provide insight into time crystals, quantum chaos, and chemistry. Our work demonstrating wormhole dynamics represents a step towards discovering fundamental physics using quantum processors at Google Quantum AI.


We would like to thank our Quantum Science Communicator Katherine McCormick for her help writing this blog post.


An IT Manager’s Guide to Deploying an Edge AI Solution

Timing is everything, especially when it impacts your customer experiences, bottom line, and production efficiency. Edge AI can help by delivering real-time intelligence and increased privacy in intermittent, low bandwidth, and low cost environments. 

By 2025, according to Gartner®, 75% of data will be created and processed at the edge, outside the traditional data center or cloud.¹ It’s no wonder that thousands of companies are turning to edge AI to drive transformation for their businesses.

As organizations undergo this shift, many IT and business leaders are still in the early stages of planning and executing their edge computing strategies. Because edge AI is a new concept, the process is difficult for many. 

NVIDIA, a leading AI infrastructure company with robust experience helping organizations, customers, and partners successfully deploy edge AI solutions, is no stranger to these new concepts. 

In an effort to help others, the learnings and recommendations from these experiences are presented in An IT Manager’s Guide: How to Successfully Deploy an Edge AI Solution. The whitepaper offers an in-depth look at building and executing a successful edge AI deployment.

This post features recommendations on some key considerations when configuring an edge system. 

Edge system configurations: Design recommendations 

There are many parameters to consider when sizing a system. The optimal PCIe server configuration will depend on the target workload for that server. 

Edge AI models incorporate various workloads into their applications, such as vision AI, natural language processing, recommendations based on industrial sensors, and predictive analytics. 

Table 1. General system configuration recommendations for a vision AI workload. Actual recommendations will vary based on workload and use case.

Edge computing sizing considerations

When it comes to designing full hardware and software solutions at the edge, it is important to look at the solution as a whole to understand how the parts work together. Below are some of the individual considerations that IT must evaluate for edge AI deployments.

Number of streams: Each camera feed is a stream requiring a certain amount of memory and compute for processing. Small configurations of 6-7 video processing streams require relatively small systems. Larger deployments may require high performance systems that are typically seen in the data center. 

Application examples: One of the first steps to a successful edge AI deployment is understanding what workload needs to be run to reach your goals. Vision AI applications like image recognition, people or vehicle detection, and segmentation are all common use cases. 

Once an application is determined, it is important to understand the intended scale. For example, are additional AI models needed? Typically, a proof of concept (POC) will consist of a single AI model and use case, but most production deployments ultimately incorporate multiple AI models. The next steps include quantifying the business value of the application, dictating any environmental constraints, and securing stakeholder alignment. 

Memory: Perhaps the most common way to under-resource an edge AI solution is to configure the edge systems with too little memory. Edge AI systems require significantly more memory than other applications to support the parallel execution of the inference engine across the CPU and GPUs. 

The data science team or application vendor who trained the AI will know the memory requirements of the latest model. IT teams should, at a minimum, double that number to accommodate the inevitable expansion of the model as it retrains. This will also provide some headroom for the additional AI models that will need to be deployed alongside the first one.

Another rule of thumb is to provision twice as much system memory as the total GPU memory, and never less than 1.5x the total GPU memory. The memory should be evenly spread across all CPU sockets and memory channels for optimal performance.
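
The two memory rules of thumb above can be combined into a quick sizing check. The following is a minimal sketch rather than an official sizing tool: the function name, the example node, and the choice to take the larger of the two rules are all assumptions made for illustration.

```python
def recommended_system_memory_gb(total_gpu_memory_gb, model_memory_gb):
    """Return (minimum, target) system memory in GB for one edge node.

    Hypothetical helper: combining the two rules with max() is an
    assumption, not something the whitepaper prescribes.
    """
    # Rule 1: at least double what the current model needs, leaving
    # headroom for retraining growth and additional models.
    model_floor = 2.0 * model_memory_gb
    # Rule 2: aim for 2x total GPU memory, and never go below 1.5x.
    gpu_floor = 1.5 * total_gpu_memory_gb
    gpu_target = 2.0 * total_gpu_memory_gb
    minimum = max(model_floor, gpu_floor)
    target = max(model_floor, gpu_target)
    return minimum, target


# Hypothetical node: two 24 GB GPUs and a model that needs 20 GB.
minimum_gb, target_gb = recommended_system_memory_gb(48, 20)
print(minimum_gb, target_gb)
```

Whatever total this produces should still be split evenly across CPU sockets and memory channels, as noted above.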

Networking: As operations increasingly rely on digital technologies such as edge computing, resilience is key. There are two networks to consider when designing an edge solution: the network between the edge AI location and cloud, and the network between a sensor and the edge AI system. 

Understanding the type of network connectivity of your environment will help in determining the specific networking bandwidth requirements for your use case. For example, for a use case like robotics, where wired connectivity may not be possible, 5G is the next best choice as it offers minimal congestion and guaranteed service and bandwidth. 

Accelerators: Most edge applications run adequately on single socket x86 or Arm CPUs. But when the edge applications incorporate AI capabilities, they are far more compute intensive. 

To run an inference engine at the edge, the edge hardware needs enough compute power to execute complex neural networks with massively parallel computations. CPUs execute all the independent cells of a neural network sequentially, while discrete accelerators can execute them in parallel. Hence, accelerators are architecturally suited for AI, providing meaningfully better performance. They have become an essential component of modern AI infrastructure.

Among the most effective discrete accelerators for edge AI are Graphics Processing Units (GPUs) and Data Processing Units (DPUs). 

Storage: Naturally, the edge server requires local storage, usually a solid-state drive, for its operating system, network components, hardware drivers, and application software. Unlike other applications, edge AI solutions typically process a massive amount of unstructured input data such as images, voices, and sensor readings. Different storage options are called for depending on how much of this data needs to be stored, for how long, how securely, and with what level of reliability.

The first step in determining what storage is necessary for an edge AI solution requires IT teams to think through a data strategy. The data strategy will dictate what and how much data will need to be stored locally or in the cloud, and in turn, guide what storage options are best for that particular solution. Without a proactive strategy, developers often make inconsistent and sub-optimal choices that create problems down the road. 

Security: Security is paramount for edge AI computing devices, as they are deployed in remote locations outside the data center firewalls and the physical protections that limit access to systems. For more details, see Edge Computing: Considerations for Security Architects.

When it comes to an edge AI solution, five areas should be understood and made part of the overall solutions architecture: end-to-end encryption, mutual authentication, physical security, zero trust networking, and real-time monitoring.

Management: A remote management plan is critical for edge environments because systems at the edge are distributed, always on, and often operate in remote settings. See Remotely Operating Systems and Applications at the Edge to learn more.

An edge management solution should provide automatic deployment and provisioning, ongoing management, real-time alerting, and auditing, and it should use modern, cloud-native tools. 

Organizations have the choice of whether to build or buy a management solution. The following are questions to consider: How quickly does a solution need to be set up? Is the appropriate team and expertise available? Does this provide secure management of my edge environment? 

Pillars of a successful edge deployment 

Deploying the infrastructure needed to support a scalable edge AI solution is a big challenge. The process is iterative and time consuming, yet critical to do correctly. Decisions that are made when building an edge AI solution have far-reaching implications that will impact an organization’s business outcomes. 

For more guidance on this topic, download An IT Manager’s Guide: How to Successfully Deploy an Edge AI Solution.


1. Gartner, “Building an Edge Computing Strategy,” G00753920, September 2021. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.


New Workshop: Data Parallelism: How to Train Deep Learning Models on Multiple GPUs

Learn how to decrease model training time by distributing data to multiple GPUs, while retaining the accuracy of training on a single GPU.


Better Language Models Without Massive Compute

In recent years, language models (LMs) have become more prominent in natural language processing (NLP) research and are also becoming increasingly impactful in practice. Scaling up LMs has been shown to improve performance across a range of NLP tasks. For instance, scaling up language models can improve perplexity across seven orders of magnitude of model sizes, and new abilities such as multi-step reasoning have been observed to arise as a result of model scale. However, one of the challenges of continued scaling is that training new, larger models requires great amounts of computational resources. Moreover, new models are often trained from scratch and do not leverage the weights from previously existing models.

In this blog post, we explore two complementary methods for improving existing language models by a large margin without using massive computational resources. First, in “Transcending Scaling Laws with 0.1% Extra Compute”, we introduce UL2R, which is a lightweight second stage of pre-training that uses a mixture-of-denoisers objective. UL2R improves performance across a range of tasks and even unlocks emergent performance on tasks that previously had close to random performance. Second, in “Scaling Instruction-Finetuned Language Models”, we explore fine-tuning a language model on a collection of datasets phrased as instructions, a process we call “Flan”. This approach not only boosts performance, but also improves the usability of the language model to user inputs without engineering of prompts. Finally, we show that Flan and UL2R can be combined as complementary techniques in a model called Flan-U-PaLM 540B, which outperforms the unadapted PaLM 540B model by 10% across a suite of challenging evaluation benchmarks.

UL2R Training

Traditionally, most language models are pre-trained on either a causal language modeling objective that enables the model to predict the next word in a sequence (e.g., GPT-3 or PaLM) or a denoising objective, where the model learns to recover the original sentence from a corrupted sequence of words (e.g., T5). Although there are some tradeoffs in language modeling objectives, in that causal LMs are better at long-form generation and LMs trained on a denoising objective are better for fine-tuning, in prior work we demonstrated that a mixture-of-denoisers objective that includes both objectives results in better performance in both scenarios.
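
To make the difference between the two objectives concrete, here is a toy sketch of how one training example might be built under each. It is a simplified illustration, not the UL2 mixture-of-denoisers implementation: the function names, the word-level tokens, and the `<X>` sentinel are assumptions chosen for readability.

```python
def causal_lm_example(tokens):
    """Next-token prediction: each position is trained to predict the token after it."""
    inputs = tokens[:-1]
    targets = tokens[1:]
    return inputs, targets


def span_corruption_example(tokens, start, length, sentinel="<X>"):
    """Denoising: replace one span with a sentinel; the target restores the span."""
    corrupted = tokens[:start] + [sentinel] + tokens[start + length:]
    target = [sentinel] + tokens[start:start + length]
    return corrupted, target


tokens = "the quick brown fox jumps".split()
print(causal_lm_example(tokens))
print(span_corruption_example(tokens, start=1, length=2))
```

In the causal case the model only ever sees the left context, while in the denoising case it sees text on both sides of the masked span.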

However, pre-training a large language model on a different objective from scratch can be computationally prohibitive. Hence, we propose UL2 Repair (UL2R), an additional stage of continued pre-training with the UL2 objective that only requires a relatively small amount of compute. We apply UL2R to PaLM and call the resulting new language model U-PaLM.

In empirical evaluations, we found that scaling curves improve substantially with only a small amount of UL2 training. For instance, we show that by using UL2R on the intermediate checkpoint of PaLM 540B, we reach the performance of the final PaLM 540B checkpoint while using 2x less compute (or a difference of 4.4 million TPUv4 hours). Naturally, applying UL2R to the final PaLM 540B checkpoint also leads to substantial improvements, as described in the paper.

Compute versus model performance of PaLM 540B and U-PaLM 540B on 26 NLP benchmarks (listed in Table 8 in the paper). U-PaLM 540B continues training PaLM for a very small amount of compute but provides a substantial gain in performance.

Another benefit that we observed from using UL2R is that on some tasks, performance is much better than models trained purely on the causal language modeling objective. For instance, there are many BIG-Bench tasks that have been described as “emergent abilities”, i.e., abilities that can only be observed in sufficiently large language models. Although the way that emergent abilities are most commonly found is by scaling up the size of the LM, we found that UL2R can actually elicit emergent abilities without increasing the scale of the LM.

For instance, in the Navigate task from BIG-Bench, which measures the model’s ability to perform state tracking, all models except U-PaLM with less than 10^23 training FLOPs achieve approximately random performance. U-PaLM performance is more than 10 points above that. Another example of this is the Snarks task from BIG-Bench, which measures the model’s ability to detect sarcasm. Again, whereas all models with less than 10^24 training FLOPs achieve approximately random performance, U-PaLM performs well above random even for the 8B and 62B models.

For two abilities from BIG-Bench that demonstrate emergent task performance, U-PaLM achieves emergence at a smaller model size due to its use of the UL2R objective.

Instruction Fine-Tuning

In our second paper, we explore instruction fine-tuning, which involves fine-tuning LMs on a collection of NLP datasets phrased as instructions. In prior work, we applied instruction fine-tuning to a 137B-parameter model on 62 NLP tasks, such as answering a trivia question, classifying the sentiment of a movie, or translating a sentence to Spanish.

In this work we fine-tune a 540B parameter language model on more than 1.8K tasks. Moreover, whereas previous efforts only fine-tuned a LM with few-shot exemplars (e.g., MetaICL) or zero-shot without exemplars (e.g., FLAN, T0), we fine-tune on a combination of both. We also include chain of thought fine-tuning data, which enables the model to perform multi-step reasoning. We call our improved methodology “Flan”, for fine-tuning language models. Notably, even with fine-tuning on 1.8K tasks, Flan only uses a small portion of compute compared to pre-training (e.g., for PaLM 540B, Flan only requires 0.2% of the pre-training compute).

We fine-tune language models on 1.8K tasks phrased as instructions, and evaluate them on unseen tasks, which are not included in fine-tuning. We fine-tune both with and without exemplars (i.e., zero-shot and few-shot) and with and without chain of thought, enabling generalization across a range of evaluation scenarios.
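
The fine-tuning formats described above (with and without exemplars, with and without chain of thought) can be sketched as a single formatting function. This is an illustrative assumption about what such data might look like, not the templates used in the paper: the function name, the "Q:/A:" layout, and the "So the answer is" phrasing are all hypothetical.

```python
def format_example(instruction, question, answer, rationale=None, exemplars=()):
    """Build an (input, target) pair for one fine-tuning example.

    exemplars: solved (question, answer, rationale) triples shown before the
        query; an empty sequence means zero-shot.
    rationale: if given, the target spells out the reasoning before the
        final answer (chain of thought).
    """
    use_cot = rationale is not None
    parts = [instruction]
    for ex_q, ex_a, ex_r in exemplars:
        shown = f"{ex_r} So the answer is {ex_a}." if use_cot else ex_a
        parts.append(f"Q: {ex_q}\nA: {shown}")
    parts.append(f"Q: {question}\nA:")
    target = f"{rationale} So the answer is {answer}." if use_cot else answer
    return "\n\n".join(parts), target


instruction = "Answer the following question."
exemplar = ("What is 2 + 3?", "5", "2 plus 3 equals 5.")

zero_shot = format_example(instruction, "What is 4 + 4?", "8")
few_shot_cot = format_example(
    instruction, "What is 4 + 4?", "8",
    rationale="4 plus 4 equals 8.", exemplars=[exemplar],
)
print(few_shot_cot[0])
print(few_shot_cot[1])
```

Mixing examples from all four settings during fine-tuning is what lets a single model handle each of them at evaluation time.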

In the paper, we instruction–fine-tune LMs of a range of sizes to investigate the joint effect of scaling both the size of the LM and the number of fine-tuning tasks. For instance, the PaLM class of LMs includes models of 8B, 62B, and 540B parameters. We evaluate our models on four challenging benchmark evaluation suites (MMLU, BBH, TyDiQA, and MGSM), and find that both scaling the number of parameters and number of fine-tuning tasks improves performance on unseen tasks.

Both scaling up to a 540B parameter model and using 1.8K fine-tuning tasks improves the performance on unseen tasks. The y-axis is the normalized average over four evaluation suites (MMLU, BBH, TyDiQA, and MGSM).

In addition to better performance, instruction fine-tuning a LM enables it to respond to user instructions at inference time, without few-shot exemplars or prompt engineering. This makes LMs more user-friendly across a range of inputs. For instance, LMs without instruction fine-tuning can sometimes repeat the input or fail to follow instructions, but instruction fine-tuning mitigates such errors.

Our instruction–fine-tuned language model, Flan-PaLM, responds better to instructions compared to the PaLM model without instruction fine-tuning.

Putting Them Together

Finally, we show that UL2R and Flan can be combined to train the Flan-U-PaLM model. Since Flan uses new data from NLP tasks and enables zero-shot instruction following, we apply Flan as the second method after UL2R. We again evaluate on the four benchmark suites, and find that the Flan-U-PaLM model outperforms PaLM models with just UL2R (U-PaLM) or just Flan (Flan-PaLM). Further, Flan-U-PaLM achieves a new state-of-the-art on the MMLU benchmark with a score of 75.4% when combined with chain of thought and self-consistency.

Combining UL2R and Flan (Flan-U-PaLM) leads to the best performance compared to just using UL2R (U-PaLM) or just Flan (Flan-PaLM). Performance is the normalized average over four evaluation suites (MMLU, BBH, TyDiQA, and MGSM).


Average performance on four challenging evaluation suites
PaLM 49.1%
U-PaLM 50.2%
Flan-PaLM 58.4%
Flan-U-PaLM 59.1%
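
As a quick check on these averages, the short sketch below computes each model's gain over the unadapted PaLM baseline. The scores are copied from the table above; the variable names are illustrative. The gains are absolute percentage points, which is how the 10% improvement of Flan-U-PaLM over PaLM quoted earlier should be read.

```python
# Average scores (%) from the table above.
scores = {"PaLM": 49.1, "U-PaLM": 50.2, "Flan-PaLM": 58.4, "Flan-U-PaLM": 59.1}

baseline = scores["PaLM"]
# Absolute gain in percentage points, rounded to hide float noise.
gains = {name: round(score - baseline, 1) for name, score in scores.items()}
print(gains)
```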


Overall, UL2R and Flan are two complementary methods for improving pre-trained language models. UL2R adapts the LM to a mixture-of-denoisers objective using the same data, whereas Flan leverages training data from over 1.8K NLP tasks to teach the model to follow instructions. As LMs become even larger, techniques such as UL2R and Flan that improve general performance without large amounts of compute may become increasingly attractive.


It was a privilege to collaborate on these two papers with Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Ed H. Chi, Jeff Dean, Jacob Devlin, and Adam Roberts.


Explainer: What Is a Pod? What Is a Cluster?

Our digital lives run on collections of computers tightly linked on high-speed networks, and the latest one is an AI supercomputer called NVIDIA DGX SuperPOD.


Google at NeurIPS 2022

This week marks the beginning of the 36th annual Conference on Neural Information Processing Systems (NeurIPS 2022), the biggest machine learning conference of the year, which is being held in New Orleans, LA. NeurIPS 2022 will be held in person with additional options for virtual attendees, and includes invited talks, demonstrations and presentations of some of the latest in machine learning research. This year, NeurIPS is also offering a new track, called Spotlight Papers, which will provide opportunities to highlight papers presented in prestigious journals that would otherwise not have been eligible for submission.

Google is proud to be a Diamond level sponsor of NeurIPS this year and will have a significant presence with more than 175 accepted papers, additionally contributing to and learning from the broader academic research community through numerous talks, posters, workshops, and tutorials. You can learn more about our work being presented in the list below (Google affiliations highlighted in bold).

Organizing Committee

General Chairs include: Sanmi Koyejo

Program Chairs include: Alekh Agarwal

Workshop Chairs include: Hanie Sedghi

Tutorial Chairs include: Adji Bousso Dieng, Jessica Schrouff

Affinity Workshop Chairs: Adji Bousso Dieng, Jessica Schrouff

Program Committee, Senior Area Chairs include: Corinna Cortes, Claudio Gentile, Mohammad Ghavamzadeh, Amir Globerson, Elad Hazan, Katherine Heller, Satyen Kale, Been Kim, Sanjiv Kumar, Hugo Larochelle, Sergey Levine, Yishay Mansour, Mehryar Mohri, Tara Sainath, Dale Schuurmans, Daniel Tarlow

NeurIPS Foundation Board Secretary: Michael Mozer

NeurIPS Foundation Board Members include: Corinna Cortes, Isabelle Guyon, Sanmi Koyejo, Hugo Larochelle

NeurIPS Foundation Advisory Board includes: Peter Bartlett, Zoubin Ghahramani, John C. Platt, Fernando Pereira, Dale Schuurmans

Keynote Speakers

The Data-Centric Era: How ML is Becoming an Experimental Science
Isabelle Guyon

The Forward-Forward Algorithm for Training Deep Neural Networks
Geoffrey Hinton

Outstanding Paper Award

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

EXPO Day Workshops

Graph Neural Networks in Tensorflow: A Practical Guide
Workshop Organizers include: Bryan Perozzi, Sami Abu-el-Haija

A Hands-On Introduction to Tensorflow and Jax
Workshop Organizers include: Josh Gordon

Affinity Workshops

LatinX in AI (LXAI)
Platinum Sponsor
Networking & Social Chairs include: Andres Muñoz Medina
Program Committee includes: Johan Obando Ceron

Queer in AI
Panelists include: Sara Beery, Talia Ringer

Women in Machine Learning (WiML)
Platinum Sponsor
Workshop Organizers and Mentorship Chairs include: Beliz Gunel
Mentors include: Adam Roberts, Eleni Triantafillou, Zelda Mariet, Clara Hu, Rosanne Liu, Alekh Agarwal, Vinod Prabhakaran, Rose Yu, Katherine Heller


New in ML
Workshop Organizers include: Isabelle Guyon

AI for Accelerated Materials Design (AI4Mat)
Workshop Organizers include: Benjamin Sanchez-Lengeling

All Things Attention: Bridging Different Perspectives on Attention
Invited Speakers and Panelists include: Vidhya Navalpakkam

Efficient Natural Language and Speech Processing (ENLSP-II): The Future of Pre-trained Models
Invited Speakers include: Tara Sainath, Anna Huang
Invited Panelists include: Mohammad Norouzi
Program Committee includes: Wenhu Chen

Federated Learning: Recent Advances and New Challenges
Program Committee includes: Kallista Bonawitz, Zachary Charles, Wenshuo Guo, Peter Kairouz, Zhaozhuo Xu, Zheng Xu

Gaussian Processes, Spatiotemporal Modeling, and Decision-Making Systems
Workshop Organizers include: Zi Wang
Invited Speakers include: Jasper Snoek, Carolina Osorio
Advisory Board includes: Zoubin Ghahramani

Has it Trained Yet? A Workshop for Algorithmic Efficiency in Practical Neural Network Training
Workshop Organizers include: Zachary Nado, George Dahl, Naman Agarwal, Aakanksha Chowdhery
Invited Speakers include: Aakanksha Chowdhery, Priya Goyal

Human in the Loop Learning (HiLL)
Workshop Organizers include: Fisher Yu, Vittorio Ferrari
Invited Speakers include: Dorsa Sadigh, Igor Mordatch, Ding Zhao

INTERPOLATE — First Workshop on Interpolation Regularizers and Beyond
Workshop Organizers include: Yann Dauphin
Invited Speakers include: Chelsea Finn
Panelists include: Chelsea Finn, Dustin Tran
Program Committee includes: Wang Chen, Kimin Lee

LaReL: Language and Reinforcement Learning
Invited Speakers include: Dorsa Sadigh, Igor Mordatch

Medical Imaging Meets NeurIPS
Program Committee includes: Chenyu You

Memory in Artificial and Real Intelligence (MemARI)
Program Committee includes: Benjamin Eysenbach, Otilia Stretcu

Workshop Organizers include: Eleni Triantafillou
Invited Speakers include: Lucas Beyer, Chelsea Finn
Program Committee includes: Ishita Dasgupta, Praneet Dutta, Benjamin Eysenbach, Maximilian Igl, Louis Kirsch, Parsa Mahmoudieh, Marc Pickett, Eleni Triantafillou

New Frontiers in Graph Learning (GLFrontiers)
Workshop Organizers include: Hanjun Dai

Offline Reinforcement Learning Workshop: Offline RL as a “Launchpad”
Workshop Organizers include: Rishabh Agarwal, Aviral Kumar, George Tucker
Invited Speakers include: Dorsa Sadigh

Score-Based Methods
Invited Speakers include: Mohammad Norouzi
Invited Panelists include: Jascha Sohl-Dickstein

Synthetic Data for Empowering ML Research
Invited Speakers include: Mehryar Mohri
Invited Panelists include: Katrina Ligett
Program Committee includes: Jinsung Yoon

Table Representation Learning
Workshop Organizers include: Pengcheng Yin
Invited Speakers include: Xinyun Chen, Carsten Binnig
Panelists include: Julian Eisenschlos
Program Committee includes: Wenhu Chen, Xinyun Chen, Beliz Gunel

A Causal View on Dynamical Systems
Program Committee includes: Rose Yu

Algorithmic Fairness Through the Lens of Causality and Privacy
Workshop Organizers include: Awa Dieng
Invited Speakers include: Nicolas Papernot
Roundtable Leads include: David Madras, Negar Rostamzadeh, Nyalleng Moorosi
Program Committee includes: Matt Kusner

Broadening Research Collaborations in ML
Workshop Organizers include: Rosanne Liu, Pablo Samuel Castro, Sunipa Dev

Decentralization and Trustworthy Machine Learning in Web3: Methodologies, Platforms, and Applications
Invited Speakers include: Peter Kairouz

Distribution Shifts (DistShift): Connecting Methods and Applications
Workshop Organizers include: Becca Roelofs, Chelsea Finn, Jacob Eisenstein, Pang Wei Koh
Invited Speakers include: Sara Beery

Foundation Models for Decision Making
Workshop Organizers include: Sherry Yang, Yilun Du, Igor Mordatch, Shixiang Shane Gu, Ofir Nachum
Invited Speakers include: Dorsa Sadigh, Dale Schuurmans, Machel Reid
Program Committee includes: Bo Dai, Aleksandra Faust, Hiroki Furuta, Kati Goshvadi, Izzeddin Gur, Austin Huang, Kimin Lee, Kuang-Huei Lee, Lisa Lee, Yingjie Miao, Jordi Orbay, Ted Xiao

Gaze Meets ML
Program Committee includes: Peter Mattson, Mehdi Moradi

I Can’t Believe It’s Not Better: Understanding Deep Learning Through Empirical Falsification
Workshop Organizers include: Javier Antorán
Panelists include: Kevin Murphy

Interactive Learning for Natural Language Processing
Invited Speakers include: Anca Dragan
Program Committee includes: Julia Kreutzer, Shunyu Yao

Machine Learning and the Physical Sciences
Workshop Organizers include: Adji Bousso Dieng
Invited Speakers include: Ekin Doğuş Çubuk

Machine Learning for Systems
Workshop Organizers include: Martin Maas, Azade Nova, Dan Zhang
Invited Speakers include: Jeff Dean
Program Committee includes: Milad Hashemi, Kevin Swersky

Machine Learning in Structural Biology
Invited Speakers include: David Fleet

MATH-AI: Toward Human-Level Mathematical Reasoning
Workshop Organizers include: Swaroop Mishra, Yuhuai Wu
Invited Speakers include: Talia Ringer

OPT 2022: Optimization for Machine Learning
Workshop Organizers include: Courtney Paquette

Reinforcement Learning for Real Life (RL4RealLife)
Workshop Organizers include: Minmin Chen
Invited Panelists include: Pablo Samuel Castro
Program Committee includes: Victor Carbune, Bo Chang, Yinlam Chow, Konstantina Christakopoulou, Bo Dai, Hanjun Dai, Aleksandra Faust, Joshua Greaves, Chih-wei Hsu, Rahul Kidambi, Srivatsan Krishnan, Iou-Jen Liu, Cong Lu, Jincheng Mei, Chao Qin

Self-Supervised Learning – Theory and Practice
Invited Speakers include: Mathilde Caron

Symmetry and Geometry in Neural Representations (NeurReps)
Invited Speakers include: Noah Shutty
Program Committee includes: Ondrej Biza, Noah Shutty

Temporal Graph Learning Workshop
Invited Speakers include: Mehran Kazemi

Transfer Learning for Natural Language Processing
Workshop Organizers include: Deepak Ramachandran, Sebastian Ruder
Invited Speakers include: Jonas Pfeiffer
Invited Debaters include: Ellie Pavlick
Program Committee includes: Patrick Fernandes, Jonas Pfeiffer, Jiao Sun, Tu Vu, Xinyi Wang, Xin Xu

Cultures of AI and AI for Culture
Workshop Organizers include: Rida Qadri, Fernando Diaz

Deep Reinforcement Learning Workshop
Workshop Organizers include: Karol Hausman, Ted Xiao, Zeyu Zheng
Invited Speakers include: Igor Mordatch
Advisory Board includes: Chelsea Finn

Empowering Communities: A Participatory Approach to AI for Mental Health
Program Committee includes: Diana Mincu, Subhrajit Roy, Martin Seneviratne

HCAI@NeurIPS 2022, Human Centered AI
Keynote Speakers include: Fernanda Viegas

Learning Meaningful Representations of Life
Workshop Organizers include: Adji Bousso Dieng

Machine Learning for Creativity and Design
Workshop Organizers include: Yingtao Tian

Machine Learning Safety
Workshop Organizers include: Nicholas Carlini
Invited Speakers include: Dorsa Sadigh

Neuro Causal and Symbolic AI (nCSI)
Workshop Organizers include: Thomas Kipf

Robot Learning Workshop: Trustworthy Robotics
Workshop Organizers include: Alex Bewley, Jonathan Tompson
Invited Speakers include: Karol Hausman, Brian Ichter, Been Kim, Leila Takayama, Andy Zeng
Program Committee includes: Vincent Vanhoucke

The Symbiosis of Deep Learning and Differential Equations II
Workshop Organizers include: Winnie Xu
Invited Speakers include: Rose Yu

Tackling Climate Change with Machine Learning
Workshop Organizers include: Emma Strubell

Trustworthy and Socially Responsible Machine Learning
Invited Speakers include: Been Kim, Dorsa Sadigh, Milind Tambe

Vision Transformers: Theory and Applications
Invited Speakers include: Cordelia Schmid, Ming Hsuan Yang


Advances in Bayesian Optimization
Tutorial Organizers include: Virginia Aglietti

Creative Culture and Machine Learning
Tutorial Organizers include: Negar Rostamzadeh

Fair and Socially Responsible ML for Recommendations: Challenges and Perspectives
Invited Panelists include: Fernando Diaz

Lifelong Learning Machines
Invited Panelists include: Christopher Summerfield

The Role of Meta-learning for Few-Shot Learning
Tutorial Organizers include: Eleni Triantafillou
Invited Panelists include: Neil Houlsby, Priyanka Agrawal


NeurIPS 2022 Competition Track: Overview & Results
Invited Speakers include: Isabelle Guyon

Causal Insights for Learning Paths in Education
Competition Organizers include: Zichao (Jack) Wang

IGLU: Interactive Grounded Language Understanding in a Collaborative Environment
Competition Organizers include: Negar Arabzadeh

Cross-Domain MetaDL: Any-Way Any-Shot Learning Competition with Novel Datasets from Practical Domains
Competition Organizers include: Isabelle Guyon

Reconnaissance Blind Chess: An Unsolved Challenge for Multi-Agent Decision Making Under Uncertainty
Competition Organizers include: Bo Li

VisDA 2022 Challenge: Sim2Real Domain Adaptation for Industrial Recycling
Competition Organizers include: Dina Bashkirova

Spotlight Papers

CoPur: Certifiably Robust Collaborative Inference via Feature Purification
Jing Liu, Chulin Xie, Oluwasanmi O Koyejo, Bo Li

Machine Learning on Graphs: A Model and Comprehensive Taxonomy
Ines Chami*, Sami Abu-El-Haija, Bryan Perozzi, Christopher Ré, Kevin Murphy

Sparse Winning Tickets are Data-Efficient Image Recognizers
Mukund Varma T, Xuxi Chen, Zhenyu Zhang, Tianlong Chen, Subhashini Venugopalan, Zhangyang Wang

Federated Learning from Pre-trained Models: A Contrastive Learning Approach
Yue Tan, Guodong Long, Jie Ma, Lu Liu, Tianyi Zhou, Jing Jiang

Improving Multi-task Generalization via Regularizing Spurious Correlation
Ziniu Hu*, Zhe Zhao, Xinyang Yi, Tiansheng Yao, Lichan Hong, Yizhou Sun, Ed H. Chi

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning
Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare

Residual Multiplicative Filter Networks for Multiscale Reconstruction
Shayan Shekarforoush, David B. Lindell, David J. Fleet, Marcus A Brubaker

Differentially Private Learning with Margin Guarantees
Raef Bassily, Mehryar Mohri, Ananda Theertha Suresh

Optimal Query Complexities for Dynamic Trace Estimation
David P. Woodruff*, Fred Zhang*, Qiuyi Zhang


From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent
Ayush Sekhari, Satyen Kale, Jason D. Lee, Chris De Sa, Karthik Sridharan

On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games
Runyu Zhang, Jincheng Mei, Bo Dai, Dale Schuurmans, Na Li

Matryoshka Representation Learning
Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi

Efficient Risk-Averse Reinforcement Learning
Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

Operator Splitting Value Iteration
Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-massoud Farahmand

Cluster Randomized Designs for One-Sided Bipartite Experiments
Jennifer Brennan*, Vahab Mirrokni, Jean Pouget-Abadie

A Unified Sequence Interface for Vision Tasks
Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin*, David J. Fleet, Geoffrey Hinton

Cryptographic Hardness of Learning Halfspaces with Massart Noise
Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi, Lisheng Ren

Better Best of Both Worlds Bounds for Bandits with Switching Costs
Idan Amir, Guy Azov, Tomer Koren, Roi Livni

Fast Neural Kernel Embeddings for General Activations
Insu Han, Amir Zandieh, Jaehoon Lee, Roman Novak, Lechao Xiao, Amin Karbasi

Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth
Laxman Dhulipala, David Eisenstat, Jakub Łącki, Vahab Mirrokni, Jessica Shi

Improving Zero-Shot Generalization in Offline Reinforcement Learning Using Generalized Similarity Functions
Bogdan Mazoure*, Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples
Maura Pintor, Luca Demetrio, Angelo Sotgiu, Ambra Demontis, Nicholas Carlini, Battista Biggio, Fabio Roli

Learning Energy Networks with Generalized Fenchel-Young Losses
Mathieu Blondel, Felipe Llinares-López, Robert Dadashi, Léonard Hussenot, Matthieu Geist

Learning Robust Dynamics Through Variational Sparse Gating
Arnav Kumar Jain, Shiva Kanth Sujit, Shruti Joshi, Vincent Michalski, Danijar Hafner, Samira Ebrahimi Kahou

Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures
Emmanuel Abbe, Samy Bengio, Elisabetta Cornacchia, Jon Kleinberg, Aryo Lotfi, Maithra Raghu, Chiyuan Zhang

So3krates: Equivariant Attention for Interactions on Arbitrary Length-Scales in Molecular Systems
J. Thorben Frank, Oliver T. Unke, Klaus-Robert Müller

Spectral Bias in Practice: The Role of Function Frequency in Generalization
Sara Fridovich-Keil*, Raphael Gontijo-Lopes, Rebecca Roelofs

Delving into Out-of-Distribution Detection with Vision-Language Representations
Yifei Ming, Ziyang Cai, Jiuxiang Gu, Yiyou Sun, Wei Li, Yixuan Li

Path Independent Equilibrium Models Can Better Exploit Test-Time Computation
Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, J. Zico Kolter, Roger Grosse

On Optimal Learning Under Targeted Data Poisoning
Steve Hanneke, Amin Karbasi, Mohammad Mahmoody, Idan Mehalel, Shay Moran

Learning With Little Mixing
Ingvar Ziemann, Stephen Tu

Block-Recurrent Transformers
DeLesley Hutchins, Imanol Schlag*, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur

TabNAS: Rejection Sampling for Neural Architecture Search on Tabular Datasets
Chengrun Yang, Gabriel Bender, Hanxiao Liu, Pieter-Jan Kindermans, Madeleine Udell, Yifeng Lu, Quoc Le, Da Huang

Regret Bounds for Multilabel Classification in Sparse Label Regimes
Robert Busa-Fekete, Heejin Choi, Krzysztof Dembczynski, Claudio Gentile, Henry William Reeve, Balazs Szorenyi

Robust Reinforcement Learning Using Offline Data
Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

Contrastive Learning as Goal-Conditioned Reinforcement Learning
Benjamin Eysenbach, Tianjun Zhang, Sergey Levine, Ruslan Salakhutdinov

Beyond Rewards: A Hierarchical Perspective on Offline Multiagent Behavioral Analysis
Shayegan Omidshafiei, Andrei Kapishnikov, Yannick Assogba, Lucas Dixon, Been Kim

Revisiting Neural Scaling Laws in Language and Vision
Ibrahim Alabdulmohsin, Behnam Neyshabur, Xiaohua Zhai

Polynomial Neural Fields for Subband Decomposition and Manipulation
Guandao Yang*, Sagie Benaim, Varun Jampani, Kyle Genova, Jonathan T. Barron, Thomas Funkhouser, Bharath Hariharan, Serge Belongie

First Is Better Than Last for Language Data Influence
Chih-Kuan Yeh, Ankur Taly, Mukund Sundararajan, Frederick Liu, Pradeep Ravikumar

The Privacy Onion Effect: Memorization Is Relative
Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, Florian Tramer

Deep Hierarchical Planning from Pixels (see blog post)
Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel

Discovered Policy Optimisation
Chris Lu, Jakub Grudzien Kuba, Alistair Letcher, Luke Metz, Christian Schroeder de Witt, Jakob Foerster

Semi-supervised Active Linear Regression
Fnu Devvrit, Nived Rajaraman, Pranjal Awasthi

Pruning’s Effect on Generalization Through the Lens of Training and Regularization
Tian Jin, Daniel M. Roy, Michael Carbin, Jonathan Frankle, Gintare Karolina Dziugaite

Exploring Length Generalization in Large Language Models
Cem Anil*, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm Under Parallelization
Benjamin Dubois-Taine, Francis Bach, Quentin Berthet, Adrien Taylor

Global Normalization for Streaming Speech Recognition in a Modular Framework
Ehsan Variani, Ke Wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen

Learning Predictions for Algorithms with Predictions
Mikhail Khodak, Maria-Florina Balcan, Ameet Talwalkar, Sergei Vassilvitskii

Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts (see blog post)
Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, Neil Houlsby

Incrementality Bidding via Reinforcement Learning Under Mixed and Delayed Rewards
Ashwinkumar Badanidiyuru, Zhe Feng, Tianxi Li, Haifeng Xu*

Solving Quantitative Reasoning Problems with Language Models (see blog post)
Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra

Anonymized Histograms in Intermediate Privacy Models
Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi

Efficient and Stable Fully Dynamic Facility Location
Sayan Bhattacharya, Nikos Parotsidis, Silvio Lattanzi

Are All Losses Created Equal: A Neural Collapse Perspective
Jinxin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu, Zhihui Zhu

Universal Rates for Interactive Learning
Steve Hanneke, Amin Karbasi, Shay Moran, Grigoris Velegkas

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

Multiclass Learnability Beyond the PAC Framework: Universal Rates and Partial Concept Classes
Alkis Kalavasis, Grigoris Velegkas, Amin Karbasi

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Cenk Baykal, Nishanth Dikkala, Rina Panigrahy, Cyrus Rashtchian, Xin Wang

Pre-trained Language Models for Interactive Decision-Making
Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Submodular Maximization in Clean Linear Time
Wenxin Li, Moran Feldman, Ehsan Kazemi, Amin Karbasi

Reinforcement Learning with Logarithmic Regret and Policy Switches
Grigoris Velegkas, Zhuoran Yang, Amin Karbasi

Algorithms with Prediction Portfolios
Michael Dinitz, Sungjin Im, Thomas Lavastida, Benjamin Moseley, Sergei Vassilvitskii

Understanding and Improving Robustness of Vision Transformers Through Patch-Based Negative Augmentation
Yao Qin, Chiyuan Zhang, Ting Chen, Balaji Lakshminarayanan, Alex Beutel, Xuezhi Wang

Best of Both Worlds Model Selection
Aldo Pacchiano, Christoph Dann, Claudio Gentile

Fair Wrapping for Black-Box Predictions
Alexander Soen, Ibrahim Alabdulmohsin, Sanmi Koyejo, Yishay Mansour, Nyalleng Moorosi, Richard Nock, Ke Sun, Lexing Xie

A Reduction to Binary Approach for Debiasing Multiclass Datasets
Ibrahim Alabdulmohsin, Jessica Schrouff, Oluwasanmi Koyejo

Weighted Distillation with Unlabeled Examples
Fotis Iliopoulos, Vasilis Kontonis, Cenk Baykal, Gaurav Menghani, Khoa Trinh, Erik Vee

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases
James Harrison, Luke Metz, Jascha Sohl-Dickstein

Post-hoc Estimators for Learning to Defer to an Expert
Harikrishna Narasimhan, Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Model-Based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity
Alekh Agarwal, Tong Zhang

On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL
Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal

Towards Learning Universal Hyperparameter Optimizers with Transformers (see blog post)
Yutian Chen, Xingyou Song, Chansoo Lee, Zi Wang, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc’aurelio Ranzato, Sagi Perel, Nando de Freitas

Reproducibility in Optimization: Theoretical Framework and Limits
Kwangjun Ahn*, Prateek Jain, Ziwei Ji, Satyen Kale, Praneeth Netrapalli, Gil I. Shamir

Confident Adaptive Language Modeling
Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler

Reinforcement Learning with Neural Radiance Fields
Danny Driess, Ingmar Schubert, Pete Florence, Yunzhu Li, Marc Toussaint

Invariant and Transportable Representations for Anti-Causal Domain Shifts
Yibo Jiang, Victor Veitch

Simple Mechanisms for Welfare Maximization in Rich Advertising Auctions
Gagan Aggarwal, Kshipra Bhawalkar, Aranyak Mehta, Divyarthi Mohan, Alexandros Psomas

STaR: Bootstrapping Reasoning with Reasoning
Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman

Stochastic Online Learning with Feedback Graphs: Finite-Time and Asymptotic Optimality
Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

The Curse of Unrolling: Rate of Differentiating Through Optimization
Damien Scieur, Quentin Bertrand, Gauthier Gidel, Fabian Pedregosa

Visual Prompting via Image Inpainting
Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A Efros

Multi-Class H-Consistency Bounds
Pranjal Awasthi, Anqi Mao, Mehryar Mohri, Yutao Zhong

Anonymous Bandits for Multi-User Systems
Hossein Esfandiari, Vahab Mirrokni, Jon Schneider

Understanding the Eluder Dimension
Gene Li, Pritish Kamath, Dylan J. Foster, Nathan Srebro

Why So Pessimistic? Estimating Uncertainties for Offline RL Through Ensembles, and Why Their Independence Matters
Seyed Kamyar Seyed Ghasemipour, Shixiang Shane Gu, Ofir Nachum

A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback
Saeed Masoudian, Julian Zimmert, Yevgeny Seldin

A Theoretical View on Sparsely Activated Networks
Cenk Baykal, Nishanth Dikkala, Rina Panigrahy, Cyrus Rashtchian, Xin Wang

Chain of Thought Prompting Elicits Reasoning in Large Language Models (see blog post)
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

Decoupled Context Processing for Context Augmented Language Modeling
Zonglin Li, Ruiqi Guo, Sanjiv Kumar

Exploring Through Random Curiosity with General Value Functions
Aditya Ramesh, Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber

Object Scene Representation Transformer
Mehdi S. M. Sajjadi, Daniel Duckworth, Aravindh Mahendran, Sjoerd van Steenkiste, Filip Pavetić, Mario Lučić, Leonidas J. Guibas, Klaus Greff, Thomas Kipf

Joint Model-Policy Optimization of a Lower Bound for Model-Based RL
Benjamin Eysenbach, Alexander Khazatsky, Sergey Levine, Ruslan Salakhutdinov

A Fourier Approach to Mixture Learning
Mingda Qiao*, Guru Guruganesh, Ankit Singh Rawat, Avinava Dubey, Manzil Zaheer

Why Neural Networks Find Simple Solutions: The Many Regularizers of Geometric Complexity
Benoit Dherin, Michael Munn, Mihaela Rosca, David Barrett

Do Current Multi-task Optimization Methods in Deep Learning Even Help?
Derrick Xin, Behrooz Ghorbani, Ankush Garg, Orhan Firat, Justin Gilmer

Associating Objects and Their Effects in Video Through Coordination Games
Erika Lu, Forrester Cole, Weidi Xie, Tali Dekel, William Freeman, Andrew Zisserman, Michael Rubinstein

Increasing Confidence in Adversarial Robustness Evaluations
Roland S. Zimmermann*, Wieland Brendel, Florian Tramèr, Nicholas Carlini

The Role of Baselines in Policy Gradient Optimization
Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Scaling Multimodal Pre-training via Cross-Modality Gradient Harmonization
Junru Wu, Yi Liang, Feng Han, Hassan Akbari, Zhangyang Wang, Cong Yu*

S3GC: Scalable Self-Supervised Graph Clustering
Fnu Devvrit*, Aditya Sinha, Inderjit Dhillon, Prateek Jain

Algorithms and Hardness for Learning Linear Thresholds from Label Proportions
Rishi Saket

ALMA: Hierarchical Learning for Composite Multi-Agent Tasks
Shariq Iqbal, Robby Costales, Fei Sha

DC-BENCH: Dataset Condensation Benchmark
Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh

Does GNN Pre-training Help Molecular Representation?
Ruoxi Sun, Hanjun Dai, Adams Yu

Drawing Out of Distribution with Neuro-Symbolic Generative Models
Yichao Liang, Joshua B. Tenenbaum, Tuan Anh Le, N. Siddharth

Mixture-of-Experts with Expert Choice Routing (see blog post)
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew Dai, Zhifeng Chen, Quoc Le, James Laudon

Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv Rosenberg

Precise Learning Curves and Higher-Order Scalings for Dot-Product Kernel Regression
Lechao Xiao, Jeffrey Pennington, Theodor Misiakiewicz, Hong Hu, Yue Lu

Rate-Optimal Online Convex Optimization in Adaptive Linear Control
Asaf Cassel, Alon Cohen, Tomer Koren

Private Isotonic Regression
Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi

Sketching Based Representations for Robust Image Classification with Provable Guarantees
Nishanth Dikkala, Sankeerth Rao Karingula, Raghu Meka, Jelani Nelson, Rina Panigrahy, Xin Wang

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Elad Ben Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson

Near-Optimal Private and Scalable k-Clustering
Vincent Cohen-Addad, Alessandro Epasto, Vahab Mirrokni, Shyam Narayanan*, Peilin Zhong

When Does Differentially Private Learning Not Suffer in High Dimensions?
Xuechen Li, Daogao Liu, Tatsunori Hashimoto, Huseyin A Inan, Janardhan Kulkarni, Yin Tat Lee, Abhradeep Guha Thakurta

End-to-End Learning to Index and Search in Large Output Spaces
Nilesh Gupta, Patrick H. Chen, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit S. Dhillon

A Boosting Approach to Reinforcement Learning
Nataly Brukhim, Elad Hazan, Karan Singh

FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction
Samiul Alam, Luyang Liu, Ming Yan, Mi Zhang

Non-Convex Online Learning via Algorithmic Equivalence
Udaya Ghai, Zhou Lu, Elad Hazan

Is this the Right Neighborhood? Accurate and Query Efficient Model Agnostic Explanations
Amit Dhurandhar, Karthikeyan Natesan Ramamurthy, Karthikeyan Shanmugam

SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos
Gamaleldin F. Elsayed, Aravindh Mahendran, Sjoerd van Steenkiste, Klaus Greff, Michael C. Mozer, Thomas Kipf

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions
Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

Multi-game Decision Transformers (see blog post)
Kuang-Huei Lee, Ofir Nachum, Mengjiao Yang, Lisa Lee, Daniel Freeman, Winnie Xu, Sergio Guadarrama, Ian Fischer, Eric Jang, Henryk Michalewski, Igor Mordatch

Subsidiary Prototype Alignment for Universal Domain Adaptation
Jogendra Nath Kundu, Suvaansh Bhambri, Akshay Ravindra Kulkarni, Hiran Sarkar, Varun Jampani, Venkatesh Babu Radhakrishnan

SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections
Mark Boss*, Andreas Engelhardt*, Abhishek Kar, Yuanzhen Li, Deqing Sun, Jonathan T. Barron, Hendrik P. A. Lensch, Varun Jampani

Chefs’ Random Tables: Non-Trigonometric Random Features
Valerii Likhosherstov, Krzysztof Marcin Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks
Mansheej Paul, Brett W Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite

DP-PCA: Statistically Optimal and Differentially Private PCA
Xiyang Liu, Weihao Kong, Prateek Jain, Sewoong Oh

Emergent Communication: Generalization and Overfitting in Lewis Games
Mathieu Rita, Corentin Tallec, Paul Michel, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux, Florian Strub

Handcrafted Backdoors in Deep Neural Networks
Sanghyun Hong, Nicholas Carlini, Alexey Kurakin

I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification
Muhammad Ferjad Naeem, Yongqin Xian, Luc Van Gool, Federico Tombari

Improved Differential Privacy for SGD via Optimal Private Linear Operators on Adaptive Streams
Sergey Denisov, Brendan McMahan, Keith Rush, Adam Smith, Abhradeep Guha Thakurta

Optimal Scaling for Locally Balanced Proposals in Discrete Spaces
Haoran Sun*, Hanjun Dai, Dale Schuurmans

Near-Optimal Correlation Clustering with Privacy
Vincent Cohen-Addad, Chenglin Fan, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski

Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
Albert Q. Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, Mateja Jamnik

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s
Felix Chern, Blake Hechtman, Andy Davis, Ruiqi Guo, David Majnemer, Sanjiv Kumar

When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet
Vijay Vasudevan, Benjamin Caine, Raphael Gontijo-Lopes, Sara Fridovich-Keil, Rebecca Roelofs

DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning
Quan Vuong, Aviral Kumar, Sergey Levine, Yevgen Chebotar

A Characterization of Semi-Supervised Adversarially Robust PAC Learnability
Idan Attias, Steve Hanneke, Yishay Mansour

Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation
Ziyu Jiang, Xuxi Chen, Xueqin Huang, Xianzhi Du, Denny Zhou, Zhangyang Wang

Subquadratic Kronecker Regression with Applications to Tensor Decomposition
Matthew Fahrbach, Gang Fu, Mehrdad Ghadiri

Zero-Shot Transfer Learning Within a Heterogeneous Graph via Knowledge Transfer Networks
Minji Yoon*, John Palowitch, Dustin Zelle, Ziniu Hu*, Ruslan Salakhutdinov, Bryan Perozzi

Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank
Alessandro Epasto, Vahab Mirrokni, Bryan Perozzi, Anton Tsitsulin, Peilin Zhong

Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress (see blog post)
Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

Private and Communication-Efficient Algorithms for Entropy Estimation
Gecia Bravo-Hermsdorff, Robert Busa-Fekete, Mohammad Ghavamzadeh, Andres Munoz Medina, Umar Syed

Oracle Inequalities for Model Selection in Offline Reinforcement Learning
Jonathan Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill

Diagnosing Failures of Fairness Transfer Across Distribution Shift in Real-World Medical Settings
Jessica Schrouff*, Natalie Harris, Oluwasanmi O Koyejo, Ibrahim Alabdulmohsin, Eva Schnider*, Krista Opsahl-Ong, Alexander Brown, Subhrajit Roy, Diana Mincu, Christina Chen, Awa Dieng, Yuan Liu, Vivek Natarajan, Alan Karthikesalingam, Katherine A Heller, Silvia Chiappa, Alexander D’Amour

LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery
Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani

Patching Open-Vocabulary Models by Interpolating Weights
Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt

TUSK: Task-Agnostic Unsupervised Keypoints
Yuhe Jin, Weiwei Sun, Jan Hosang, Eduard Trulls, Kwang Moo Yi

Active Learning of Classifiers with Label and Seed Queries
Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice, Maximilian Thiessen

Autoformalization with Large Language Models
Yuhuai Wu, Albert Q. Jiang, Wenda Li, Markus N. Rabe, Charles Staats, Mateja Jamnik, Christian Szegedy

Benign Underfitting of Stochastic Gradient Descent
Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

Chain of Thought Imitation with Procedure Cloning
Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

Efficient and Modular Implicit Differentiation
Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert

Insights into Pre-training via Simpler Synthetic Tasks
Yuhuai Wu, Felix Li, Percy Liang

Self-Supervised Learning with an Information Maximization Criterion
Serdar Ozsoy, Shadi Hamdan, Sercan Ö. Arik, Deniz Yuret, Alper T. Erdogan

Trimmed Maximum Likelihood Estimation for Robust Generalized Linear Model
Weihao Kong, Rajat Sen, Pranjal Awasthi, Abhimanyu Das

Using Embeddings for Causal Estimation of Peer Influence in Social Networks
Irina Cristali, Victor Veitch

VCT: A Video Compression Transformer
Fabian Mentzer, George Toderici, David Minnen, Sung-Jin Hwang, Sergi Caelles, Mario Lucic, Eirikur Agustsson

Video Diffusion Models
Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa

Improved Coresets for Euclidean k-Means
Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, Omar Ali Sheikh-Omar

On the Adversarial Robustness of Mixture of Experts
Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme Ruiz, Pranjal Awasthi, Srinadh Bhojanapalli

Stars: Tera-Scale Graph Building for Clustering and Learning
CJ Carey, Jonathan Halcrow, Rajesh Jayaram, Vahab Mirrokni, Warren Schudy, Peilin Zhong

VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement
Erik Wijmans, Irfan Essa, Dhruv Batra

TaSIL: Taylor Series Imitation Learning
Daniel Pfrommer, Thomas TCK Zhang, Stephen Tu, Nikolai Matni

RNNs of RNNs: Recursive Construction of Stable Assemblies of Recurrent Neural Networks
Leo Kozachkov, Michaela M Ennis, Jean-Jacques Slotine

Integral Probability Metrics PAC-Bayes Bounds
Ron Amit, Baruch Epstein, Shay Moran, Ron Meir

D2NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video
Tianhao Wu, Fangcheng Zhong, Andrea Tagliasacchi, Forrester Cole, Cengiz Oztireli

Posted Pricing and Dynamic Prior-Independent Mechanisms with Value Maximizers
Yuan Deng, Vahab Mirrokni, Hanrui Zhang

Transformer Memory as a Differentiable Search Index
Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

*Work done while at Google.  


Taking AI into Clinical Production with MONAI Deploy

With a wide breadth of open source, accelerated AI frameworks at their fingertips, medical AI developers and data scientists are introducing new algorithms for clinical applications at an extraordinary rate. Many of these models are nothing short of groundbreaking, yet 87% of data science projects never make it into production.

In most data science teams, model developers lack a fast, consistent, easy-to-use, and scalable way to develop and package trained AI models into market-ready medical AI applications. These applications can help clinicians streamline imaging workflows, uncover hidden insights, improve productivity, and connect multi-modal patient information for deeper patient understanding.

MONAI, the Medical Open Network for AI, is bridging this gap from development to clinical deployment with MONAI Deploy. MONAI Deploy provides a set of open source tools for developing, packaging, testing, deploying, and running medical AI applications. It allows developers to build AI applications, orchestrate clinical AI workflows, and interoperate with medical imaging systems like PACS (picture archiving and communication systems) over standards like DICOM, FHIR, and HL7.

Medical AI applications built with MONAI

With MONAI, developers, researchers, and data scientists are building applications for a wide range of medical AI use cases, including:

  • Classifying medical imaging studies for the presence of a disease or condition
  • Segmenting organs, lesions, and other structures
  • Creating markups to highlight areas of concern with arrows or heatmaps
  • Deriving insights for radiologist review for inclusion in a medical report
  • Batch processing medical imaging exams during long-term storage or for DICOM migrations
  • Processing live streams of data to ensure the patient is positioned properly prior to image acquisition
  • Identifying QA issues during the acquisition process to streamline departmental workflows
  • Identifying trends in data for population health assessments 

MONAI Model Zoo is a curated library of more than 15 pretrained models (CT, MR, Pathology, Endoscopy) that can be transformed into MONAI AI applications, jump-starting AI application development.

MONAI Deploy applications

One of the key components of MONAI Deploy is the MONAI Deploy App SDK, which helps researchers and developers take one or more trained models and build an application with a few lines of code in under 20 minutes. The application is created as a MAP (MONAI Application Package). As a portable containerized application, it can be deployed and run in clinical production anywhere that has a Docker engine. 

Screenshot of a clinical review user interface of a MAP for segmenting a stroke lesion in a brain scan. Three patient records appear on the left with subsequent scans, segmentations, and metadata in the viewer.
Figure 1. An example of a MONAI Application Package (MAP) for stroke lesion segmentation in a brain scan developed by the AI Centre for Value Based Healthcare

The MONAI Deploy App SDK provides predefined operators that can be reused and connected in an application development workflow, or you can create custom ones. These operators parse DICOM studies, select specific series with application-defined rules, and convert the selected DICOM series into required image formats along with metadata representing the pertinent DICOM attributes. 

The image is then further processed in the preprocessing stage to normalize spacing, orientation, intensity, and more, before the pixel data is passed as tensors for inference. The SDK also includes DICOM writers such as DICOM Segmentation (SEG), DICOM Structured Reports (SR), and DICOM encapsulated Stereolithography (STL).

The resulting MAP includes one or more trained models, associated metadata, and the necessary interoperability (preprocessing and postprocessing) to do clinical inference in a single container.
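To make the series-selection step described above more concrete, here is a minimal sketch in plain Python. It does not use the MONAI Deploy App SDK. The `Series` dictionaries, the `matches` and `select_series` helpers, and the rule format are all hypothetical stand-ins chosen for illustration, and real DICOM metadata is considerably richer.

```python
import re
from typing import Dict, List

# Hypothetical, simplified stand-in for DICOM series metadata:
# each series is a dict of attribute name -> value.
Series = Dict[str, str]


def matches(series: Series, rule: Dict[str, str]) -> bool:
    """Return True if every attribute named in the rule matches the series.

    Rule values are treated as case-insensitive regular expressions
    (an assumption made for this sketch).
    """
    for attribute, pattern in rule.items():
        value = series.get(attribute)
        if value is None or re.fullmatch(pattern, value, flags=re.IGNORECASE) is None:
            return False
    return True


def select_series(study: List[Series], rules: List[Dict[str, str]]) -> List[Series]:
    """Keep the series that satisfy at least one application-defined rule."""
    return [s for s in study if any(matches(s, r) for r in rules)]


study = [
    {"SeriesDescription": "Axial Chest", "Modality": "CT"},
    {"SeriesDescription": "Localizer", "Modality": "CT"},
    {"SeriesDescription": "T1 Brain", "Modality": "MR"},
]
# Application-defined rule: CT series whose description mentions "chest".
rules = [{"Modality": "CT", "SeriesDescription": ".*chest.*"}]

selected = select_series(study, rules)
print([s["SeriesDescription"] for s in selected])
```

Keeping the rules as data rather than code means the same application could, in principle, be pointed at a different kind of study by changing only the rule set.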

Diagram showing the medical imaging workflow starting with DICOM input and through the Load DICOM data, Segment lung, and Classify steps, and ending with Write DICOM output.
Figure 2. A typical medical imaging AI app development workflow from DICOM input to DICOM output

Creating and deploying your MAP

The first step in building a MAP is to write the application itself. This consists of designing a workflow, creating operator classes, implementing an application class, and executing the application locally. The application class connects the operators into a workflow graph, which can be debugged locally in a Jupyter notebook or through the command-line interface (CLI).

The following code shows an application class definition example:

from monai.deploy.core import Application, env, resource


@resource(cpu=1, gpu=1, memory="2Gi")
# pip_packages can be a string that is a path(str) to requirements.txt file or a list of packages.
@env(pip_packages=["scikit-image >= 0.17.2"])
class App(Application):
    """This is a very basic application.

    This showcases the MONAI Deploy application framework.
    """

    # App's name. <class name> ('App') if not specified.
    name = "my_app"
    # App's description. <class docstring> if not specified.
    description = "This is a reference application."
    # App's version. <git version tag> or '0.0.0' if not specified.
    version = "0.1.0"

    def compose(self):
        # Execute `self.add_flow()` or `self.add_operator()` methods here.
        pass


if __name__ == "__main__":
    # Run the application when this file is executed directly.
    App(do_run=True)

The output of the application class is an application graph, which defines the flow of operators or tasks (Figure 3).

Flow diagram showing a series of operators in an application graph from DICOMDataLoader to DICOMSeriesSelector to DICOM SegmentationWriter
Figure 3. An example of an application graph that defines the flow of operators with the MONAI Deploy App SDK
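The idea of an application graph can be illustrated without the SDK. The following sketch is an assumption-laden toy, not the SDK's implementation. The operator functions (`load`, `select`, `write`), the `flows` list, and the `execution_order` and `run` helpers are hypothetical, and a single dictionary is passed along a linear chain, whereas a real graph routes outputs per edge. It only shows how a set of flows determines a valid execution order, using `graphlib.TopologicalSorter` from the Python standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Tuple


# Hypothetical operators: plain functions that take and return a dict.
def load(data: dict) -> dict:
    return {**data, "series": ["s1", "s2"]}


def select(data: dict) -> dict:
    return {**data, "selected": data["series"][:1]}


def write(data: dict) -> dict:
    return {**data, "written": len(data["selected"])}


operators: Dict[str, Callable[[dict], dict]] = {
    "loader": load,
    "selector": select,
    "writer": write,
}
# Each flow is (upstream, downstream), mirroring the idea behind add_flow().
flows: List[Tuple[str, str]] = [("loader", "selector"), ("selector", "writer")]


def execution_order(names, flows) -> List[str]:
    """Order operators so that each runs after everything upstream of it."""
    sorter = TopologicalSorter({name: set() for name in names})
    for upstream, downstream in flows:
        sorter.add(downstream, upstream)  # downstream depends on upstream
    return list(sorter.static_order())


def run(operators, flows) -> dict:
    """Run every operator once, in dependency order, on a shared dict."""
    data: dict = {}
    for name in execution_order(operators, flows):
        data = operators[name](data)
    return data


order = execution_order(operators, flows)
result = run(operators, flows)
print(order)
print(result["written"])
```

Because the order is derived from the flows rather than from the order in which operators are declared, a cycle in the flows would surface as a `graphlib.CycleError` before any operator runs.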

After the application has been tested and verified, the application is packaged and deployed locally. The MONAI Deploy Application Packager converts an application into a deployable Docker image that can be executed locally, following the MAP specification.

To package an application and create a Docker image tagged my_app:latest, use the following command:

$ monai-deploy package ./my_app -t my_app:latest --model ./

Building MONAI Application Package...
Successfully built my_app:latest

MONAI Deploy App SDK makes it easy to run and test MAPs locally. The command-line Application Runner lets users map input and output paths on the local file system to the input and output of the MAP during execution. It does not require an understanding of the internal details of the MAP.
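As a rough illustration of how the runner might be scripted, the sketch below builds the command line in Python. It assumes the positional form `monai-deploy run <image> <input> <output>` used by early SDK releases. The exact arguments can differ between versions, so check `monai-deploy run --help` for the installed release. The `build_run_command` helper and the `./input` and `./output` folder names are hypothetical.

```python
import shlex
from typing import List


def build_run_command(image: str, input_dir: str, output_dir: str) -> List[str]:
    """Build the argument list for running a MAP against local folders.

    Assumes the positional form `monai-deploy run <image> <input> <output>`;
    this may not match every SDK version.
    """
    return ["monai-deploy", "run", image, input_dir, output_dir]


command = build_run_command("my_app:latest", "./input", "./output")
print(shlex.join(command))

# To execute it for real (requires Docker and the SDK to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when folder names contain spaces.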

MAPs can be deployed in multiple ways, each with different levels of integration with a hosting platform. Learn more about these options in the Deploying and Hosting MONAI App Package documentation. You can also see the list of platforms supporting MAPs.

Accelerating the MAP validation lifecycle with MONAI Deploy Express

For initial local testing of MAPs, the Application Runner within the MONAI Deploy App SDK is fast, simple, and recommended. However, the journey from development to production usually requires multiple steps across different environments, operated by different teams and with different requirements.

MONAI Deploy Express is designed to facilitate the testing and validation of MAPs in the early stages of this pipeline (or workstation environment), where ease of use and time to get started are most important. 

Using straightforward technologies like Docker and Docker Compose, MONAI Deploy Express can be installed in about 30 minutes. It allows users to quickly run MAPs, connect to a test PACS or their own test/research PACS for further validation, and confidently take steps towards production.

MONAI Deploy Express reuses the same core services for DICOM I/O and AI workflow orchestration that could be used in a production environment. As a result, it provides the same functionality and a consistent experience regardless of where and how the applications are run, with minimal changes for the end user.

Diagram showing the MONAI Deploy Express end-to-end clinical data pipeline that includes the MONAI Informatics Gateway, MONAI Workflow Manager, and MONAI App SDK.
Figure 4. MONAI Deploy Express accelerates the validation of MAPs with an end-to-end clinical data pipeline that includes the MONAI Informatics Gateway, MONAI Workflow Manager, and MONAI App SDK

Bridging the gap from research innovation to clinical production

MONAI Deploy was designed to shorten the time-to-clinic for AI models. With the SDK, medical AI application developers and translational researchers can build AI applications that can run anywhere and accelerate the testing and validation of these models for clinical deployment.

To see real-world use cases for building MAPs with MONAI Deploy and explore clinical inference capabilities within the SDK, watch the on-demand lab, Creating Inference Applications for the Medical AI Project Lifecycle Using MONAI Deploy.

To get started with MONAI Deploy, install the MONAI Deploy App SDK using the following command: 

$ pip install monai-deploy-app-sdk

Numerous MONAI Deploy tutorials are available to help you create simple image processing apps, MedNIST classifier apps, and segmentation apps. Explore more MONAI Deploy resources to support your journey from development to deployment.

To validate your MAPs, download the latest release of MONAI Deploy Express and follow the README instructions. Sample workflows and MAPs for lung and liver segmentation are available, including validation datasets. Execution results can be visualized on Kibana.

Review, adopt, and help further improve the MAP specification. To review designs and requirements or open an issue, visit the monai-deploy-app-sdk GitHub repository.