
The world’s first braiding of non-Abelian anyons

Imagine you’re shown two identical objects and then asked to close your eyes. When you open your eyes, you see the same two objects in the same position. How can you determine if they have been swapped back and forth? Intuition and the laws of quantum mechanics agree: If the objects are truly identical, there is no way to tell.

While this sounds like common sense, it only applies to our familiar three-dimensional world. Researchers have predicted that for a special type of particle, called an anyon, that is restricted to move only in a two-dimensional (2D) plane, quantum mechanics allows for something quite different. Anyons are indistinguishable from one another and some, non-Abelian anyons, have a special property that causes observable differences in the shared quantum state under exchange, making it possible to tell when they have been exchanged, despite being fully indistinguishable from one another. While researchers have managed to detect their relatives, Abelian anyons, whose change under exchange is more subtle and impossible to directly detect, realizing “non-Abelian exchange behavior” has proven more difficult due to challenges with both control and detection.

In “Non-Abelian braiding of graph vertices in a superconducting processor”, published in Nature, we report the observation of this non-Abelian exchange behavior for the first time. Non-Abelian anyons could open a new avenue for quantum computation, in which quantum operations are achieved by swapping particles around one another like strings are swapped around one another to create braids. Realizing this new exchange behavior on our superconducting quantum processor could be an alternate route to so-called topological quantum computation, which benefits from being robust against environmental noise.

Exchange statistics and non-Abelian anyons

In order to understand how this strange non-Abelian behavior can occur, it’s helpful to consider an analogy with the braiding of two strings. Take two identical strings and lay them parallel next to one another. Swap their ends to form a double-helix shape. The strings are identical, but because they wrap around one another when the ends are exchanged, it is very clear when the two ends are swapped.

The exchange of non-Abelian anyons can be visualized in a similar way, where the strings are made from extending the particles’ positions into the time dimension to form “world-lines.” Imagine plotting two particles’ locations vs. time. If the particles stay put, the plot would simply be two parallel lines, representing their constant locations. But if we exchange the locations of the particles, the world lines wrap around one another. Exchange them a second time, and you’ve made a knot.

While a bit difficult to visualize, knots in four dimensions (three spatial plus one time dimension) can always easily be undone. They are trivial — like a shoelace, simply pull one end and it unravels. But when the particles are restricted to two spatial dimensions, the knots are in three total dimensions and — as we know from our everyday 3D lives — cannot always be easily untied. The braiding of the non-Abelian anyons’ world lines can be used as quantum computing operations to transform the state of the particles.

A key aspect of non-Abelian anyons is “degeneracy”: the full state of several separated anyons is not completely specified by local information, allowing the same anyon configuration to represent superpositions of several quantum states. Winding non-Abelian anyons about each other can change the encoded state.

How to make a non-Abelian anyon

So how do we realize non-Abelian braiding with one of Google’s quantum processors? We start with the familiar surface code, in which qubits are arranged on the vertices of a checkerboard pattern and which we recently used to achieve a milestone in quantum error correction. Each color square of the checkerboard represents one of two possible joint measurements that can be made of the qubits on the four corners of the square. These so-called “stabilizer measurements” can return a value of either +1 or –1. The latter is referred to as a plaquette violation, and can be created and moved diagonally, just like bishops in chess, by applying single-qubit X- and Z-gates. Recently, we showed that these bishop-like plaquette violations are Abelian anyons. In contrast to non-Abelian anyons, the state of Abelian anyons changes only subtly when they are swapped, so subtly that it is impossible to directly detect. While Abelian anyons are interesting, they do not hold the same promise for topological quantum computing that non-Abelian anyons do.
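
The bishop-like motion described above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not the code used in the experiment: plaquettes are labeled by (row, column), each qubit touches the four plaquettes around its vertex, and a single-qubit gate is assumed to flip the two same-colored plaquettes that meet diagonally at that qubit. The function names are hypothetical.

```python
# Toy model (illustrative assumption, not the experiment's code):
# plaquettes are indexed (row, col); a qubit sits on each lattice vertex
# and touches the four plaquettes around it.

def plaquettes_around(vertex):
    """Return the four plaquettes that share the qubit at `vertex`."""
    r, c = vertex
    return [(r - 1, c - 1), (r - 1, c), (r, c - 1), (r, c)]


def color(plaquette):
    """Checkerboard color (0 or 1), given by the parity of row + col."""
    r, c = plaquette
    return (r + c) % 2


def apply_single_qubit_gate(violations, vertex, flips_color):
    """Toggle the two plaquettes of color `flips_color` touching `vertex`.

    `violations` is the set of plaquettes whose stabilizer reads -1.
    """
    touched = {p for p in plaquettes_around(vertex) if color(p) == flips_color}
    return violations ^ touched  # symmetric difference toggles +1 <-> -1


violations = set()
# One gate creates a pair of violations on diagonally adjacent squares.
violations = apply_single_qubit_gate(violations, (1, 1), flips_color=0)
# A second gate moves one violation a further step along the diagonal.
violations = apply_single_qubit_gate(violations, (2, 2), flips_color=0)
print(sorted(violations))
```

In this sketch every violation stays on squares of the same color, which is the sublattice constraint that the bishop analogy refers to.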

To produce non-Abelian anyons, we need to control the degeneracy (i.e., the number of wave functions that cause all stabilizer measurements to be +1). Since a stabilizer measurement returns two possible values, each stabilizer cuts the degeneracy of the system in half, and with sufficiently many stabilizers, only one wave function satisfies the criterion. Hence, a simple way to increase the degeneracy is to merge two stabilizers together. In the process of doing so, we remove one edge in the stabilizer grid, giving rise to two points where only three edges intersect. These points, referred to as “degree-3 vertices” (D3Vs), are predicted to be non-Abelian anyons.
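
The counting argument above can be written out as a short sketch. It assumes that all stabilizers are independent, so that each one halves the number of allowed wave functions; the qubit and stabilizer counts below are made-up numbers chosen only for illustration.

```python
def degeneracy(num_qubits, num_independent_stabilizers):
    """Number of states left when every stabilizer is fixed to +1.

    Each independent stabilizer halves the 2**num_qubits-dimensional space.
    """
    return 2 ** (num_qubits - num_independent_stabilizers)


num_qubits = 12  # hypothetical patch size, for illustration only

# With as many independent stabilizers as qubits, one state remains.
before_merge = degeneracy(num_qubits, 12)

# Merging two stabilizers into one removes one constraint,
# so the degeneracy doubles.
after_merge = degeneracy(num_qubits, 11)

print(before_merge, after_merge)
```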

In order to braid the D3Vs, we have to move them, meaning that we have to stretch and squash the stabilizers into new shapes. We accomplish this by implementing two-qubit gates between the anyons and their neighbors (middle and right panels shown below).

Non-Abelian anyons in stabilizer codes. a: Example of a knot made by braiding two anyons’ world lines. b: Single-qubit gates can be used to create and move stabilizers with a value of –1 (red squares). Like bishops in chess, these can only move diagonally and are therefore constrained to one sublattice in the regular surface code. This constraint is broken when D3Vs (yellow triangles) are introduced. c: Process to form and move D3Vs (predicted to be non-Abelian anyons). We start with the surface code, where each square corresponds to a joint measurement of the four qubits on its corners (left panel). We remove an edge separating two neighboring squares, such that there is now a single joint measurement of all six qubits (middle panel). This creates two D3Vs, which are non-Abelian anyons. We move the D3Vs by applying two-qubit gates between neighboring sites (right panel).

Now that we have a way to create and move the non-Abelian anyons, we need to verify their anyonic behavior. For this we examine three characteristics that would be expected of non-Abelian anyons:

  1. The “fusion rules” — What happens when non-Abelian anyons collide with each other?
  2. Exchange statistics — What happens when they are braided around one another?
  3. Topological quantum computing primitives — Can we encode qubits in the non-Abelian anyons and use braiding to perform two-qubit entangling operations?

The fusion rules of non-Abelian anyons

We investigate fusion rules by studying how a pair of D3Vs interact with the bishop-like plaquette violations introduced above. In particular, we create a pair of these and bring one of them around a D3V by applying single-qubit gates.

While the rules of bishops in chess dictate that the plaquette violations can never meet, the dislocation in the checkerboard lattice allows one of them to break this rule, meet its partner, and annihilate with it. The plaquette violations have now disappeared! But bring the non-Abelian anyons back in contact with one another, and the anyons suddenly morph into the missing plaquette violations. As weird as this behavior seems, it is a manifestation of exactly the fusion rules that we expect these entities to obey. This establishes confidence that the D3Vs are, indeed, non-Abelian anyons.

Demonstration of anyonic fusion rules (starting with panel I, in the lower left). We form and separate two D3Vs (yellow triangles), then form two adjacent plaquette violations (red squares) and pass one between the D3Vs. The D3Vs’ deformation of the “chessboard” changes the bishop rules of the plaquette violations. While they used to lie on adjacent squares, they are now able to move along the same diagonals and collide (as shown by the red lines). When they do collide, they annihilate one another. The D3Vs are brought back together and surprisingly morph into the missing adjacent red plaquette violations.

Observation of non-Abelian exchange statistics

After establishing the fusion rules, we want to see the real smoking gun of non-Abelian anyons: non-Abelian exchange statistics. We create two pairs of non-Abelian anyons, then braid them by wrapping one from each pair around each other (shown below). When we fuse the two pairs back together, two pairs of plaquette violations appear. The simple act of braiding the anyons around one another changed the observables of our system. In other words, if you closed your eyes while the non-Abelian anyons were being exchanged, you would still be able to tell that they had been exchanged once you opened your eyes. This is the hallmark of non-Abelian statistics.

Braiding non-Abelian anyons. We make two pairs of D3Vs (panel II), then bring one from each pair around each other (III-XI). When fusing the two pairs together again in panel XII, two pairs of plaquette violations appear! Braiding the non-Abelian anyons changed the observables of the system from panel I to panel XII; a direct manifestation of non-Abelian exchange statistics.

Topological quantum computing

Finally, after establishing their fusion rules and exchange statistics, we demonstrate how we can use these particles in quantum computations. The non-Abelian anyons can be used to encode information, represented by logical qubits, which should be distinguished from the actual physical qubits used in the experiment. The number of logical qubits encoded in N D3Vs can be shown to be N/2–1, so we use N=8 D3Vs to encode three logical qubits, and perform braiding to entangle them. By studying the resulting state, we find that the braiding has indeed led to the formation of the desired, well-known quantum entangled state called the Greenberger-Horne-Zeilinger (GHZ) state.
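
As a quick check of the counting above, the sketch below evaluates N/2–1 for N=8 and writes out the amplitudes of a three-qubit GHZ state in its standard textbook form, (|000⟩ + |111⟩)/√2. The function names are hypothetical, and the basis and phase conventions of the state prepared in the experiment may differ from this form.

```python
import math


def logical_qubits(num_d3vs):
    """Logical qubits encoded in an even number of D3Vs: N/2 - 1."""
    if num_d3vs < 2 or num_d3vs % 2:
        raise ValueError("expected an even number of D3Vs, at least 2")
    return num_d3vs // 2 - 1


def ghz_amplitudes(num_qubits):
    """Amplitudes of (|00...0> + |11...1>)/sqrt(2) in the computational basis."""
    amplitudes = [0.0] * (2 ** num_qubits)
    amplitudes[0] = amplitudes[-1] = 1 / math.sqrt(2)
    return amplitudes


k = logical_qubits(8)      # number of logical qubits for eight D3Vs
state = ghz_amplitudes(k)  # 2**k amplitudes, only the first and last nonzero
print(k, state)
```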

Using non-Abelian anyons as logical qubits. a, We braid the non-Abelian anyons to entangle three qubits encoded in eight D3Vs. b, Quantum state tomography allows for reconstructing the density matrix, which can be represented in a 3D bar plot and is found to be consistent with the desired highly entangled GHZ-state.


Our experiments show the first observation of non-Abelian exchange statistics, and that braiding of the D3Vs can be used to perform quantum computations. With future additions, including error correction during the braiding procedure, this could be a major step towards topological quantum computation, a long-sought method to endow qubits with intrinsic resilience against fluctuations and noise that would otherwise cause errors in computations.


We would like to thank Katie McCormick, our Quantum Science Communicator, for helping to write this blog post.


Optimizing BIM Workflows Using USD at Every Design Phase

Siloed data has long been a challenge in architecture, engineering, and construction (AEC), hindering productivity and collaboration. However, new solutions are transforming the way that architects, engineers, and construction managers work together on BIM (building information modeling) workflows, offering new possibilities for real-time collaboration.

The new NVIDIA Omniverse Connector from Vectorworks exemplifies this potential, opening up exciting new workflow options. Vectorworks creates design software that serves the architecture, landscape, and entertainment industries. They specialize in hybrid 2D and 3D workflow solutions with an emphasis on visualization and non-proprietary collaboration.

Universal Scene Description (OpenUSD) helps Vectorworks provide their customers with even more flexibility in the design process and the ability to collaborate freely with anyone involved in a project. The connector helps optimize BIM workflows and provides real-time collaboration at every design phase. USD is an open and extensible framework and ecosystem for describing, composing, simulating, and collaborating within 3D worlds. It is the foundation of NVIDIA Omniverse.

What is building information modeling?

At its core, Vectorworks is building information modeling (BIM) software. BIM workflows nowadays are the defining force in the AEC industry, and they’re growing in popularity in landscape architecture, too.

BIM is a collaborative digital process that integrates design, construction, and operation information into a single model, enabling stakeholders to visualize, analyze, and coordinate all aspects of a building project. To get a short history of BIM and learn about its various use cases in AEC, see What is BIM | Building Information Modeling in the AEC Industry.

Although collaboration is essential to data-driven BIM workflows, there is currently a lack of software options for design collaboration and coordination.

Real-time BIM collaboration at every design phase

There are four standard phases involved in the design of a building.

First, architects must pull together all the necessary information to start their project for conceptual design, site planning, and analysis. With the Omniverse Connector, teams can pull this data from a wide variety of file formats and sources with Universal Scene Description (OpenUSD).

Next, teams explore schematic designs to refine initial design concepts and ideas. Intuitive drawing tools and a flexible modeling engine are easy to use with the connector, enabling you to explore different design options and evaluate their feasibility. You can easily transition from massing models to a BIM model, visualize concepts with integrated 3D rendering, and share the results in real time with your entire team.

When moving into more detailed design phases, BIM tools can often become less creative in nature. The Vectorworks Omniverse Connector is a bit different. It enables you to freely sketch, model, and document your design ideas with precision-drafting capabilities so that you’re not limited by presets and strict parameters.

In the last phase of construction documentation, teams must coordinate with consultants and continue to verify and refine models to make sure that the models are ready for the real world. Omniverse makes this process seamless, enabling different users to collaborate in real time on models and have the models automatically updated in the construction documentation.

Almost any stakeholder in the design process can now collaborate in real time on design efforts with the new connector. This could be a building architect working with a landscape architect or a lighting designer collaborating with a set designer on a live event.

Combining models into Omniverse enables you to collaborate in a real-time, virtual environment in all project phases, enabling you to ensure that you’re developing cohesive design solutions.

Figure 1. A Vectorworks design model in NVIDIA Omniverse

Developing the connector in Omniverse

Implementing the Omniverse Connector was a straightforward and streamlined process. The Vectorworks developers followed the comprehensive NVIDIA documentation and referred to existing connectors for user interface and user experience design. They also asked questions in the Omniverse forums, where answers provided directly by NVIDIA engineers and managers helped deliver the connector on schedule with the Vectorworks 2023 Service Pack 4 release.

Key to the development process was the Omniverse Connect HelloWorld sample, documentation that demonstrates how to build your own NVIDIA Omniverse Connector, along with a range of functionalities that enable you to interact with USD and the Omniverse Nucleus server.

The sample documentation made it easy to quickly start reaping the benefits of OpenUSD. It shows how to save projects to USD format so you can take advantage of creating and editing live layers. Then, you can create a USD stage, which serves as a container for organizing and manipulating the Vectorworks 3D scene data. This stage acts as a foundation for the subsequent operations.
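
To make the idea of a stage as a container more concrete, here is a minimal sketch of the text form (.usda) that a very small stage can serialize to. It is not taken from the HelloWorld sample or the Vectorworks connector, which work through the OpenUSD libraries rather than writing text by hand. The prim names and the helper function are hypothetical.

```python
def minimal_usda(default_prim, child_name, size):
    """Return the text of a tiny .usda layer with one Xform and one Cube."""
    return (
        "#usda 1.0\n"
        "(\n"
        f'    defaultPrim = "{default_prim}"\n'
        ")\n"
        "\n"
        f'def Xform "{default_prim}"\n'
        "{\n"
        f'    def Cube "{child_name}"\n'
        "    {\n"
        f"        double size = {size}\n"
        "    }\n"
        "}\n"
    )


# Hypothetical names: a root transform holding one placeholder cube.
stage_text = minimal_usda("Building", "Wall", 2)
print(stage_text)
```

The root prim plays the role of the container described above, and scene data is nested beneath it.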

Seamless workflow management is critical to the experience Vectorworks creates for users. The sample demonstrates how to create Omniverse Nucleus checkpoints, which serve as saved states of the stage. These checkpoints provide the ability to revert to a previous stage configuration, facilitating experimentation and version control.

The sample also demonstrates how to enhance communication and collaboration for users by implementing the capability to send and receive messages over a channel on the Nucleus server. This feature fosters interaction and coordination among users within a shared environment.

“Ultimately, the ease with which we were able to implement the Omniverse Connector speaks to how attainable the entry point is and enabled us to easily leverage Omniverse APIs that enhance workflows for our users,” said Dave Donley, senior director of rendering and research at Vectorworks.

“The collaborative, multi-disciplinary nature of Omniverse is a great fit, especially with OpenUSD at its foundation and we can’t wait to see the great designs our customers will produce with it.”

Figure 2. The release of Service Pack 4 for Vectorworks delivers a direct connection to NVIDIA Omniverse

Maintaining a competitive advantage with next-gen tech

Staying up-to-date with technology trends is crucial for maintaining a competitive advantage and identifying new opportunities for innovation and growth in your field.

By providing customers with powerful solutions that have the potential to revolutionize workflows like the NVIDIA Omniverse Connector, Vectorworks is making good on the promise of their public development roadmap: to constantly evolve their technology in line with the needs of customers during this time of accelerated digital transformation.

“At Vectorworks, we continue to evolve our digital solutions to empower customers to create and share great designs. With the latest update, which delivers a direct connection to NVIDIA Omniverse, comes yet another reminder of our passion and commitment to serve customers as a design partner—embracing the power and possibilities afforded by next-generation technology,” said Steve Johnson, chief technology officer at Vectorworks.

Not only does the NVIDIA Omniverse Connector provide Vectorworks users with access to a powerful real-time visualization tool, but it also paves the way for exciting developments in the future.

As the design industries continue to mature, use cases for OpenUSD will continue to present themselves. Vectorworks’ compatibility with USD puts them in an advantageous position to be able to implement future technology. By embracing Omniverse’s immense potential, Vectorworks and its users are poised to lead the charge toward a more collaborative and innovative future in AEC.

To start taking advantage of this powerful connection, see the NVIDIA Omniverse Connector page at Vectorworks.

Get started with NVIDIA Omniverse by downloading the standard license free, or learn how Omniverse Enterprise can connect your team. If you’re a developer, get started building your first extension or developing a Connector with Omniverse resources. Stay up-to-date on the platform by subscribing to the newsletter, and following NVIDIA Omniverse on Instagram, Medium, and Twitter. For resources, check out our forums, Discord server, Twitch, and YouTube channels.


Into the Omniverse: Universal Scene Description Support for Marvelous Designer Lets Users Tailor Digital Assets, Clothes for 3D Characters

Whether animating fish fins or fashioning chic outfits for digital characters, creators can tap Marvelous Designer software to compose and tailor assets, clothes and other materials for their 3D workflows.


Shell-e-brate Good Times in 3D With ‘Kingsletter’ This Week ‘In the NVIDIA Studio’

Amir Anbarestani, an accomplished 3D artist who goes by the moniker Kingsletter, had a “shell of a good time” creating his Space Turtle scene this week In the NVIDIA Studio.


How to Successfully Integrate NVIDIA DLSS 3

NVIDIA DLSS Frame Generation is the new performance multiplier in DLSS 3 that uses AI to create entirely new frames. This breakthrough has made real-time path tracing—the next frontier in video game graphics—possible.

NVIDIA has made it easier for you to take full advantage of this technology with the release of the Unreal Engine 5.2 Plugin and Streamline 2.1 SDK.

Unreal Engine developers can get started now. Coupled with the NVIDIA Reflex low-latency technology available through Unreal Engine 5, they have all the tools to boost game performance while providing a highly responsive experience for players.

Video 1. Bryan Catanzaro of the NVIDIA Applied Deep Learning Research team talks through NVIDIA DLSS 3

If you’re looking to do an integration within your own custom engine, Streamline 2.1 greatly simplifies the manual API hooking for all necessary components needed for DLSS 3. Streamline is an open-source cross-IHV framework that simplifies the integration of features like DLSS 3.

Instead of manually integrating the DLSS Frame Generation libraries, you identify which resources (motion vectors, depth, and so on) are required for the desired plug-in and then trigger when to execute the plug-ins in the rendering pipeline. Here are the necessary steps to ensure that your integrations take full advantage of DLSS 3:

  1. Integrate the Streamline 2.1 SDK: To add Streamline to your application, follow the Streamline Manual Hooking guide. Integrate without any features and focus on tasks such as manual hooking and resource state tracking.
  2. Perform a security check: Verify the NVIDIA and Streamline dual signatures on sl.interposer.dll before loading the DLL. Follow the verification process within the Security section of the programming guide.
  3. Check for system support: The DLSS 3 components (Super Resolution, Frame Generation, and NVIDIA Reflex) all have varied system requirements. Check for hardware and software system support and show appropriate error messages based on reported support.
  4. Integrate DLSS Super Resolution through Streamline: Pass in the necessary input resources and set up the upscaling pipeline. Follow these integration steps before all other post-processing.
  5. Evaluate integration: Validate and confirm image quality and performance benefits from DLSS Super Resolution.
  6. Integrate NVIDIA Reflex through Streamline: Add Reflex and its sub-features to the rendering pipeline. Make sure to place Reflex markers in the appropriate location or where your application should sleep.
  7. Confirm system latency reduction: There are three primary ways to check that input latency was reduced:
  8. Integrate DLSS Frame Generation through Streamline: Follow these integration steps and pass in the appropriate constants, camera matrices, and input resources in your post-processing pipeline. Pass in all the input resources marked for DLSS Super Resolution (for example, hudless and UIColor Color with Alpha). Disable DLSS Frame Generation when appropriate, such as when in-menu or for scene transitions.
  9. Validate DLSS Frame Generation inputs: Use the sl.imgui plugin to validate inputs (camera matrices, depth, MVEC, color, and so on). We recommend using ICAT to validate image quality and FrameView to validate latency. Lastly, use the development DLLs for buffer visualization.
  10. Swap to production DLLs: After image quality and performance benefits from DLSS Frame Generation are validated, replace the watermarked DLLs with non-watermarked, production-ready DLLs from NVIDIA.

For an integration checklist and the most asked questions for DLSS Super Resolution, Frame Generation, and NVIDIA Reflex, see Streamline Getting Started (registration required). To learn more about the new DLSS plugin in Unreal Engine 5, see the Unreal Engine page.

Game developers can find additional free resources to re-create fully path-traced and AI-driven virtual worlds on the NVIDIA Game Development page.


Now Available: NVIDIA DLSS 3 for Unreal Engine 5

NVIDIA DLSS 3 is a neural graphics technology that multiplies performance using AI image reconstruction and frame generation. It’s a combination of three core innovations:

  • Super Resolution uses deep learning algorithms to upscale a lower-resolution input into a higher-resolution output, creating a sharp image with a boosted frame rate.
  • Frame Generation uses AI rendering to generate entirely new frames with best-in-class quality and responsiveness.
  • NVIDIA Reflex is a low-latency technology that minimizes input lag by synchronizing the CPU and the GPU for optimal responsiveness.

Powered by these three technologies, DLSS 3 enables upwards of 4x performance boosts, providing headroom for next-generation, path-traced rendering. 

DLSS Super Resolution has been available in Unreal Engine since 2021, making it easy to integrate NVIDIA AI scaling technology into Unreal Engine projects. NVIDIA has now released DLSS 3 for Unreal Engine 5.2, which includes Frame Generation and the latest NVIDIA Reflex version. For more information about Unreal Engine 5.1 and earlier, see step 2 in the installation guide later in this post.

DLSS 3 reconstructs seven-eighths of the total displayed pixels, increasing performance significantly.
Figure 1. Super Resolution and Frame Generation create upscaled images together
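
One way to arrive at the seven-eighths figure is sketched below. It rests on two assumptions that are not spelled out in this post: Super Resolution renders each frame at half the output resolution per axis (one quarter of the pixels), and Frame Generation adds one fully generated frame for every rendered frame.

```python
from fractions import Fraction


def rendered_fraction(axis_scale, generated_frames_per_rendered):
    """Fraction of displayed pixels that are conventionally rendered."""
    pixels_per_rendered_frame = axis_scale ** 2
    share_of_rendered_frames = Fraction(1, 1 + generated_frames_per_rendered)
    return pixels_per_rendered_frame * share_of_rendered_frames


# Assumed: half resolution per axis, one generated frame per rendered frame.
rendered = rendered_fraction(Fraction(1, 2), 1)
reconstructed = 1 - rendered
print(rendered, reconstructed)
```

Under these assumptions, one quarter of the pixels in every second frame are rendered, and the rest are reconstructed by AI.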

To make integrating NVIDIA technology into your project as simple as possible, the new DLSS 3 Unreal Engine 5.2 package contains the Frame Generation, Super Resolution, and NVIDIA Reflex plugins all in a single download.

DLSS 3 technologies

The DLSS Frame Generation plugin uses Frame Generation to create entirely new frames by analyzing sequential frames and motion data from the Optical Flow Accelerator in GeForce RTX 40 Series GPUs.

Bundled inside the DLSS Frame Generation plugin is NVIDIA Reflex. Paired with DLSS 3, NVIDIA Reflex reduces onscreen latency by up to 2x compared to native rendering.

Video 1. THE FINALS | Beta Gameplay with DLSS 3, Ray Tracing, and NVIDIA Reflex

The DLSS Super Resolution plugin supports a variety of image quality modes—from Ultra Performance to Quality—determined by the native resolution relative to the DLSS output resolution. DLSS Super Resolution is customizable based on the needs of your game, with additional NVIDIA technologies included in the plugin:

  • Deep Learning Anti-Aliasing Mode (DLAA) offers an AI-based anti-aliasing mode for users who have spare GPU headroom and want higher levels of image quality.
  • NVIDIA Image Scaling is an open-source spatial upscaler and sharpening algorithm that is available for all platforms. 
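
The relationship between quality mode, render resolution, and output resolution mentioned above can be sketched as follows. The per-axis scale factors are commonly cited defaults and are an assumption here, not values taken from this post. An actual integration would query the SDK for its recommended render size rather than hardcode these numbers.

```python
from fractions import Fraction

# Assumed per-axis render scales (commonly cited defaults, not from this post).
MODE_SCALE = {
    "Quality": Fraction(2, 3),
    "Balanced": Fraction(58, 100),
    "Performance": Fraction(1, 2),
    "Ultra Performance": Fraction(1, 3),
}


def render_resolution(output_width, output_height, mode):
    """Approximate internal render size for a given output size and mode."""
    scale = MODE_SCALE[mode]
    return round(output_width * scale), round(output_height * scale)


for mode in MODE_SCALE:
    print(mode, render_resolution(3840, 2160, mode))
```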

The DLSS 3 Unreal Engine 5.2 plugin is delivered with the latest optimizations to NVIDIA AI algorithms, always learning and evolving with over-the-air updates.

How to install DLSS 3 for Unreal Engine

Follow these steps to download and install DLSS 3 for your Unreal Engine project.

  1. Agree to the Terms of the License Agreement and download DLSS 3 for your version of Unreal Engine.
  2. Unzip the DLSS folder. Only the 5.2 version of DLSS contains the Streamline/Frame Generation plugin.
  3. Copy the plugin folders to install to the /Engine/Plugins/MarketPlace folder of your Unreal Engine directory. If you don’t currently have a /MarketPlace folder, create one.
  4. Launch Unreal Editor, go to Plugins, and search for the plugins to activate. Search for “NVIDIA” to quickly list all of the included DLSS 3 plugins.
  5. Activate and restart Unreal Editor.
  6. Load the DLSS 3 Test project from the /Samples folder of the downloaded DLSS plugin file.
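
Step 3 above can be scripted. The following is a minimal sketch that assumes the download has already been unzipped and that every subfolder of the given directory is a plugin you want to install. The function name and example paths are hypothetical. In practice you may prefer to copy only selected folders.

```python
import shutil
from pathlib import Path


def install_plugins(plugin_source_dir, engine_root):
    """Copy plugin folders into <engine_root>/Engine/Plugins/MarketPlace.

    Creates the MarketPlace folder if it does not exist yet and returns
    the names of the folders that were copied.
    """
    target = Path(engine_root) / "Engine" / "Plugins" / "MarketPlace"
    target.mkdir(parents=True, exist_ok=True)

    installed = []
    for folder in sorted(Path(plugin_source_dir).iterdir()):
        if folder.is_dir():
            # dirs_exist_ok lets a reinstall overwrite an earlier copy.
            shutil.copytree(folder, target / folder.name, dirs_exist_ok=True)
            installed.append(folder.name)
    return installed


# Example call with hypothetical paths:
# install_plugins("C:/Downloads/DLSS3/Plugins", "C:/Program Files/Epic Games/UE_5.2")
```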

For prior versions of Unreal, you must build from the source and modify your source code with a small patch. For more information, see the included DLSS Frame Generation Quick Start Guide PDF in the download .zip file.

Tips for using DLSS 3 in Unreal Engine

After DLSS 3 is installed, follow these steps to verify that the Frame Generation, Super Resolution, and Reflex plugins are integrated into your project correctly.

  • To confirm that DLSS Frame Generation is working and to view real-time statistics, navigate to project settings, and then to your preferences for the NVIDIA Streamline plugin. Toggle the Load Debug Overlay option.
    • The Load Debug Overlay option for Frame Generation works in the editor and can appear in development or debug builds, but won’t appear in production builds.
  • To update Streamline automatically as well as DLSS AI algorithms with the latest improvements, use the same settings window to ensure that the Allow OTA Update option is enabled.
  • In the Unreal Editor, Frame Generation only works from a new editor window (PIE) or in Standalone mode. It doesn’t work from the selected viewport or while editing.
  • If any of the included DLSS 3 technologies aren’t working, check the output log or look for onscreen warning messages. A common cause is an outdated NVIDIA driver that needs to be updated.
  • The DLSS 3 Unreal Engine plugin contains the latest NVIDIA Reflex technology, a newer version than the one currently built into Unreal Engine. While it’s possible to keep the earlier plugin enabled, and even use the earlier NVIDIA Reflex Blueprint scripts, we recommend that you disable the earlier NVIDIA Reflex plugin and use the new version bundled in DLSS 3 Streamline instead.
  • We recommend that you set up all NVIDIA plugins through Blueprint scripts, as this enables you to conveniently activate plugins from menus and set preferences for users. However, if you need access to the console commands, they can be found under r.ngx. For more information about using console commands, see the DLSS Quick Start Guide PDF included in the DLSS 3 plugin download.
  • When Frame Generation is on, we recommend that you disable VSYNC in your application. The DLSS 3 plugin can set VSYNC to behave incorrectly when active. VSYNC can be disabled with the r.vsync 0 console command.

Download DLSS 3 for Unreal Engine

DLSS 3 for Unreal Engine makes the latest NVIDIA advancements in neural rendering and performance multiplication easy to integrate into your UE project. Get started with the Frame Generation, Super Resolution, and Reflex plugins now.

DLSS 3 for Unreal Engine 5.2 is now available.

For more information, see NVIDIA technologies supported by Unreal Engine 5.


Webinar: AI-Enabled Cybersecurity for Financial Services

Learn how financial firms can build automated, real-time fraud and threat detection solutions with NVIDIA Morpheus.


NVIDIA CEO: Creators Will Be “Supercharged” by Generative AI

Generative AI will “supercharge” creators across industries and content types, NVIDIA founder and CEO Jensen Huang said today at the Cannes Lions Festival, on the French Riviera. “For the very first time, the creative process can be amplified in content generation, and the content generation could be in any modality …”


Visual Foundation Models for Medical Image Analysis

The analysis of 3D medical images is crucial for advancing clinical responses, disease tracking, and overall patient survival. Deep learning models form the backbone of modern 3D medical representation learning, enabling precise spatial context measurements that are essential for clinical decision-making. These 3D representations are highly sensitive to the physiological properties of medical imaging data, such as CT or MRI scans.

Medical image segmentation, a key visual task for medical applications, serves as a quantitative tool for measuring various aspects of medical images. To improve the analysis of these images, the development and application of foundation models are becoming increasingly important in the field of medical image analysis.
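As a rough illustration of segmentation as a measuring tool, the sketch below turns a label map into per-structure volumes by counting voxels and multiplying by the voxel size. The function name, the toy label map, and the 1.5 mm spacing are illustrative assumptions, not part of any MONAI API, and treating label 0 as background is a convention assumed here.

```python
from collections import Counter
from itertools import chain


def label_volumes_mm3(label_volume, spacing_mm):
    """Return {label: volume in mm^3} for a 3D label map given as nested lists.

    label_volume: list of slices, each a list of rows, each a list of integer labels.
    spacing_mm: (z, y, x) voxel spacing in millimetres.
    Label 0 is treated as background and skipped (an assumption of this sketch).
    """
    dz, dy, dx = spacing_mm
    voxel_mm3 = dz * dy * dx
    # Flatten slices -> rows -> voxels and count how often each label occurs.
    voxels = chain.from_iterable(chain.from_iterable(label_volume))
    counts = Counter(voxels)
    return {label: n * voxel_mm3 for label, n in counts.items() if label != 0}


# Toy 2x2x2 label map: label 1 covers three voxels, label 2 covers one.
toy_labels = [
    [[0, 1], [1, 0]],
    [[1, 2], [0, 0]],
]
volumes = label_volumes_mm3(toy_labels, spacing_mm=(1.5, 1.5, 1.5))
print(volumes)
```

In practice the label map would come from a segmentation model and the spacing from the scan's metadata; the counting step stays the same.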

What are foundation models?

Foundation models, the latest generation of AI neural networks, are trained on extensive, diverse datasets and can be employed for a wide range of tasks or targets.

As large language models demonstrate their capability to tackle generic tasks, visual foundation models are emerging to address various problems, including classification, detection, and segmentation.

Foundation models can be used as powerful AI neural networks for segmenting different targets in medical images. This opens up a world of possibilities for medical imaging applications, enhancing the effectiveness of segmentation tasks and enabling more accurate measurements.

Challenges in medical image analysis

Applying foundation models to medical image analysis poses significant challenges. Unlike general computer vision applications, medical imaging applications typically demand high-level domain knowledge.

Institutes have traditionally created fully annotated datasets for specific targets like spleens or tumors, relying solely on the association between input data features and target labels. Addressing multiple targets is more difficult, as manual annotations are laborious and time-consuming. Training larger or multi-task models is also increasingly challenging.

Despite recent advancements, there is still a long-standing issue in comprehending large medical imaging data due to its heterogeneity:

  • Medical volumetric data is often extremely high-resolution, necessitating substantial computational resources.
  • Current deep learning models have yet to effectively capture anatomical variability.
  • The large-scale nature of medical imaging data makes learning robust and efficient 3D representations difficult, particularly when dealing with heterogeneous data.

However, the modern analysis of high-resolution, high-dimensional, and large-scale medical volumetric data presents an opportunity to accelerate discoveries and obtain innovative insights into human body functions, behavior, and disease.

Foundation models offer the capability to address the heterogeneous variations that make it hard to account for inter- and intra-subject differences. AI has the potential to revolutionize medical imaging by enabling more accurate and efficient analysis of large-scale, complex data.

A platform for medical visual segmentation foundation models

MONAI Model Zoo serves as a platform for hosting medical visual foundation models. It contains a collection of pretrained models for medical imaging tasks developed using the Medical Open Network for AI (MONAI) framework.

The MONAI Model Zoo is a publicly available resource that provides access to a variety of pretrained models for different medical imaging tasks, such as segmentation, classification, registration, and synthesis. These pretrained models can be used as starting points or foundation models for training on new datasets or fine-tuning for specific applications.

The MONAI Model Zoo is designed to accelerate the development of new medical imaging applications and enable researchers and clinicians to leverage pre-existing models and build on top of them.

Whole-body CT segmentation

Segmenting an entire whole-body CT scan with a single model is a daunting task. However, the MONAI team has risen to the challenge, developing models that segment all 104 anatomical structures with a single model:

  • 27 organs
  • 59 bones
  • 10 muscles
  • 8 vessels

Using the dataset released by the TotalSegmentator team, MONAI conducted research and benchmarking to achieve fast inference times. For a high-resolution 1.5 mm model, the inference time using a single NVIDIA V100 GPU for all 104 structures is just 4.12 seconds, while the inference time using a CPU is 30.30 seconds. This is a significant improvement over the original paper’s reported inference time for a single CT scan, which was more than 1 minute.
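To put the quoted timings in perspective, the short calculation below works out the implied speedups. The 60-second figure is only a lower bound taken from "more than 1 minute", and the hardware used for the original measurement is not stated here, so the second ratio should be read as indicative rather than as a like-for-like benchmark.

```python
# Timings quoted in the text for the high-resolution 1.5 mm model.
GPU_SECONDS = 4.12   # single NVIDIA V100 GPU, all 104 structures
CPU_SECONDS = 30.30  # CPU inference
ORIGINAL_MIN_SECONDS = 60.0  # "more than 1 minute", used as a lower bound


def speedup(baseline_seconds, new_seconds):
    """How many times faster the new timing is than the baseline."""
    return baseline_seconds / new_seconds


gpu_vs_cpu = speedup(CPU_SECONDS, GPU_SECONDS)
gpu_vs_original = speedup(ORIGINAL_MIN_SECONDS, GPU_SECONDS)
print(f"GPU vs CPU: about {gpu_vs_cpu:.1f}x faster")
print(f"GPU vs original (lower bound): at least {gpu_vs_original:.1f}x faster")
```

With these numbers, GPU inference is roughly 7.4 times faster than CPU inference and at least 14.6 times faster than the originally reported time.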

To access the MONAI Whole Body CT Segmentation foundation model, see the MONAI Model Zoo.

For an overview of all anatomical structures in whole-body CT scans, see the TotalSegmentator: robust segmentation of 104 anatomical structures in CT images whitepaper.

3D Slicer user interface showing segmentations of a torso from multiple angles.
Figure 1. Segmenting 104 anatomical structures in whole-body CT scan

(Source: TotalSegmentator: robust segmentation of 104 anatomical structures in CT images)

Whole-brain MRI segmentation

Whole-brain segmentation is a critical technique in medical image analysis, providing a non-invasive means of measuring brain regions from clinical structural magnetic resonance imaging (MRI). However, with over 130 substructures in human brains, segmenting anything in the brain is a difficult challenge for MRI 3D segmentation. Unfortunately, detailed annotations of the brain are scarce, making this task even more challenging for the medical imaging community.

To address this issue, the MONAI team collaborated with Vanderbilt University to develop a deep learning model that can simultaneously segment all 133 brain structures. Using 3D Slicer, the MONAI model can infer the entire brain in just 2.0 seconds. The MONAI whole brain MRI segmentation model represents a promising development in medical imaging research, offering a valuable resource for improving the accuracy of brain measurements in clinical settings.

Visit the MONAI Model Zoo to access the MONAI Whole Brain MRI Segmentation Foundation Model.

3D Slicer user interface showing segmentation of anatomical structures in a brain from multiple angles.
Figure 2. Segmenting 133 anatomical structures in T1 brain MRI scan

How to access medical imaging foundation models

The use of foundation models in medical image analysis has great potential to improve diagnostic accuracy and enhance patient care. However, it’s important to recognize that medical applications require strong domain knowledge.

With the ability to process large amounts of data and identify subtle patterns and anomalies, foundation models have proven to be valuable tools in the medical image analysis field. The development and refinement of these models is ongoing, with researchers and practitioners working to improve their accuracy and expand their capabilities.

Although challenges such as patient privacy and potential biases must be addressed, the use of foundation models has already demonstrated significant benefits, and these models are expected to play a more prominent role in healthcare in the future.

As researchers, clinicians, and users continue to focus on foundation models, the MONAI Model Zoo, a platform hosting pretrained medical image models, is amplifying its impact. Fine-tuning pretrained models is crucial to the future of medical image analysis.

The MONAI Model Zoo provides access to a diverse collection of pretrained models for various medical imaging tasks, including segmentation, classification, registration, and synthesis. By using these pre-existing models as starting points, researchers and clinicians can accelerate the development of new medical imaging applications, saving time and resources.
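As a minimal sketch of how a pretrained model might be fetched programmatically, the snippet below wraps the `monai.bundle.download` function from the MONAI library. The two bundle names and the `fetch_bundle` helper are assumptions made for illustration; check the Model Zoo listing for the exact names and versions before relying on them.

```python
from pathlib import Path

# Bundle names are assumptions; confirm them against the Model Zoo listing.
ASSUMED_BUNDLES = {
    "whole_body_ct": "wholeBody_ct_segmentation",
    "whole_brain_mri": "wholeBrainSeg_Large_UNEST_segmentation",
}


def fetch_bundle(key, bundle_dir="./bundles"):
    """Download one Model Zoo bundle and return the folder it is expected in."""
    # Imported lazily so this module loads even without MONAI installed.
    from monai.bundle import download

    name = ASSUMED_BUNDLES[key]
    target = Path(bundle_dir)
    target.mkdir(parents=True, exist_ok=True)
    download(name=name, bundle_dir=str(target))
    # Assumes the bundle is unpacked into a subfolder named after it.
    return target / name


# Example (requires `pip install monai` and network access):
# bundle_path = fetch_bundle("whole_body_ct")
```

The downloaded bundle can then serve as a starting point for inference or fine-tuning on a new dataset.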

Join us in driving innovation and collaboration in medical imaging research by exploring the MONAI Model Zoo today.


Google at CVPR 2023

This week marks the beginning of the premier annual Computer Vision and Pattern Recognition conference (CVPR 2023), held in person in Vancouver, BC (with additional virtual content). As a leader in computer vision research and a Platinum Sponsor, Google Research will have a strong presence across CVPR 2023 with 90 papers being presented at the main conference and active involvement in over 40 conference workshops and tutorials.

If you are attending CVPR this year, please stop by our booth to chat with our researchers who are actively exploring the latest techniques for application to various areas of machine perception. Our researchers will also be available to talk about and demo several recent efforts, including on-device ML applications with MediaPipe, strategies for differential privacy, neural radiance field technologies and much more.

You can also learn more about our research being presented at CVPR 2023 in the list below (Google affiliations in bold).

Board and organizing committee

Senior area chairs include: Cordelia Schmid, Ming-Hsuan Yang

Area chairs include: Andre Araujo, Anurag Arnab, Rodrigo Benenson, Ayan Chakrabarti, Huiwen Chang, Alireza Fathi, Vittorio Ferrari, Golnaz Ghiasi, Boqing Gong, Yedid Hoshen, Varun Jampani, Lu Jiang, Da-Cheng Juan, Dahun Kim, Stephen Lombardi, Peyman Milanfar, Ben Mildenhall, Arsha Nagrani, Jordi Pont-Tuset, Paul Hongsuck Seo, Fei Sha, Saurabh Singh, Noah Snavely, Kihyuk Sohn, Chen Sun, Pratul P. Srinivasan, Deqing Sun, Andrea Tagliasacchi, Federico Tombari, Jasper Uijlings

Publicity Chair: Boqing Gong

Demonstration Chair: Jonathan T. Barron

Program Advisory Board includes: Cordelia Schmid, Richard Szeliski


History and Future of Artificial Intelligence and Computer Vision

Panelists include: Chelsea Finn

Scientific Discovery and the Environment

Panelists include: Sara Beery

Best Paper Award candidates

MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

Zhiqin Chen, Thomas Funkhouser, Peter Hedman, Andrea Tagliasacchi

DynIBaR: Neural Dynamic Image-Based Rendering

Zhengqi Li, Qianqian Wang, Forrester Cole, Richard Tucker, Noah Snavely

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Nataniel Ruiz*, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman

On Distillation of Guided Diffusion Models

Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans

Highlight papers

Connecting Vision and Language with Video Localized Narratives

Paul Voigtlaender, Soravit Changpinyo, Jordi Pont-Tuset, Radu Soricut, Vittorio Ferrari

MaskSketch: Unpaired Structure-Guided Masked Image Generation

Dina Bashkirova*, Jose Lezama, Kihyuk Sohn, Kate Saenko, Irfan Essa

SPARF: Neural Radiance Fields from Sparse and Noisy Poses

Prune Truong*, Marie-Julie Rakotosaona, Fabian Manhardt, Federico Tombari

MAGVIT: Masked Generative Video Transformer

Lijun Yu*, Yong Cheng, Kihyuk Sohn, Jose Lezama, Han Zhang, Huiwen Chang, Alexander Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

Dahun Kim, Anelia Angelova, Weicheng Kuo

I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification

Muhammad Ferjad Naeem, Gul Zain Khan, Yongqin Xian, Muhammad Zeshan Afzal, Didier Stricker, Luc Van Gool, Federico Tombari

Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization

Zifan Wang*, Nan Ding, Tomer Levinboim, Xi Chen, Radu Soricut

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting (see blog post)

Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan

RUST: Latent Neural Scene Representations from Unposed Imagery

Mehdi S. M. Sajjadi, Aravindh Mahendran, Thomas Kipf, Etienne Pot, Daniel Duckworth, Mario Lučić, Klaus Greff

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory (see blog post)

Ziniu Hu*, Ahmet Iscen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David Ross, Alireza Fathi

RobustNeRF: Ignoring Distractors with Robust Losses

Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J. Fleet, Andrea Tagliasacchi


AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware Training

Yifan Jiang*, Peter Hedman, Ben Mildenhall, Dejia Xu, Jonathan T. Barron, Zhangyang Wang, Tianfan Xue*

BlendFields: Few-Shot Example-Driven Facial Modeling

Kacper Kania, Stephan Garbin, Andrea Tagliasacchi, Virginia Estellers, Kwang Moo Yi, Tomasz Trzcinski, Julien Valentin, Marek Kowalski

Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints

Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson Nascimento

How Can Objects Help Action Recognition?

Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid

Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur

Peng Dai, Yinda Zhang, Xin Yu, Xiaoyang Lyu, Xiaojuan Qi

IFSeg: Image-Free Semantic Segmentation via Vision-Language Model

Sukmin Yun, Seong Park, Paul Hongsuck Seo, Jinwoo Shin

Learning from Unique Perspectives: User-Aware Saliency Modeling (see blog post)

Shi Chen*, Nachiappan Valliappan, Shaolei Shen, Xinyu Ye, Kai Kohlhoff, Junfeng He

MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

Tianhong Li*, Huiwen Chang, Shlok Kumar Mishra, Han Zhang, Dina Katabi, Dilip Krishnan

NeRF-Supervised Deep Stereo

Fabio Tosi, Alessio Tonioni, Daniele Gregorio, Matteo Poggi

Omnimatte3D: Associating Objects and their Effects in Unconstrained Monocular Video

Mohammed Suhail, Erika Lu, Zhengqi Li, Noah Snavely, Leon Sigal, Forrester Cole

OpenScene: 3D Scene Understanding with Open Vocabularies

Songyou Peng, Kyle Genova, Chiyu Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser

PersonNeRF: Personalized Reconstruction from Photo Collections

Chung-Yi Weng, Pratul Srinivasan, Brian Curless, Ira Kemelmacher-Shlizerman

Prefix Conditioning Unifies Language and Label Supervision

Kuniaki Saito*, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning (see blog post)

AJ Piergiovanni, Weicheng Kuo, Anelia Angelova

Burstormer: Burst Image Restoration and Enhancement Transformer

Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

Decentralized Learning with Multi-Headed Distillation

Andrey Zhmoginov, Mark Sandler, Nolan Miller, Gus Kristiansen, Max Vladymyrov

GINA-3D: Learning to Generate Implicit Neural Assets in the Wild

Bokui Shen, Xinchen Yan, Charles R. Qi, Mahyar Najibi, Boyang Deng, Leonidas Guibas, Yin Zhou, Dragomir Anguelov

Grad-PU: Arbitrary-Scale Point Cloud Upsampling via Gradient Descent with Learned Distance Functions

Yun He, Danhang Tang, Yinda Zhang, Xiangyang Xue, Yanwei Fu

Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble

Chun-Han Yao*, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani

Hyperbolic Contrastive Learning for Visual Representations beyond Objects

Songwei Ge, Shlok Mishra, Simon Kornblith, Chun-Liang Li, David Jacobs

Imagic: Text-Based Real Image Editing with Diffusion Models

Bahjat Kawar*, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, Michal Irani

Incremental 3D Semantic Scene Graph Prediction from RGB Sequences

Shun-Cheng Wu, Keisuke Tateno, Nassir Navab, Federico Tombari

IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

Dekai Zhu, Guangyao Zhai, Yan Di, Fabian Manhardt, Hendrik Berkemeyer, Tuan Tran, Nassir Navab, Federico Tombari, Benjamin Busam

Learning to Generate Image Embeddings with User-Level Differential Privacy

Zheng Xu, Maxwell Collins, Yuxiao Wang, Liviu Panait, Sewoong Oh, Sean Augenstein, Ting Liu, Florian Schroff, H. Brendan McMahan

NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs

Harsh Rangwani, Lavish Bansal, Kartik Sharma, Tejan Karmali, Varun Jampani, Venkatesh Babu Radhakrishnan

NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models

Ron Mokady*, Amir Hertz*, Kfir Aberman, Yael Pritch, Daniel Cohen-Or*

SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow

Itai Lang*, Dror Aiger, Forrester Cole, Shai Avidan, Michael Rubinstein

Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion

Dario Pavllo*, David Joseph Tan, Marie-Julie Rakotosaona, Federico Tombari

TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation

Hanzhi Chen, Fabian Manhardt, Nassir Navab, Benjamin Busam

TryOnDiffusion: A Tale of Two UNets

Luyang Zhu*, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

Aishwarya Kamath*, Peter Anderson, Su Wang, Jing Yu Koh*, Alexander Ku, Austin Waters, Yinfei Yang*, Jason Baldridge, Zarana Parekh

CLIPPO: Image-and-Language Understanding from Pixels Only

Michael Tschannen, Basil Mustafa, Neil Houlsby

Controllable Light Diffusion for Portraits

David Futschik, Kelvin Ritland, James Vecore, Sean Fanello, Sergio Orts-Escolano, Brian Curless, Daniel Sýkora, Rohit Pandey

CUF: Continuous Upsampling Filters

Cristina Vasconcelos, Cengiz Oztireli, Mark Matthews, Milad Hashemi, Kevin Swersky, Andrea Tagliasacchi

Improving Zero-Shot Generalization and Robustness of Multi-modal Models

Yunhao Ge*, Jie Ren, Andrew Gallagher, Yuxiao Wang, Ming-Hsuan Yang, Hartwig Adam, Laurent Itti, Balaji Lakshminarayanan, Jiaping Zhao

LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding

Gen Li, Varun Jampani, Deqing Sun, Laura Sevilla-Lara

Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision

Xiaoshuai Zhang, Abhijit Kundu, Thomas Funkhouser, Leonidas Guibas, Hao Su, Kyle Genova

Self-Supervised AutoFlow

Hsin-Ping Huang, Charles Herrmann, Junhwa Hur, Erika Lu, Kyle Sargent, Austin Stone, Ming-Hsuan Yang, Deqing Sun

Train-Once-for-All Personalization

Hong-You Chen*, Yandong Li, Yin Cui, Mingda Zhang, Wei-Lun Chao, Li Zhang

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning (see blog post)

Antoine Yang*, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid

VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining

Junjie Ke, Keren Ye, Jiahui Yu, Yonghui Wu, Peyman Milanfar, Feng Yang

You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model

Shengkun Tang, Yaqing Wang, Zhenglun Kong, Tianchi Zhang, Yao Li, Caiwen Ding, Yanzhi Wang, Yi Liang, Dongkuan Xu

Accidental Light Probes

Hong-Xing Yu, Samir Agarwala, Charles Herrmann, Richard Szeliski, Noah Snavely, Jiajun Wu, Deqing Sun

FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning

Yuanhao Xiong, Ruochen Wang, Minhao Cheng, Felix Yu, Cho-Jui Hsieh

FlexiViT: One Model for All Patch Sizes

Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic

Iterative Vision-and-Language Navigation

Jacob Krantz, Shurjo Banerjee, Wang Zhu, Jason Corso, Peter Anderson, Stefan Lee, Jesse Thomason

MoDi: Unconditional Motion Synthesis from Diverse Data

Sigal Raab, Inbal Leibovitch, Peizhuo Li, Kfir Aberman, Olga Sorkine-Hornung, Daniel Cohen-Or

Multimodal Prompting with Missing Modalities for Visual Recognition

Yi-Lun Lee, Yi-Hsuan Tsai, Wei-Chen Chiu, Chen-Yu Lee

Scene-Aware Egocentric 3D Human Pose Estimation

Jian Wang, Diogo Luvizon, Weipeng Xu, Lingjie Liu, Kripasindhu Sarkar, Christian Theobalt

ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-Based Consistency

Zixuan Huang, Varun Jampani, Ngoc Anh Thai, Yuanzhen Li, Stefan Stojanov, James M. Rehg

Improving Image Recognition by Retrieving from Web-Scale Image-Text Data

Ahmet Iscen, Alireza Fathi, Cordelia Schmid

JacobiNeRF: NeRF Shaping with Mutual Information Gradients

Xiaomeng Xu, Yanchao Yang, Kaichun Mo, Boxiao Pan, Li Yi, Leonidas Guibas

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

Ziqian Bai*, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka, Ruofei Du, Mingsong Dou, Sergio Orts-Escolano, Rohit Pandey, Ping Tan, Thabo Beeler, Sean Fanello, Yinda Zhang

NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis

Allan Zhou, Mo Jin Kim, Lirui Wang, Pete Florence, Chelsea Finn

Pic2Word: Mapping Pictures to Words for Zero-Shot Composed Image Retrieval

Kuniaki Saito*, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates

Mikaela Uy, Ricardo Martin Brualla, Leonidas Guibas, Ke Li

Structured 3D Features for Reconstructing Controllable Avatars

Enric Corona, Mihai Zanfir, Thiemo Alldieck, Eduard Gabriel Bazavan, Andrei Zanfir, Cristian Sminchisescu

Token Turing Machines

Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab

TruFor: Leveraging All-Round Clues for Trustworthy Image Forgery Detection and Localization

Fabrizio Guillaro, Davide Cozzolino, Avneesh Sud, Nicholas Dufour, Luisa Verdoliva

Video Probabilistic Diffusion Models in Projected Latent Space

Sihyun Yu, Kihyuk Sohn, Subin Kim, Jinwoo Shin

Visual Prompt Tuning for Generative Transfer Learning

Kihyuk Sohn, Yuan Hao, Jose Lezama, Luisa Polania, Huiwen Chang, Han Zhang, Irfan Essa, Lu Jiang

Zero-Shot Referring Image Segmentation with Global-Local Context Features

Seonghoon Yu, Paul Hongsuck Seo, Jeany Son

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR (see blog post)

Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

DC2: Dual-Camera Defocus Control by Learning to Refocus

Hadi Alzayer, Abdullah Abuolaim, Leung Chun Chan, Yang Yang, Ying Chen Lou, Jia-Bin Huang, Abhishek Kar

Edges to Shapes to Concepts: Adversarial Augmentation for Robust Vision

Aditay Tripathi*, Rishubh Singh, Anirban Chakraborty, Pradeep Shenoy

MetaCLUE: Towards Comprehensive Visual Metaphors Research

Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani

Multi-Realism Image Compression with a Conditional Generator

Eirikur Agustsson, David Minnen, George Toderici, Fabian Mentzer

NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors

Congyue Deng, Chiyu Jiang, Charles R. Qi, Xinchen Yan, Yin Zhou, Leonidas Guibas, Dragomir Anguelov

On Calibrating Semantic Segmentation Models: Analyses and an Algorithm

Dongdong Wang, Boqing Gong, Liqiang Wang

Persistent Nature: A Generative Model of Unbounded 3D Worlds

Lucy Chai, Richard Tucker, Zhengqi Li, Phillip Isola, Noah Snavely

Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment

Yiyou Sun*, Yaojie Liu, Xiaoming Liu, Yixuan Li, Wen-Sheng Chu

SINE: Semantic-Driven Image-Based NeRF Editing with Prior-Guided Editing Field

Chong Bao, Yinda Zhang, Bangbang Yang, Tianxing Fan, Zesong Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui

Sequential Training of GANs Against GAN-Classifiers Reveals Correlated “Knowledge Gaps” Present Among Independently Trained GAN Instances

Arkanath Pathak, Nicholas Dufour

SparsePose: Sparse-View Camera Pose Regression and Refinement

Samarth Sinha, Jason Zhang, Andrea Tagliasacchi, Igor Gilitschenski, David Lindell

Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models

Yushi Yao, Chang Ye, Gamaleldin F. Elsayed, Junfeng He


Computer Vision for Mixed Reality

Speakers include: Ira Kemelmacher-Shlizerman

Workshop on Autonomous Driving (WAD)

Speakers include: Chelsea Finn

Multimodal Content Moderation (MMCM)

Organizers include: Chris Bregler

Speakers include: Mevan Babakar

Medical Computer Vision (MCV)

Speakers include: Shekoofeh Azizi

VAND: Visual Anomaly and Novelty Detection

Speakers include: Yedid Hoshen, Jie Ren

Structural and Compositional Learning on 3D Data

Organizers include: Leonidas Guibas

Speakers include: Andrea Tagliasacchi, Fei Xia, Amir Hertz

Fine-Grained Visual Categorization (FGVC10)

Organizers include: Kimberly Wilber, Sara Beery

Panelists include: Hartwig Adam

XRNeRF: Advances in NeRF for the Metaverse

Organizers include: Jonathan T. Barron

Speakers include: Ben Poole

OmniLabel: Infinite Label Spaces for Semantic Understanding via Natural Language

Organizers include: Golnaz Ghiasi, Long Zhao

Speakers include: Vittorio Ferrari

Large Scale Holistic Video Understanding

Organizers include: David Ross

Speakers include: Cordelia Schmid

New Frontiers for Zero-Shot Image Captioning Evaluation (NICE)

Speakers include: Cordelia Schmid

Computational Cameras and Displays (CCD)

Organizers include: Ulugbek Kamilov

Speakers include: Mauricio Delbracio

Gaze Estimation and Prediction in the Wild (GAZE)

Organizers include: Thabo Beeler

Speakers include: Erroll Wood

Face and Gesture Analysis for Health Informatics (FGAHI)

Speakers include: Daniel McDuff

Computer Vision for Animal Behavior Tracking and Modeling (CV4Animals)

Organizers include: Sara Beery

Speakers include: Arsha Nagrani

3D Vision and Robotics

Speakers include: Pete Florence

End-to-End Autonomous Driving: Perception, Prediction, Planning and Simulation (E2EAD)

Organizers include: Anurag Arnab

End-to-End Autonomous Driving: Emerging Tasks and Challenges

Speakers include: Sergey Levine

Multi-Modal Learning and Applications (MULA)

Speakers include: Aleksander Hołyński

Synthetic Data for Autonomous Systems (SDAS)

Speakers include: Lukas Hoyer

Vision Datasets Understanding

Organizers include: José Lezama

Speakers include: Vijay Janapa Reddi

Precognition: Seeing Through the Future

Organizers include: Utsav Prabhu

New Trends in Image Restoration and Enhancement (NTIRE)

Organizers include: Ming-Hsuan Yang

Generative Models for Computer Vision

Speakers include: Ben Mildenhall, Andrea Tagliasacchi

Adversarial Machine Learning on Computer Vision: Art of Robustness

Organizers include: Xinyun Chen

Speakers include: Deqing Sun

Media Forensics

Speakers include: Nicholas Carlini

Tracking and Its Many Guises: Tracking Any Object in Open-World

Organizers include: Paul Voigtlaender

3D Scene Understanding for Vision, Graphics, and Robotics

Speakers include: Andy Zeng

Computer Vision for Physiological Measurement (CVPM)

Organizers include: Daniel McDuff

Affective Behaviour Analysis In-the-Wild

Organizers include: Stefanos Zafeiriou

Ethical Considerations in Creative Applications of Computer Vision (EC3V)

Organizers include: Rida Qadri, Mohammad Havaei, Fernando Diaz, Emily Denton, Sarah Laszlo, Negar Rostamzadeh, Pamela Peter-Agbia, Eva Kozanecka

VizWiz Grand Challenge: Describing Images and Videos Taken by Blind People

Speakers include: Haoran Qi

Efficient Deep Learning for Computer Vision (see blog post)

Organizers include: Andrew Howard, Chas Leichner

Speakers include: Andrew Howard

Visual Copy Detection

Organizers include: Priya Goyal

Learning 3D with Multi-View Supervision (3DMV)

Speakers include: Ben Poole

Image Matching: Local Features and Beyond

Organizers include: Eduard Trulls

Vision for All Seasons: Adverse Weather and Lighting Conditions (V4AS)

Organizers include: Lukas Hoyer

Transformers for Vision (T4V)

Speakers include: Cordelia Schmid, Huiwen Chang

Scholars vs Big Models — How Can Academics Adapt?

Organizers include: Sara Beery

Speakers include: Jonathan T. Barron, Cordelia Schmid

ScanNet Indoor Scene Understanding Challenge

Speakers include: Tom Funkhouser

Computer Vision for Microscopy Image Analysis

Speakers include: Po-Hsuan Cameron Chen

Embedded Vision

Speakers include: Rahul Sukthankar

Sight and Sound

Organizers include: Arsha Nagrani, William Freeman

AI for Content Creation

Organizers include: Deqing Sun, Huiwen Chang, Lu Jiang

Speakers include: Ben Mildenhall, Tim Salimans, Yuanzhen Li

Computer Vision in the Wild

Organizers include: Xiuye Gu, Neil Houlsby

Speakers include: Boqing Gong, Anelia Angelova

Visual Pre-Training for Robotics

Organizers include: Mathilde Caron

Omnidirectional Computer Vision

Organizers include: Yi-Hsuan Tsai


All Things ViTs: Understanding and Interpreting Attention in Vision

Hila Chefer, Sayak Paul

Recent Advances in Anomaly Detection

Guansong Pang, Joey Tianyi Zhou, Radu Tudor Ionescu, Yu Tian, Kihyuk Sohn

Contactless Healthcare Using Cameras and Wireless Sensors

Wenjin Wang, Xuyu Wang, Jun Luo, Daniel McDuff

Object Localization for Free: Going Beyond Self-Supervised Learning

Oriane Simeoni, Weidi Xie, Thomas Kipf, Patrick Pérez

Prompting in Vision

Kaiyang Zhou, Ziwei Liu, Phillip Isola, Hyojin Bahng, Ludwig Schmidt, Sarah Pratt, Denny Zhou

* Work done while at Google