
Deploying Edge AI in NVIDIA Headquarters

Since its inception, artificial intelligence (AI) has transformed every aspect of the global economy through the ability to solve problems of all sizes in every industry. NVIDIA has spent the last decade empowering companies to solve the world’s toughest problems such as improving sustainability, stopping poachers, and bettering cancer detection and care. What many don’t know is that behind the scenes, NVIDIA has also been leveraging AI to solve day-to-day issues, such as improving efficiency and user experiences. 

Improving efficiency and user experiences 

NVIDIA recently adopted AI to ease the process of entering the company’s headquarters while maintaining security. The IT department saw an opportunity to improve on traditional badge-based access control at the entrance turnstiles.

Using AI, NVIDIA designed a new experience where employees could sign up for contactless and hands-free entry to headquarters. Especially during the COVID-19 pandemic, the contactless access program has proven to be convenient, quick, secure, and safe. 

Watch the video below and read on to learn the steps NVIDIA took and the challenges that were overcome to deploy this scalable computer vision application.

Video 1. Learn the steps involved in deploying AI at the edge using NVIDIA Fleet Command

Unique considerations of edge environments

The objective of this project was to deliver a contactless access control solution that increased efficiency, security, and convenience for NVIDIA employees and could be scaled across multiple NVIDIA offices around the world. 

The solution had to be one that fit around existing infrastructure in the entranceway and office, conformed to policies put forth by the facilities and the security teams, and could be updated, upgraded, and scaled remotely across NVIDIA offices worldwide. 

Most edge AI solutions are deployed into environments where applications and systems are already in place, so extra care must be taken to account for all of the environment’s constraints and requirements.

Below are the steps that NVIDIA took to deploy a vision AI application for the entrance of the NVIDIA Endeavor building. The process took six months to complete. 

  1. Understand the problem and goal: The goal of the project was for the IT team to build a solution that could be remotely managed, frequently updated, and scaled to hundreds of sites worldwide, all while remaining compatible with NVIDIA processes. 
  2. Identify teams involved: The NVIDIA facilities, engineering, and IT teams were involved with the project, along with a third-party software vendor who provided the application storage and training data.  
  3. Set constraints and requirements: Each team set their respective constraints and requirements for the application. For example, the facilities team determined the accuracy, security, and redundancy requirements while engineering determined latency and performance, and dictated which parts of the solution could be solved with engineering, and which parts IT needed to solve. 
  4. Set up a test environment to validate proof of concept (POC): The process of an employee entering headquarters through turnstiles was simulated in a lab setting. During this process, the requirements set forth by engineering, such as model accuracy and latency, were met. 
  5. Run the pilot: The pilot program consisted of enabling nine physical turnstiles to run in real-time and onboarding 200 employees into the program to test it. 
  6. Put the AI model into production: 3,000 employees were onboarded into the program and the application was monitored for three months. After operating successfully for three months, the application was ready to be scaled to other buildings at NVIDIA headquarters and eventually, to offices worldwide. 
Figure 1. Employees walk through the security turnstiles at NVIDIA headquarters. The center turnstile showcases the AI solution that does not require swiping a badge.

Challenges implementing AI in edge environments 

The biggest challenge was to create a solution that fit within the existing constraints and requirements of the environment. Every step of the process also came with unique implementation challenges, as detailed below.

Deliver the solution within the requirements 

Latency: The engineering team set the application latency requirement for detection and entry at 700 milliseconds, and this benchmark was tested and validated at every step of the POC and pilot. The heavy stream of requests sent to the server to identify each person caused infrastructure load issues. To mitigate this, the server was placed within 30-40 feet of the turnstiles, which decreased latency by shortening the physical distance the data had to travel. 

Operations: Once the pilot program was scaled to 200 employees and the turnstiles were admitting multiple users into the building at a time, memory leaks were found that caused the application to crash after four hours of operation. The issue was a simple engineering fix, but it had not surfaced during the initial pilot and POC phases. 

Keep application operational: As mentioned, the memory leaks caused the application to crash after four hours of operation. It’s important to remember that during the POC and pilot phases, the application should be run for as long as it will need to run in production. For example, if the application needs to run for 12 hours, a successful four-hour run during the POC is not a good indicator that it will work for the requisite 12 hours.

Secure the application 

Physical security: Edge locations are unique in that they are often physically accessible to individuals who could tamper with the solution. To prevent this, edge servers were placed in a nearby telecommunications room with access control. 

Secure software supply chain: The application was developed with security in mind, which is why enterprise-grade software like NVIDIA Fleet Command, which can automatically create an audit log of all actions, was used. Using software from a trusted source ensures that organizations have a line of support to turn to when needed. A common mistake organizations make when deploying edge applications is downloading software from the internet without verifying that it comes from a trusted source, and accidentally downloading malware. 

Manage the application

An edge management solution is essential. The NVIDIA IT team needed a tool that allowed for easy updates when bug fixes arose or model accuracy needed to be improved. Given the plans for global scale, the ability of an edge management solution to update hardware and software with minimal downtime was also a priority. 

Addressing all of these needs is NVIDIA Fleet Command, a managed platform for AI container orchestration. Not only does Fleet Command streamline the provisioning and deployment of systems at the edge, it also simplifies the management of distributed computing environments with remote system provisioning, over-the-air updates, remote application and system access, monitoring and alerting, and application logging, so IT can ensure these widely distributed environments are operational at all times.

A successful edge AI deployment 

Once edge infrastructure is in place, enterprises can easily add more models to the application. In this case, IT can add more applications to the server and even roll the solution out to NVIDIA offices worldwide, thanks to Fleet Command. It is the glue that holds our stack together and provides users with turnkey AI orchestration that keeps organizations from having to build, maintain, and secure edge AI deployments from the ground up themselves. 

For organizations that are deploying AI in edge environments, prepare beforehand using NVIDIA LaunchPad, a free program that provides users with short-term access to a large catalog of hands-on labs. To learn more, see Managing Edge AI with the NVIDIA LaunchPad Free Trial.


Federated Learning from Simulation to Production with NVIDIA FLARE

NVIDIA FLARE 2.2 includes a host of new features that reduce development time and accelerate deployment for federated learning, helping organizations cut costs for building robust AI. Get the details about what’s new in this release.

An open-source platform and software development kit (SDK) for Federated Learning (FL), NVIDIA FLARE continues to evolve to enable its end users to leverage distributed, multiparty collaboration for more robust AI development from simulation to production.

The release of FLARE 2.2 brings numerous updates that simplify the research and development workflow for researchers and data scientists, streamline deployment for IT practitioners and project leaders, and strengthen security to ensure data privacy in real-world deployments. These include:

Simplifying the researcher and developer workflow

  • FL Simulator for rapid development and debugging
  • Federated statistics
  • Integration with MONAI and XGBoost

Streamlining deployment, operations, and security

  • FLARE Dashboard
  • Unified FLARE CLI
  • Client-side privacy policies
Figure 1. The end-to-end NVIDIA FLARE 2.2 workflow: application development with the FL Simulator, deployment with the FLARE Dashboard, and simplified operations and security with the FLARE console and site security policies

FL Simulator: Rapid development and debugging

One of the key features enabling research and development workflows is the new FL Simulator. The Simulator allows researchers and developers to run and debug a FLARE application without the overhead of provisioning and deploying a project. It provides a lightweight environment with a FLARE server and any number of connected clients on which an application can be deployed. Debugging is possible through the Simulator Runner API, which lets developers drive an application from simple Python scripts and set breakpoints within the FLARE application code.

The Simulator is designed to accommodate systems with limited resources, such as a researcher’s laptop, by running client processes sequentially in a limited number of threads. The same simulation can be easily run on a larger system with multiple GPUs by allocating a client or multiple clients per GPU. This gives the developer or researcher a flexible environment to test application scalability. Once the application has been developed and debugged, the same application code can be directly deployed on a production, distributed FL system without change.
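
For reference, the Simulator Runner API can be driven from a short Python script like the one below. This is a minimal sketch based on the description above: the job folder, workspace path, and client counts are placeholders, and the exact import path may vary between FLARE releases, so check the documentation for your version.

# Minimal sketch: running a FLARE application in the FL Simulator.
# Paths and client counts below are illustrative placeholders.
from nvflare.private.fed.app.simulator.simulator_runner import SimulatorRunner

runner = SimulatorRunner(
    job_folder="jobs/hello-numpy-sag",             # FLARE job to simulate
    workspace="/tmp/nvflare/simulator_workspace",  # where run output lands
    n_clients=2,                                   # number of simulated clients
    threads=2,                                     # clients share this many threads
)
runner.run()

Because the Simulator runs in-process, a standard Python debugger can be attached to this script to hit breakpoints inside the application code.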

Figure 2. The FLARE Dashboard showing the Project Admin user management panel (left), and the User panel for self-service downloads of project configuration and client software (right)

Federated learning workflows and federated data science

FLARE 2.2 also introduces new integrations and federated workflows designed to simplify application development and enable federated data science and analytics.

Federated statistics

When working with distributed datasets, it is often important to assess the data quality and distribution across the set of client datasets. FLARE 2.2 provides a set of federated statistics operators (controllers and executors) that can be used to generate global statistics based on individual client-side statistics.  

The workflow controller and executor are designed to allow data scientists to quickly implement their own statistical methods (generators) based on the specifics of their datasets of interest. Commonly used statistics are provided out of the box, including count, sum, mean, standard deviation, and histograms, along with routines to visualize the global and individual statistics. The built-in visualization tools can be used to view statistics across all datasets at all sites, as well as global aggregates, for example in a notebook utility as shown in Figure 3.

Figure 3. Example histograms from the federated statistics image example using the built-in FLARE statistics visualization class
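
Conceptually, the global statistics are assembled from per-site sufficient statistics rather than from raw data. The short sketch below illustrates the arithmetic for a global mean and standard deviation; it is a plain-Python illustration of the idea, not the FLARE operator API, and the numbers are made up for the example.

# Illustration: deriving global statistics from client-side statistics
# without sharing any raw data between sites.
import math

site_stats = [
    {"count": 120, "sum": 540.0, "sum_sq": 2600.0},
    {"count": 80, "sum": 410.0, "sum_sq": 2250.0},
]

n = sum(s["count"] for s in site_stats)
total = sum(s["sum"] for s in site_stats)
total_sq = sum(s["sum_sq"] for s in site_stats)

global_mean = total / n
# Var(X) = E[X^2] - (E[X])^2, computed from the pooled sums
global_var = total_sq / n - global_mean ** 2
print(f"global mean={global_mean:.3f}, stddev={math.sqrt(global_var):.3f}")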

In addition to these new workflows, the existing set of FLARE examples has been updated to integrate with the FL Simulator and leverage new privacy and security features. These example applications use common Python toolkits like NumPy, PyTorch, and TensorFlow, and highlight workflows in training, cross-validation, and federated analysis.

Integration of FLARE and MONAI

MONAI, the Medical Open Network for AI, recently released an abstraction that allows models packaged in the MONAI Bundle (MB) format to be easily extended for federated training on any platform that implements client training algorithms against these new APIs. FLARE 2.2 includes a new client executor that implements this integration, allowing MONAI model developers to easily develop and share models using the bundle concept, and then seamlessly deploy them in a federated paradigm using NVIDIA FLARE.

Figure 4. MONAI FL integration with ClientAlgo API

To see an example of using FLARE to train a medical image analysis model with federated averaging (FedAvg) and a MONAI Bundle, visit NVFlare on GitHub. The example shows how to download the dataset and the spleen_ct_segmentation bundle from the MONAI Model Zoo, and how to execute the bundle with FLARE using either the FL Simulator or POC mode. 
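
As a quick illustration, the bundle referenced above can be fetched from the MONAI Model Zoo with MONAI's bundle API. This is a sketch assuming a recent MONAI install; the GitHub example documents the exact steps.

# Sketch: pull the spleen_ct_segmentation bundle from the MONAI Model Zoo
from monai.bundle import download

download(name="spleen_ct_segmentation", bundle_dir="./bundles")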

MONAI also allows computing summary data statistics on the datasets defined in the bundle. These can be shared and visualized in FLARE using the federated statistics operators described above. The use of federated statistics and MONAI is included in the GitHub example above.

Figure 5. Federated spleen segmentation in abdominal CT using MONAI bundle from the Model Zoo. For details, visit NVFlare on GitHub.

XGBoost integration

A common request from the federated learning user community is support for more traditional machine learning frameworks in a federated paradigm. FLARE 2.2 provides examples that illustrate horizontal federated learning using two approaches: histogram-based collaboration and tree-based collaboration.  

The community DMLC XGBoost project recently released an adaptation of the existing distributed XGBoost training algorithm that allows federated clients to act as distinct workers in the distributed algorithm. This distributed algorithm is used in a reference implementation of horizontal federated learning that demonstrates the histogram-based approach.

FLARE 2.2 also provides a reference federated implementation of tree-based boosting using two methods: Cyclic Training and Bagging Aggregation. In Cyclic Training, each site in turn executes tree boosting on its own local data and forwards the resulting tree sequence to the next client in the federation for the subsequent round of boosting. In Bagging Aggregation, all sites start from the same global model and boost a number of trees based on their local data; the resulting trees are then aggregated by the server for the next round of boosting.
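
To make the cyclic method concrete, the standalone sketch below simulates it with plain XGBoost on synthetic data, using training continuation so each "site" boosts a few trees on top of the model it receives. This is a conceptual illustration under those assumptions, not the FLARE reference implementation.

# Conceptual sketch of Cyclic Training with plain XGBoost.
# Each site boosts a few trees on local data, then passes the model on.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
site_data = []
for _ in range(3):  # three hypothetical sites with local datasets
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    site_data.append(xgb.DMatrix(X, label=y))

params = {"objective": "binary:logistic", "max_depth": 3}
model = None
for _round in range(4):        # federation rounds
    for dtrain in site_data:   # cycle through the sites
        # Continue boosting from the model produced by the previous site
        model = xgb.train(params, dtrain, num_boost_round=5, xgb_model=model)

print(f"Global ensemble contains {len(model.get_dump())} trees")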

Real-world federated learning

The new suite of tools and workflows available in FLARE 2.2 allow developers and data scientists to quickly build applications and more easily bring them to production in a distributed federated learning deployment. When moving to a real-world distributed deployment, there are many considerations for security and privacy that must be addressed by both the project leader and developers, as well as the individual sites participating in the federated learning deployment.

FLARE Dashboard: Streamlined deployment

New in 2.2 is the FLARE Dashboard, designed to simplify project administration and deployment for lead researchers and IT practitioners supporting real-world FL deployments. The FLARE Dashboard allows a project administrator to deploy a website that can be used to define project details, gather information about participant sites, and distribute the startup kits that are used to connect client sites.

The FLARE Dashboard is backed by the same provisioning system as previous versions of the platform and gives users the flexibility to choose either the web UI or the classic command-line provisioning, depending on project requirements. Both the Dashboard and provisioning CLI now support dynamic provisioning, allowing project administrators to add federated and admin clients on demand. This ability to dynamically allocate new training and admin clients without affecting existing clients dramatically simplifies management of the FL system over the lifecycle of the project.

Unified FLARE CLI

The FLARE command-line interface (CLI) has been completely rewritten to consolidate all commands under a common top-level nvflare CLI and introduce new convenience tools for improved usability. 

$ nvflare -h
 
usage: nvflare [-h] [--version] {poc,preflight_check,provision,simulator,dashboard,authz_preview} ...

Subcommands include all of the pre-existing standalone CLI tools like poc, provision, and authz_preview, as well as new commands for launching the FL Simulator and the FLARE Dashboard. The nvflare command now also includes a preflight_check that gives administrators and end users a tool to verify system configuration, connectivity to other FLARE subsystems, and proper storage configuration, and to perform a dry-run connection of the client or server.

Improved site security

The security framework of NVIDIA FLARE has been redesigned in 2.2 to improve both usability and overall security. The roles that are used to define privileges and system operation policies have been streamlined to include: Project Admin, Org Admin, Lead Researcher, and Member Researcher.  The security framework has been strengthened based on these roles, to allow individual organizations and sites to implement their own policies to protect individual privacy and intellectual property (IP) through a Federated Authorization framework.

Federated Authorization shifts both the definition and enforcement of privacy and security policies to individual organizations and member sites, allowing participants to define their own fine-grained site policy:

  • Each organization defines its policy in its own authorization.json configuration
  • This locally defined policy is loaded by FL clients owned by the organization
  • The policy is also enforced by these FL clients
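
As a rough sketch, a site could generate its local policy file along these lines. The keys and values here are illustrative assumptions, not the authoritative FLARE 2.2 policy schema; consult the FLARE documentation for the real format.

# Illustrative sketch only: writing a site-local authorization.json.
# Field names below are assumptions for illustration purposes.
import json

policy = {
    "format_version": "1.0",
    "permissions": {
        "project_admin": {"any_command": True},
        "lead_researcher": {"submit_job": True, "view_logs": True},
        "member_researcher": {"submit_job": False, "view_logs": True},
    },
}

with open("authorization.json", "w") as f:
    json.dump(policy, f, indent=2)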

The site policies can be used to control all aspects of the federated learning workflow, including:

  • Resource management: The configuration of system resources, which is solely the decision of local IT
  • Authorization policy: Local authorization policy that determines what a user can or cannot do on the local site
  • Privacy policy: Local policy that specifies what types of studies are allowed and how to add privacy protection to the learning results produced by the FL client on the local site
  • Logging configuration: Each site can now define its own logging configuration for system generated log messages

These site policies also allow individual sites to enforce their own data privacy by defining custom filters and encryption applied to any information passed between the client site and the central server.

This new security framework provides project and organization administrators, researchers, and site IT the tools required to confidently take a federated learning project from proof-of-concept to a real-world deployment.

Getting started with NVIDIA FLARE 2.2

We’ve highlighted just some of the new features in FLARE 2.2 that allow researchers and developers to quickly adopt the platform to prototype and deploy federated learning workflows. Tools like the FL Simulator and FLARE Dashboard for streamlined development and deployment, along with a growing set of reference workflows, make it easier and faster than ever to get started and save valuable development time.

In addition to the examples detailed in this post, FLARE 2.2 includes many other enhancements that increase the power and flexibility of the platform, including:

  • Examples for Docker compose and Helm deployment
  • Preflight checks to help identify and correct connectivity and configuration issues
  • Simplified POC commands to test distributed deployments locally
  • Updated example applications

To learn more about these features and get started with the latest examples, visit the NVIDIA FLARE documentation. As we are actively developing the FLARE platform to meet the needs of researchers, data scientists, and platform developers, we welcome any suggestions and feedback in the NVIDIA FLARE GitHub community.  

Join us for the webinar, Federated Learning with NVIDIA Flare: From Simulation to Real World to see an overview of the platform and some of these new features in action.


Achieve Innovative Hyperconverged Networking with NVIDIA Spectrum Ethernet and Microsoft Azure Stack HCI

Enterprises of all sizes are increasingly leveraging virtualization and hyperconverged infrastructure (HCI). This technology delivers reliable and secure compute resources for operations while reducing data center footprint. HCI clusters rely on robust, feature-rich networking fabrics to deliver on-premises solutions that can seamlessly connect to the cloud. 

Microsoft Azure Stack HCI is a hyperconverged infrastructure cluster solution that can run containerized applications. It hosts virtualized Windows and Linux workloads and storage in a hybrid environment that combines on-premises infrastructure with Azure cloud services. The server components of Azure Stack HCI can be interconnected using devices that support the appropriate validation requirements.

NVIDIA Spectrum Ethernet switches are purpose-built networking solutions designed to support the requirements of Microsoft Azure Stack HCI. This on-premises solution enables enterprises to leverage cloud functionality, effectively creating a hybrid cloud solution.

Spectrum switches provide end-to-end Ethernet for reliable networking with Azure Stack HCI. They are also available in multiple form factors, including half-width 10/25/100 Gb/s top-of-rack (TOR) switches, two of which can be installed side-by-side in a 1 RU (rack unit) space to accommodate the required throughput, port density, and high availability.

The NVIDIA networking team worked closely with the Microsoft networking team to ensure that NVIDIA Spectrum switches meet the physical network requirements for Azure Stack HCI, including:

  • Priority Flow Control
  • Enhanced Transmission Selection
  • Custom TLVs in LLDP transmission

These features are delivered through Cumulus Linux (starting with version 5.1 and continuing in all subsequent releases), the flagship network operating system for NVIDIA Ethernet switches. Cumulus Linux is an open operating system with a “drive it your way” philosophy for management. It comes with a comprehensive, data-model-based CLI known as NVUE (NVIDIA User Experience). And since it is a Linux network operating system, users can also interact with Cumulus Linux as a pure Linux system. This configuration flexibility allows it to integrate easily with whatever automation toolset you prefer.

“We are pleased to see NVIDIA Spectrum Ethernet switches optimized for Microsoft Azure Stack HCI,” says Tim Isaacs, General Manager at Microsoft. “With the combination of the newly introduced Network HUD feature in the latest Azure Stack HCI release and NVIDIA’s updated Cumulus Linux network operating system for Spectrum Ethernet switches, we can jointly provide our customers rich and robust visibility into their Azure Stack HCI network environment.” 

Finally, NVIDIA worked closely with the Microsoft team to create standardized switch configurations that optimize traffic between the different hyperconverged nodes. The configurations were generated by testing a full Azure Stack HCI deployment on NVIDIA Spectrum switches, ensuring a seamless experience during server deployment. These configurations are available through the Microsoft standard deployment experience.

To get NVIDIA Spectrum switches for your Microsoft Azure Stack HCI deployment, visit the NVIDIA online store or talk to an NVIDIA partner.


Upcoming Event: Speech AI Summit 2022

Join experts from Google, Meta, NVIDIA, and more at the first annual NVIDIA Speech AI Summit. Register now!


3D Artist SouthernShotty Creates Wholesome Characters This Week ‘In the NVIDIA Studio’

This week ‘In the NVIDIA Studio,’ we’re highlighting 3D and motion graphics artist SouthernShotty — and scenes from his soon-to-be released short film, Watermelon Girl. 


Upcoming Event: Healthcare & Life Sciences Developer Summit November 10, 2022

A virtual event designed for healthcare developers and startups, this summit on November 10, 2022 offers a full day of technical talks to reach developers and technical leaders in the EMEA region. Get best practices and insights for applications, from biopharma to medical imaging.


New Course: Get Started with Highly Accurate Custom ASR for Speech AI

Learn how to build, train, customize, and deploy a GPU-accelerated automatic speech recognition service with NVIDIA Riva in this self-paced course.


What Are Graph Neural Networks?

When two technologies converge, they can create something new and wonderful — like cellphones and browsers were fused to forge smartphones. Today, developers are applying AI’s ability to find patterns to massive graph databases that store information about relationships among data points of all sorts. Together they produce a powerful new tool called graph neural networks.


Keep On Trucking: SenSen Harnesses Drones, NVIDIA Jetson, Metropolis to Inspect Trucks

Sensor AI solutions specialist SenSen has turned to the NVIDIA Jetson edge AI platform to help regulators track heavy vehicles moving across Australia. Australia’s National Heavy Vehicle Regulator, or NHVR, has a big job — ensuring the safety of truck drivers across some of the world’s most sparsely populated regions. They’re now harnessing AI to…


Google at ECCV 2022

Google is proud to be a Platinum Sponsor of the European Conference on Computer Vision (ECCV 2022), a premier forum for the dissemination of research in computer vision and machine learning (ML). This year, ECCV 2022 will be held as a hybrid event, in person in Tel Aviv, Israel with virtual attendance as an option. Google has a strong presence at this year’s conference with over 60 accepted publications and active involvement in a number of workshops and tutorials. We look forward to sharing some of our extensive research and expanding our partnership with the broader ML research community.

Registered for ECCV 2022? We hope you’ll visit our on-site or virtual booths to learn more about the research we’re presenting at ECCV 2022, including several demos and opportunities to connect with our researchers. Learn more about Google’s research being presented at ECCV 2022 below (Google affiliations in bold).

Organizing Committee

Program Chairs include: Moustapha Cissé

Awards Paper Committee: Todd Zickler

Area Chairs include: Ayan Chakrabarti, Tali Dekel, Alireza Fathi, Vittorio Ferrari, David Fleet, Dilip Krishnan, Michael Rubinstein, Cordelia Schmid, Deqing Sun, Federico Tombari, Jasper Uijlings, Ming-Hsuan Yang, Todd Zickler

Accepted Publications

NeuMesh: Learning Disentangled Neural Mesh-Based Implicit Field for Geometry and Texture Editing
Bangbang Yang, Chong Bao, Junyi Zeng, Hujun Bao, Yinda Zhang, Zhaopeng Cui, Guofeng Zhang

Anti-Neuron Watermarking: Protecting Personal Data Against Unauthorized Neural Networks
Zihang Zou, Boqing Gong, Liqiang Wang

Exploiting Unlabeled Data with Vision and Language Models for Object Detection
Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris N. Metaxas

Waymo Open Dataset: Panoramic Video Panoptic Segmentation
Jieru Mei, Alex Zhu, Xinchen Yan, Hang Yan, Siyuan Qiao, Yukun Zhu, Liang-Chieh Chen, Henrik Kretzschmar

PRIF: Primary Ray-Based Implicit Function
Brandon Yushan Feng, Yinda Zhang, Danhang Tang, Ruofei Du, Amitabh Varshney

LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling
Boyan Jiang, Xinlin Ren, Mingsong Dou, Xiangyang Xue, Yanwei Fu, Yinda Zhang

k-Means Mask Transformer (see blog post)
Qihang Yu*, Siyuan Qiao, Maxwell D Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

MaxViT: Multi-Axis Vision Transformer (see blog post)
Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, Yinxiao Li

E-Graph: Minimal Solution for Rigid Rotation with Extensibility Graphs
Yanyan Li, Federico Tombari

RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation
Ruida Zhang, Yan Di, Zhiqiang Lou, Fabian Manhardt, Federico Tombari, Xiangyang Ji

GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning
Huseyin Coskun, Alireza Zareian, Joshua L Moore, Federico Tombari, Chen Wang

Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin*

Adaptive Transformers for Robust Few-Shot Cross-Domain Face Anti-spoofing
Hsin-Ping Huang, Deqing Sun, Yaojie Liu, Wen-Sheng Chu, Taihong Xiao, Jinwei Yuan, Hartwig Adam, Ming-Hsuan Yang

DualPrompt: Complementary Prompting for Rehearsal-Free Continual Learning
Zifeng Wang*, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister

BLT: Bidirectional Layout Transformer for Controllable Layout Generation
Xiang Kong, Lu Jiang, Huiwen Chang, Han Zhang, Yuan Hao, Haifeng Gong, Irfan Essa

V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer
Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma

Learning Visibility for Robust Dense Human Body Estimation
Chun-Han Yao, Jimei Yang, Duygu Ceylan, Yi Zhou, Yang Zhou, Ming-Hsuan Yang

Are Vision Transformers Robust to Patch Perturbations?
Jindong Gu, Volker Tresp, Yao Qin

PseudoAugment: Learning to Use Unlabeled Data for Data Augmentation in Point Clouds
Zhaoqi Leng, Shuyang Cheng, Ben Caine, Weiyue Wang, Xiao Zhang, Jonathon Shlens, Mingxing Tan, Dragomir Anguelov

Structure and Motion from Casual Videos
Zhoutong Zhang, Forrester Cole, Zhengqi Li, Noah Snavely, Michael Rubinstein, William T. Freeman

PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map
Chenfeng Xu, Tian Li, Chen Tang, Lingfeng Sun, Kurt Keutzer, Masayoshi Tomizuka, Alireza Fathi, Wei Zhan

Novel Class Discovery Without Forgetting
Joseph K J, Sujoy Paul, Gaurav Aggarwal, Soma Biswas, Piyush Rai, Kai Han, Vineeth N Balasubramanian

Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning
Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas

PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks
Nan Ding, Xi Chen, Tomer Levinboim, Soravit Changpinyo, Radu Soricut

InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images
Zhengqi Li, Qianqian Wang*, Noah Snavely, Angjoo Kanazawa*

Generalizable Patch-Based Neural Rendering (see blog post)
Mohammed Suhail*, Carlos Esteves, Leonid Sigal, Ameesh Makadia

LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds
Minghua Liu, Yin Zhou, Charles R. Qi, Boqing Gong, Hao Su, Dragomir Anguelov

The Missing Link: Finding Label Relations Across Datasets
Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

Learning Instance-Specific Adaptation for Cross-Domain Segmentation
Yuliang Zou, Zizhao Zhang, Chun-Liang Li, Han Zhang, Tomas Pfister, Jia-Bin Huang

Learning Audio-Video Modalities from Image Captions
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid

TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
Medhini Narasimhan*, Arsha Nagrani, Chen Sun, Michael Rubinstein, Trevor Darrell, Anna Rohrbach, Cordelia Schmid

On Label Granularity and Object Localization
Elijah Cole, Kimberly Wilber, Grant Van Horn, Xuan Yang, Marco Fornoni, Pietro Perona, Serge Belongie, Andrew Howard, Oisin Mac Aodha

Disentangling Architecture and Training for Optical Flow
Deqing Sun, Charles Herrmann, Fitsum Reda, Michael Rubinstein, David J. Fleet, William T. Freeman

NewsStories: Illustrating Articles with Visual Summaries
Reuben Tan, Bryan Plummer, Kate Saenko, J.P. Lewis, Avneesh Sud, Thomas Leung

Improving GANs for Long-Tailed Data Through Group Spectral Regularization
Harsh Rangwani, Naman Jaswani, Tejan Karmali, Varun Jampani, Venkatesh Babu Radhakrishnan

Planes vs. Chairs: Category-Guided 3D Shape Learning Without Any 3D Cues
Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, James Rehg

A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch
Patsorn Sangkloy, Wittawat Jitkrittum, Diyi Yang, James Hays

Learned Monocular Depth Priors in Visual-Inertial Initialization
Yunwen Zhou, Abhishek Kar, Eric L. Turner, Adarsh Kowdle, Chao Guo, Ryan DuToit, Konstantine Tsotsos

How Stable are Transferability Metrics Evaluations?
Andrea Agostinelli, Michal Pandy, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

Data-Free Neural Architecture Search via Recursive Label Calibration
Zechun Liu*, Zhiqiang Shen, Yun Long, Eric Xing, Kwang-Ting Cheng, Chas H. Leichner

Fast and High Quality Image Denoising via Malleable Convolution
Yifan Jiang*, Bartlomiej Wronski, Ben Mildenhall, Jonathan T. Barron, Zhangyang Wang, Tianfan Xue

Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation
Jogendra Nath Kundu, Suvaansh Bhambri, Akshay R Kulkarni, Hiran Sarkar, Varun Jampani, Venkatesh Babu Radhakrishnan

Learning Online Multi-Sensor Depth Fusion
Erik Sandström, Martin R. Oswald, Suryansh Kumar, Silvan Weder, Fisher Yu, Cristian Sminchisescu, Luc Van Gool

Hierarchical Semantic Regularization of Latent Spaces in StyleGANs
Tejan Karmali, Rishubh Parihar, Susmit Agrawal, Harsh Rangwani, Varun Jampani, Maneesh K Singh, Venkatesh Babu Radhakrishnan

RayTran: 3D Pose Estimation and Shape Reconstruction of Multiple Objects from Videos with Ray-Traced Transformers
Michał J Tyszkiewicz, Kevis-Kokitsi Maninis, Stefan Popov, Vittorio Ferrari

Neural Video Compression Using GANs for Detail Synthesis and Propagation
Fabian Mentzer, Eirikur Agustsson, Johannes Ballé, David Minnen, Nick Johnston, George Toderici

Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn, Rui Qian, Kimberly Wilber, Hartwig Adam, Oisin Mac Aodha, Serge Belongie

Implicit Neural Representations for Image Compression
Yannick Strümpler, Janis Postels, Ren Yang, Luc Van Gool, Federico Tombari

3D Compositional Zero-Shot Learning with DeCompositional Consensus
Muhammad Ferjad Naeem, Evin Pınar Örnek, Yongqin Xian, Luc Van Gool, Federico Tombari

FindIt: Generalized Localization with Natural Language Queries (see blog post)
Weicheng Kuo, Fred Bertsch, Wei Li, AJ Piergiovanni, Mohammad Saffar, Anelia Angelova

A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation
Wuyang Chen*, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou

Improved Masked Image Generation with Token-Critic
Jose Lezama, Huiwen Chang, Lu Jiang, Irfan Essa

Learning Discriminative Shrinkage Deep Networks for Image Deconvolution
Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang

AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis*, Scott Wisdom, Tal Remez, John Hershey

Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer, Alexey Gritsenko, Austin C Stone, Maxim Neumann, Dirk Weißenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby

COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality
Honglu Zhou, Asim Kadav, Aviv Shamsian, Shijie Geng, Farley Lai, Long Zhao, Ting Liu, Mubbasir Kapadia, Hans Peter Graf

Video Question Answering with Iterative Video-Text Co-tokenization (see blog post)
AJ Piergiovanni, Kairo Morton*, Weicheng Kuo, Michael S. Ryoo, Anelia Angelova

Class-Agnostic Object Detection with Multi-modal Transformer
Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang

FILM: Frame Interpolation for Large Motion (see blog post)
Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline Pantofaru, Brian Curless

Compositional Human-Scene Interaction Synthesis with Semantic Control
Kaifeng Zhao, Shaofei Wang, Yan Zhang, Thabo Beeler, Siyu Tang

Workshops

LatinX in AI
Mentors include: José Lezama
Keynote Speakers include: Andre Araujo

AI for Creative Video Editing and Understanding
Keynote Speakers include: Tali Dekel, Negar Rostamzadeh

Learning With Limited and Imperfect Data (L2ID)
Invited Speakers include: Xiuye Gu
Organizing Committee includes: Sadeep Jayasumana

International Challenge on Compositional and Multimodal Perception (CAMP)
Program Committee includes: Edward Vendrow

Self-Supervised Learning: What is Next?
Invited Speakers include: Mathilde Caron, Arsha Nagrani
Organizers include: Andrew Zisserman

3rd Workshop on Adversarial Robustness In the Real World
Invited Speakers include: Ekin Dogus Cubuk
Organizers include: Xinyun Chen, Alexander Robey, Nataniel Ruiz, Yutong Bai

AV4D: Visual Learning of Sounds in Spaces
Invited Speakers include: John Hershey

Challenge on Mobile Intelligent Photography and Imaging (MIPI)
Invited Speakers include: Peyman Milanfar

Robust Vision Challenge 2022
Organizing Committee includes: Alina Kuznetsova

Computer Vision in the Wild
Challenge Organizers include: Yi-Ting Chen, Ye Xia
Invited Speakers include: Yin Cui, Yongqin Xian, Neil Houlsby

Self-Supervised Learning for Next-Generation Industry-Level Autonomous Driving (SSLAD)
Organizers include: Fisher Yu

Responsible Computer Vision
Organizing Committee includes: Been Kim
Invited Speakers include: Emily Denton

Cross-Modal Human-Robot Interaction
Invited Speakers include: Peter Anderson

ISIC Skin Image Analysis
Organizing Committee includes: Yuan Liu
Steering Committee includes: Yuan Liu, Dale Webster
Invited Speakers include: Yuan Liu

Observing and Understanding Hands in Action
Sponsored by Google

Autonomous Vehicle Vision (AVVision)
Speakers include: Fisher Yu

Visual Perception for Navigation in Human Environments: The JackRabbot Human Body Pose Dataset and Benchmark
Organizers include: Edward Vendrow

Language for 3D Scenes
Invited Speakers include: Jason Baldridge
Organizers include: Leonidas Guibas

Designing and Evaluating Computer Perception Systems (CoPe)
Organizers include: Andrew Zisserman

Learning To Generate 3D Shapes and Scenes
Panelists include: Pete Florence

Advances in Image Manipulation
Program Committee includes: George Toderici, Ming-Hsuan Yang

TiE: Text in Everything
Challenge Organizers include: Shangbang Long, Siyang Qin
Invited Speakers include: Tali Dekel, Aishwarya Agrawal

Instance-Level Recognition
Organizing Committee: Andre Araujo, Bingyi Cao, Tobias Weyand
Invited Speakers include: Mathilde Caron

What Is Motion For?
Organizing Committee: Deqing Sun, Fitsum Reda, Charles Herrmann
Invited Speakers include: Tali Dekel

Neural Geometry and Rendering: Advances and the Common Objects in 3D Challenge
Invited Speakers include: Ben Mildenhall

Visual Object-Oriented Learning Meets Interaction: Discovery, Representations, and Applications
Invited Speakers include: Klaus Greff, Thomas Kipf
Organizing Committee includes: Leonidas Guibas

Vision with Biased or Scarce Data (VBSD)
Program Committee includes: Yizhou Wang

Multiple Object Tracking and Segmentation in Complex Environments
Invited Speakers include: Xingyi Zhou, Fisher Yu

3rd Visual Inductive Priors for Data-Efficient Deep Learning Workshop
Organizing Committee includes: Ekin Dogus Cubuk

DeeperAction: Detailed Video Action Understanding and Anomaly Recognition
Advisors include: Rahul Sukthankar

Sign Language Understanding Workshop and Sign Language Recognition, Translation & Production Challenge
Organizing Committee includes: Andrew Zisserman
Speakers include: Andrew Zisserman

Ego4D: First-Person Multi-Modal Video Understanding
Invited Speakers include: Michal Irani

AI-Enabled Medical Image Analysis: Digital Pathology & Radiology/COVID19
Program Chairs include: Po-Hsuan Cameron Chen
Workshop Partner: Google Health

Visual Object Tracking Challenge (VOT 2022)
Technical Committee includes: Christoph Mayer

Assistive Computer Vision and Robotics
Technical Committee includes: Maja Mataric

Human Body, Hands, and Activities from Egocentric and Multi-View Cameras
Organizers include: Francis Engelmann

Frontiers of Monocular 3D Perception: Implicit x Explicit
Panelists include: Pete Florence

Tutorials

Self-Supervised Representation Learning in Computer Vision
Invited Speakers include: Ting Chen

Neural Volumetric Rendering for Computer Vision
Organizers include: Ben Mildenhall, Pratul Srinivasan, Jon Barron
Presenters include: Ben Mildenhall, Pratul Srinivasan

New Frontiers in Efficient Neural Architecture Search!
Speakers include: Ruochen Wang



*Work done while at Google.