Categories
Misc

Upcoming Webinar: How to Maintain and Optimize Edge Computing Deployments

Join us on August 11, 2022 to learn how to design edge deployments for future-proof scale, best practices for optimizing multiple deployments on edge systems, and tips for remotely repairing systems and applications.

Categories
Misc

How to Start a Career in AI

How do I start a career as a deep learning engineer? What are some of the key tools and frameworks used in AI? How do I learn more about ethics in AI? Everyone has questions, but the most common questions in AI always return to this: how do I get involved? Cutting through the hype…

Categories
Misc

NVIDIA Announces Preliminary Financial Results for Second Quarter Fiscal 2023

NVIDIA today announced selected preliminary financial results for the second quarter ended July 31, 2022.

Categories
Misc

Predicting How Images Influence Visual Reaction Speed

Imagine driving along a road and an obstacle suddenly appears in your path. How quickly can you react to it? How does your reaction speed change with the time of day, the color of the obstacle, and where it appears in your field of view?

The ability to react quickly to visual events is valuable to everyday life. It is also a fundamental skill in fast-paced video games. A recent collaboration between researchers from NVIDIA, NYU, and Princeton—winner of a SIGGRAPH 2022 Technical Paper Award—explores the relationship between image features and the time it takes for an observer to react.

Figure 1. Human visual reaction speed varies with the visual characteristics of the target. This example shows how low-contrast features (center top) slow down reaction speed and high-contrast ones (center bottom) speed it up.  

Reaction speed and visual events

With so many recent advances in display technology, human reaction time has become a primary bottleneck in the end-to-end latency of interactive graphics systems. Response times for communicating with remote servers, rendering and displaying images, and collecting and processing mouse or keyboard input are all typically tens of milliseconds or less.

By contrast, the pipeline for human perception is much slower, and can range from 100 to 500 milliseconds depending on the complexity of the visual input. This research aims to simplify and optimize images to reduce our reaction time as much as possible.

Visual contrast and spatial frequency are well-known features that influence low-level vision. Further, human vision is not uniform over the entire field of view. The amount of contrast needed to boost reaction time varies depending on eccentricity, or visual angle (where an object is located relative to center gaze) and spatial frequency (whether an object is a solid color or a complex pattern, for example). Reaction time is a combination of many neural processes, and the proposed model includes all of these factors.

Reaction time measurements are based on the onset latency of voluntary rapid eye movements called saccades. The “reaction time clock” starts ticking as soon as the target appears on the screen. Once the target is identified, a saccade is initiated towards it.

Figure 2. Three visual characteristics that influence saccadic reaction time: contrast (left), frequency (center), and eccentricity (right)

Modeling saccadic reaction

To build a perceptually accurate model for reaction time prediction, researchers conducted a series of experiments with human observers, collecting over 11,000 reaction times for varying image features. 

Inspired by how the human brain perceives information and makes decisions, the researchers designed a model for reaction time prediction, accounting for contrast, frequency, and eccentricity, as well as the inherent randomness in human reaction speed. 

In this model, a measure of “decision confidence” is accumulated over time, and once enough confidence has been accumulated, a saccade is made. The rate at which confidence accumulates over time is inconsistent, as shown in the video below.

Video 1. Human eyes take time to accumulate incoming photons of light until reaching a level sufficient for making a decision, and then invoke a saccade. This makes saccadic reaction timings inherently random due to noise in visual processing.

Hence, instead of predicting a single reaction time with full certainty, the model provides a likelihood of exhibiting various reaction times. The average rate of confidence accumulation is influenced by image features and results in a change in the likelihood of reaction times, as shown in the video below. 

Video 2. For a visual object of known contrast, frequency, and eccentricity, the model predicts a random distribution of likely reaction times
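
To make the accumulate-to-threshold idea concrete, below is a minimal Monte Carlo sketch in Python. Only its structure follows the description above (a noisy accumulation rate, a confidence threshold, and a resulting distribution of latencies); the specific mapping from contrast, spatial frequency, and eccentricity to the mean rate is an illustrative placeholder, not the learned model from the paper.

```python
import numpy as np

def simulate_saccade_latencies(contrast, frequency_cpd, eccentricity_deg,
                               n_trials=10_000, threshold=1.0,
                               base_delay_ms=50.0, rng=None):
    """Monte Carlo sketch of an accumulate-to-threshold saccade model.

    Decision confidence builds at a noisy rate; a saccade fires once it
    crosses `threshold`. The mapping below from contrast, frequency, and
    eccentricity to the mean rate is a made-up placeholder, not the
    paper's learned model.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Illustrative only: higher contrast -> faster accumulation,
    # higher spatial frequency and larger eccentricity -> slower.
    mean_rate = 5.0 + 8.0 * contrast / (1.0 + 0.3 * frequency_cpd + 0.05 * eccentricity_deg)
    # Trial-to-trial variability in the accumulation rate models the
    # noise in visual processing described above.
    rates = np.clip(rng.normal(mean_rate, 0.2 * mean_rate, size=n_trials), 1e-3, None)
    return base_delay_ms + 1000.0 * threshold / rates

# Example: low-contrast peripheral target vs. high-contrast foveal target.
slow = simulate_saccade_latencies(contrast=0.1, frequency_cpd=4.0, eccentricity_deg=20.0)
fast = simulate_saccade_latencies(contrast=0.9, frequency_cpd=1.0, eccentricity_deg=2.0)
print(f"low contrast, peripheral: median {np.median(slow):.0f} ms")
print(f"high contrast, foveal:    median {np.median(fast):.0f} ms")
```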

Two validation experiments confirm that this model can be applied to everyday imagery, including video game scenes and natural photographs.

Figure 3. The proposed model accurately predicts human reaction times for varying visual conditions such as a soccer game (left), a shooter game (center), and a natural photograph (right). Shooter game assets courtesy of Counter-Strike: Global Offensive, Valve Corporation.

Using reaction time prediction to optimize human performance

Applications for this saccadic reaction time model include, for example, a smart drive-assist system that estimates whether a driver can safely react to pedestrians and other vehicles and turns on appropriate assistance features. Similarly, e-sports game designers can use this model to understand the fairness of their game’s visual design, avoiding bias in competitive outcomes.

Ambitious gamers can also use this model to fine-tune their setup for maximum performance, for example by choosing an optimal skin for the target 3D object.

In future work, the research team plans to explore how other image features like color and temporal effects influence human reaction time, and how to train humans to increase the speed at which they react to on-screen or real-world events.

For more details, read the paper, Image Features Influence Reaction Time: A Learned Probabilistic Perceptual Model for Saccade Latency. You can also visit the gaze-timing project on GitHub.   

The paper’s authors, Budmonde Duinkharjav, Praneeth Chakravarthula, Rachel Brown, Anjul Patney, and Qi Sun, will present this work at SIGGRAPH 2022 on August 11 in Vancouver, British Columbia.

Categories
Misc

Upcoming Webinar: Detecting Cyber Threats with Unsupervised Learning 

Discover how to detect cyber threats using machine learning and NVIDIA Morpheus, an open-source AI framework.

Categories
Misc

NVIDIA Instant NeRF Wins Best Paper at SIGGRAPH, Inspires Creative Wave Amid Tens of Thousands of Downloads

3D content creators are clamoring for NVIDIA Instant NeRF, an inverse rendering tool that turns a set of static images into a realistic 3D scene. Since its debut earlier this year, tens of thousands of developers around the world have downloaded the source code and used it to render spectacular scenes, sharing eye-catching results on…

Categories
Misc

Building an Active Digital Twin Using NVIDIA Omniverse and Project Gemini

The Acceleration Agency, a digital innovation and product design firm, is working on an active digital twin framework and toolkit called Project Gemini. Inspired by the United States space program of the same name, Project Gemini uses active sensor fabric data and a wide range of data from sources like Google Sheets and Customer Relationship Management (CRM) platforms to replicate real-world settings in the virtual world. 

The active digital twin framework and toolkit will be fully connected to NVIDIA Omniverse, a scalable platform for design and collaboration, using Universal Scene Description (USD).

The project launched with a digital replication of The Acceleration Agency’s main office located in Austin, Texas. Instrumented with a dense sensor fabric for real-time and historical spatial computation, the digital twin of the office includes employees and employee information (job title, ID#, gender, and date of birth) provided by Salesforce. It also tracks inventory items on site and can display information such as quantity, date of last interaction, temperature, and orientation.

With NVIDIA Omniverse’s real-time, true-to-reality PhysX physics and physically accurate RTX rendering capabilities, the team anticipates that the Gemini active digital twin can run complex simulations with an unprecedented level of visual and physical fidelity.

Leveraging USD and Omniverse Nucleus, users of the Project Gemini digital twin platform will be able to update content collaboratively and in real time from a variety of tools, instead of waiting for new builds.

Connecting Google Sheets to NVIDIA Omniverse with a Kit Extension

Multiple abstraction layers and a sensor fabric layer allow a variety of sensors, databases, CRMs and object integration tools to connect to Omniverse. The connection allows real-time updates to inventory objects and information like temperature, humidity, and location.

To accomplish this, the team created a simple Omniverse Kit Extension enabled by a Python script that reads data from a Google Sheet and attaches the data to an object in Omniverse Kit. It allows someone to control the location, scale, and rotation of any selected object in Omniverse applications like Omniverse Code or Omniverse Create using the metadata in the spreadsheet. You can access the AccelerationAgency/omniverse-extensions through GitHub. 
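
For a rough sense of the mechanics, here is a standalone Python sketch (not the AccelerationAgency extension itself) that reads a shared Google Sheet as CSV and writes translate, scale, and rotate values onto USD prims with the pxr API. The sheet URL, column names, and stage path are hypothetical placeholders; inside Omniverse Kit, the extension applies the same kind of update to the open stage and the selected prim rather than to a file on disk.

```python
import csv
import io
import urllib.request

from pxr import Usd, UsdGeom, Gf

# Hypothetical sheet layout: one row per prim with columns
# prim_path, tx, ty, tz, sx, sy, sz, rx, ry, rz
SHEET_CSV_URL = (
    "https://docs.google.com/spreadsheets/d/<SHEET_ID>/export?format=csv"
)  # placeholder; assumes the sheet is shared as readable by link
STAGE_PATH = "office_digital_twin.usd"  # placeholder stage path

def fetch_rows(url):
    """Download the shared Google Sheet as CSV and return dict rows."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

def apply_row(stage, row):
    """Write translate, scale, and rotate from one sheet row onto a USD prim."""
    prim = stage.GetPrimAtPath(row["prim_path"])
    if not prim:
        print(f"prim not found: {row['prim_path']}")
        return
    xform = UsdGeom.XformCommonAPI(prim)
    xform.SetTranslate(Gf.Vec3d(float(row["tx"]), float(row["ty"]), float(row["tz"])))
    xform.SetScale(Gf.Vec3f(float(row["sx"]), float(row["sy"]), float(row["sz"])))
    xform.SetRotate(Gf.Vec3f(float(row["rx"]), float(row["ry"]), float(row["rz"])))

if __name__ == "__main__":
    stage = Usd.Stage.Open(STAGE_PATH)
    for row in fetch_rows(SHEET_CSV_URL):
        apply_row(stage, row)
    stage.Save()
```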

Using database and CRM tools with the extension makes the task of manipulating object data more scalable. When building digital twins at the scale of factories, stadiums, warehouses, and even cities, hundreds, thousands, and even millions of objects may need to be manipulated rapidly.

The Acceleration Agency loaded the USD version of their office digital twin into the Omniverse stage and used the extension to select and manipulate object data. 

The images below show how this process was done for a Tesla in the parking lot outside the agency office. Building the extension was fairly straightforward: a single developer created it in just a few days, and it can be extended to any data source.

Figure 1. Google Sheet with object location, scale, and rotation information

Figure 2. Selecting the Project Gemini-enabled extension from the extensions tab in Omniverse Code

Figure 3. The object before running the extension to pull in the data from the Google Sheet

Figure 4. After running the extension to pull in the data from the Google Sheet, the object now has different parameters

Figure 5. Running the extension using the USD version of the office digital twin as the data source, then selecting the Tesla as the data object to manipulate

Figure 6. Tripling the scale factors of the Tesla in the Google Sheet updates through the extension and then propagates into the stage

Watch the extension in action with Starr Long, Executive Producer at The Acceleration Agency: 

Adding RTX Renderer and Nucleus Collaboration

The next step for Project Gemini is to render in real time with the NVIDIA RTX Renderer and to allow real-time modifications through Nucleus. These real-time modifications are one of the advantages of working with the powerful USD 3D framework and composition engine. They will be coupled with historical recordings of real data, which can be played back and mixed with the modifications to try different scenarios. Some of the use cases the team is targeting include construction sites, hospitals, and live event venues. To learn more, visit the Project Gemini website.

Figure 7. Digital twin of The Acceleration Agency office running in the NVIDIA RTX Renderer

Figure 8. Sensors and tags that send real-time data about location, temperature, and other factors to the digital twin

Learn more about building custom USD-based applications and extensions for NVIDIA Omniverse in the Omniverse Resource Center and with these USD-specific resources.

Don’t miss NVIDIA at SIGGRAPH, August 8-11, 2022. Watch the Omniverse community livestream at SIGGRAPH on August 9 at noon, Pacific time, to learn how NVIDIA Omniverse and other design and visualization solutions are driving breakthroughs in graphics and GPU-accelerated software.

You’re also invited to enter the inaugural #ExtendOmniverse developer contest, open through August 19, 2022. Create an Omniverse Extension using Omniverse Code for a chance to win an NVIDIA RTX GPU.

Follow NVIDIA Omniverse on Instagram, Twitter, YouTube and Medium for additional resources and inspiration. Check out the Omniverse forums, and join our Discord server and Twitch channel to chat with the community.

Categories
Misc

Top Edge AI Sessions at GTC 2022

Join us September 19-22 for a deep dive into the latest advances in edge AI, from reimagined shopping experiences to industrial automation.

Categories
Misc

Dive Into AI, Avatars and the Metaverse With NVIDIA at SIGGRAPH

Innovative technologies in AI, virtual worlds and digital humans are shaping the future of design and content creation across every industry. Experience the latest advances from NVIDIA in all these areas at SIGGRAPH, the world’s largest gathering of computer graphics experts, running Aug. 8-11. At the conference, creators, developers, engineers, researchers and students will see…

Categories
Offsites

Introducing the Google Universal Image Embedding Challenge

Computer vision models see daily application for a wide variety of tasks, ranging from object recognition to image-based 3D object reconstruction. One challenging type of computer vision problem is instance-level recognition (ILR): given an image of an object, the task is not only to determine the generic category of the object (e.g., an arch), but also to identify the specific instance of the object (“Arc de Triomphe de l’Étoile, Paris, France”).

Previously, ILR was tackled using deep learning approaches. First, a large set of images was collected. Then a deep model was trained to embed each image into a high-dimensional space where similar images have similar representations. Finally, the representation was used to solve the ILR tasks related to classification (e.g., with a shallow classifier trained on top of the embedding) or retrieval (e.g., with a nearest neighbor search in the embedding space).
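
As a minimal illustration of the embed-then-retrieve pipeline, the sketch below performs nearest-neighbor search over unit-normalized embeddings with cosine similarity. The random vectors stand in for the embeddings a trained model would produce; for the classification variant, a shallow classifier (e.g., logistic regression) trained on the same embeddings would take their place.

```python
import numpy as np

def l2_normalize(x, eps=1e-9):
    """Scale each embedding to unit length so dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def retrieve(query_emb, index_emb, k=5):
    """Return indices of the k nearest index embeddings for each query."""
    q = l2_normalize(query_emb)
    idx = l2_normalize(index_emb)
    sims = q @ idx.T                        # (n_queries, n_index) cosine similarities
    return np.argsort(-sims, axis=1)[:, :k]

# Toy stand-ins for embeddings produced by a trained model (64-D here).
rng = np.random.default_rng(0)
index_embeddings = rng.normal(size=(1000, 64)).astype(np.float32)
index_labels = rng.integers(0, 200, size=1000)            # instance IDs
query_embeddings = index_embeddings[:10] + 0.05 * rng.normal(size=(10, 64))

top_k = retrieve(query_embeddings, index_embeddings, k=5)
print(index_labels[top_k])   # instance IDs of the retrieved neighbors
```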

Since there are many different object domains in the world, e.g., landmarks, products, or artworks, capturing all of them in a single dataset and training a model that can distinguish between them is quite a challenging task. To decrease the complexity of the problem to a manageable level, the focus of research so far has been to solve ILR for a single domain at a time. To advance the research in this area, we hosted multiple Kaggle competitions focused on the recognition and retrieval of landmark images. In 2020, Amazon joined the effort and we moved beyond the landmark domain and expanded to the domains of artwork and product instance recognition. The next step is to generalize the ILR task to multiple domains.

To this end, we’re excited to announce the Google Universal Image Embedding Challenge, hosted by Kaggle in collaboration with Google Research and Google Lens. In this challenge, we ask participants to build a single universal image embedding model capable of representing objects from multiple domains at the instance level. We believe that this is the key for real-world visual search applications, such as augmenting cultural exhibits in a museum, organizing photo collections, visual commerce and more.

Images1 of object instances coming from multiple domains, which are represented in our dataset: apparel and accessories, packaged goods, furniture and home goods, toys, cars, landmarks, storefronts, dishes, artwork, memes and illustrations.

Degrees of Variation in Different Domains
To represent objects from a large number of domains, we require one model to learn many domain-specific subtasks (e.g., filtering different kinds of noise or focusing on a specific detail), which can only be learned from a semantically and visually diverse collection of images. Addressing each degree of variation poses a new challenge for both image collection and model training.

The first sort of variation comes from the fact that while some domains contain unique objects in the world (landmarks, artwork, etc.), others contain objects that may have many copies (clothing, furniture, packaged goods, food, etc.). Because a landmark is always placed at the same location, the surrounding context may be useful for recognition. In contrast, a product, say a phone, even of a specific model and color, may have millions of physical instances and thus appear in many surrounding contexts.

Another challenge comes from the fact that a single object may appear different depending on the point of view, lighting conditions, occlusion or deformations (e.g., a dress worn on a person may look very different than on a hanger). In order for a model to learn invariance to all of these visual modes, all of them should be captured by the training data.

Additionally, similarities between objects differ across domains. For example, in order for a representation to be useful in the product domain, it must be able to distinguish very fine-grained details between similarly looking products belonging to two different brands. In the domain of food, however, the same dish (e.g., spaghetti bolognese) cooked by two chefs may look quite different, but the ability of the model to distinguish spaghetti bolognese from other dishes may be sufficient for the model to be useful. Additionally, a vision model of high quality should assign similar representations to more visually similar renditions of a dish.

Domain: Landmark vs. Apparel (example instances: Empire State Building2; Cycling jerseys with Android logo3)

Which physical objects belong to the instance class?
Landmark: Single instance in the world.
Apparel: Many physical instances; may differ in size or pattern (e.g., a patterned cloth cut differently).

What are the possible views of the object?
Landmark: Appearance variation only based on capture conditions (e.g., illumination or viewpoint); limited number of common external views; possibility of many internal views.
Apparel: Deformable appearance (e.g., worn or not); limited number of common views: front, back, side.

What are the surroundings and are they useful for recognition?
Landmark: Surrounding context does not vary much other than daily and yearly cycles; may be useful for verifying the object of interest.
Apparel: Surrounding context can change dramatically due to difference in environment, additional pieces of clothing, or accessories partially occluding the clothing of interest (e.g., a jacket or a scarf).

What may be tricky cases that do not belong to the instance class?
Landmark: Replicas of landmarks (e.g., Eiffel Tower in Las Vegas), souvenirs.
Apparel: Same piece of apparel of different material or different color; visually very similar pieces with a small distinguishing detail (e.g., a small brand logo); different pieces of apparel worn by the same model.

Variation among domains for landmark and apparel examples.

Learning Multi-domain Representations
After a collection of images covering a variety of domains is created, the next challenge is to train a single, universal model. Some features and tasks, such as representing color, are useful across many domains, and thus adding training data from any domain will likely help the model improve at distinguishing colors. Other features may be more specific to selected domains, thus adding more training data from other domains may deteriorate the model’s performance. For example, while for 2D artwork it may be very useful for the model to learn to find near duplicates, this may deteriorate the performance on clothing, where deformed and occluded instances need to be recognized.

The large variety of possible input objects and tasks that need to be learned requires novel approaches for selecting, augmenting, cleaning and weighting the training data. New approaches for model training and tuning, and even novel architectures, may be required.

Universal Image Embedding Challenge
To help motivate the research community to address these challenges, we are hosting the Google Universal Image Embedding Challenge. The challenge was launched on Kaggle in July and will be open until October, with cash prizes totaling $50k. The winning teams will be invited to present their methods at the Instance-Level Recognition workshop at ECCV 2022.

Participants will be evaluated on a retrieval task using a dataset of ~5,000 test query images and ~200,000 index images, from which similar images are retrieved. In contrast to ImageNet, which provides categorical labels, the images in this dataset are labeled at the instance level.
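
As a rough illustration of how instance-level retrieval can be scored, the sketch below computes a simple mean precision@5 over toy data. The challenge's official metric and data are defined on Kaggle; treat this only as a sketch of the general setup, with made-up sizes and labels.

```python
import numpy as np

def mean_precision_at_k(sim, query_labels, index_labels, k=5):
    """Average fraction of the top-k retrieved index images that share the
    query's instance label. Illustrative only; not the challenge's official metric."""
    top_k = np.argsort(-sim, axis=1)[:, :k]               # highest-similarity index images
    hits = index_labels[top_k] == query_labels[:, None]   # instance-label matches
    return hits.mean()

# Toy numbers mirroring the setup: queries scored against a larger labeled index set.
rng = np.random.default_rng(1)
sim = rng.random((50, 2000))                 # similarity of 50 queries to 2,000 index images
query_labels = rng.integers(0, 300, size=50)
index_labels = rng.integers(0, 300, size=2000)
print(f"mean precision@5: {mean_precision_at_k(sim, query_labels, index_labels):.3f}")
```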

The evaluation data for the challenge is composed of images from the following domains: apparel and accessories, packaged goods, furniture and home goods, toys, cars, landmarks, storefronts, dishes, artwork, memes and illustrations.

Distribution of domains of query images.

We invite researchers and machine learning enthusiasts to participate in the Google Universal Image Embedding Challenge and join the Instance-Level Recognition workshop at ECCV 2022. We hope the challenge and the workshop will advance state-of-the-art techniques on multi-domain representations.

Acknowledgement
The core contributors to this project are Andre Araujo, Boris Bluntschli, Bingyi Cao, Kaifeng Chen, Mário Lipovský, Grzegorz Makosa, Mojtaba Seyedhosseini and Pelin Dogan Schönberger. We would like to thank Sohier Dane, Will Cukierski and Maggie Demkin for their help organizing the Kaggle challenge, as well as our ECCV workshop co-organizers Tobias Weyand, Bohyung Han, Shih-Fu Chang, Ondrej Chum, Torsten Sattler, Giorgos Tolias, Xu Zhang, Noa Garcia, Guangxing Han, Pradeep Natarajan and Sanqiang Zhao. Furthermore, we are thankful to Igor Bonaci, Tom Duerig, Vittorio Ferrari, Victor Gomes, Futang Peng and Howard Zhou who gave us feedback, ideas and support at various points of this project.


1 Image credits: Chris Schrier, CC-BY; Petri Krohn, GNU Free Documentation License; Drazen Nesic, CC0; Marco Verch Professional Photographer, CCBY; Grendelkhan, CCBY; Bobby Mikul, CC0; Vincent Van Gogh, CC0; pxhere.com, CC0; Smart Home Perfected, CC-BY.  
2 Image credit: Bobby Mikul, CC0.  
3 Image credit: Chris Schrier, CC-BY.