Categories
Misc

NVIDIA Hopper Sweeps AI Inference Benchmarks in MLPerf Debut

In their debut on the MLPerf industry-standard AI benchmarks, NVIDIA H100 Tensor Core GPUs set world records in inference on all workloads, delivering up to 4.5x more performance than previous-generation GPUs. The results demonstrate that Hopper is the premium choice for users who demand utmost performance on advanced AI models. Additionally, NVIDIA A100 Tensor Core…

The post NVIDIA Hopper Sweeps AI Inference Benchmarks in MLPerf Debut appeared first on NVIDIA Blog.

Categories
Misc

Full-Stack Innovation Fuels Highest MLPerf Inference 2.1 Results for NVIDIA

Today’s AI-powered applications are enabling richer experiences, fueled by both larger and more complex AI models as well as the application of many models in a pipeline. To meet the increasing demands of AI-infused applications, an AI platform must not only deliver high performance but also be versatile enough to deliver that performance across a diverse range of AI models. To maximize infrastructure utilization and optimize CapEx, the ability to run the entire AI workflow on the same infrastructure is critical: from data prep and model training to deployed inference.

MLPerf benchmarks have emerged as industry-standard, peer-reviewed measures of deep learning performance, covering AI training, AI inference, and high-performance computing (HPC). MLPerf Inference 2.1, the latest iteration of the MLPerf Inference benchmark suite, covers a breadth of common AI use cases including recommenders, natural language processing, speech recognition, medical imaging, image classification, and object detection.

In this round, NVIDIA made its first MLPerf submissions on the latest NVIDIA H100 Tensor Core GPU based on the breakthrough NVIDIA Hopper Architecture.

  • H100 set new per-accelerator records on all data center tests, demonstrating up to 4.5x higher inference performance compared to the NVIDIA A100 Tensor Core GPU.
  • A100 continued to demonstrate excellent performance across the full suite of MLPerf Inference 2.1 tests for both the data center and edge inference scenarios.

NVIDIA Jetson AGX Orin, built for edge AI and robotics applications, also delivered up to a 50% improvement in performance-per-watt following its debut in the prior round of MLPerf Inference, and ran all edge workloads and scenarios.

Delivering these performance results required deep software and hardware co-optimization. In this post, we discuss the results and then dive into some of the key software optimizations.

NVIDIA H100 Tensor Core technology

On a per-streaming multiprocessor (SM) basis, the H100 Tensor Cores provide twice the matrix multiply-accumulate (MMA) throughput clock-for-clock of the A100 SMs when using the same data types and four times the throughput when comparing FP16 on an A100 SM to FP8 on an H100 SM. New kernels had to be developed in order to leverage several of the H100’s new capabilities and to take advantage of these dramatically faster Tensor Cores.

The H100 Tensor Cores process data so rapidly that it can be challenging both to keep them fed with enough input data and to post-process their output data. Kernels must create an efficient pipeline such that data loading, Tensor Core processing, post-processing, and storage all happen simultaneously and efficiently.

The new H100 asynchronous transaction barriers are instrumental to the efficiency of these pipelines. The asynchronous barriers allow producer threads to run ahead after signaling data availability. In the case of data loading threads, this provides significant improvement in kernels’ ability to hide memory system latencies and ensure a steady stream of input data is available for the Tensor Cores. The asynchronous transaction barriers also provide an efficient mechanism for consumer threads to wait on resource availability so that they don’t waste SM resources in spin loops.
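
The producer/consumer pattern described above can be illustrated with a host-side analogy. The sketch below is not CUDA and does not use the H100 barrier hardware; it only assumes that a bounded queue is a reasonable stand-in for a fixed number of shared-memory stages. The names `run_pipeline` and `stages` are hypothetical.

```python
import queue
import threading


def run_pipeline(tiles, stages=2):
    """Reduce each tile through a bounded producer/consumer pipeline."""
    # At most `stages` tiles are in flight, like a fixed set of staging buffers.
    buffers = queue.Queue(maxsize=stages)
    results = []

    def producer():
        for tile in tiles:
            # Signals availability and runs ahead; blocks only when all stages are full.
            buffers.put(tile)
        buffers.put(None)  # sentinel: no more data

    def consumer():
        while True:
            # Sleeps until data has been signaled instead of spinning.
            tile = buffers.get()
            if tile is None:
                break
            results.append(sum(tile))  # stand-in for the math on the tile

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_pipeline([[1, 2], [3, 4], [5]]))
```

With more stages, the producer can run further ahead of the consumer, which is the property that helps hide load latency.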

The Tensor Memory Accelerator (TMA) further turbocharges these kernels. The TMA was designed to natively integrate into asynchronous pipelines, and provides for the asynchronous transfer of multi-dimensional tensors from global memory into the SM’s shared memory.

The Tensor Cores are so fast that operations like address calculation can become a performance bottleneck; the TMA offloads this work so that the kernels can focus on running the math and post-processing as quickly as possible.

Finally, the new kernels employ H100 thread block clusters to exploit locality at the GPU processing cluster (GPC). The thread blocks within each thread block cluster collaborate to load data more efficiently and provide higher input bandwidth to the Tensor Cores.

NVIDIA H100 Tensor Core GPU performance results

Starting with the Data Center category, the NVIDIA H100 Tensor Core GPU delivered the highest per-accelerator performance on every workload across both the Server and Offline scenarios, delivering up to 4.5x more performance in the Offline scenario and up to 3.9x more performance in the Server scenario than the A100 Tensor Core GPU.

Left bar chart shows the H100 delivering up to 3.9x more performance than the A100 in the Server scenario. Right chart shows H100 delivering up to 4.5x more performance than A100 in the Offline scenario.
Figure 1. H100 delivers up to 4.5x more performance than A100 in the MLPerf Inference 2.1 Data Center category

Compared to a CPU-only submission, the H100 Tensor Core GPU provides up to 36x higher performance.

Thanks to full-stack improvements, NVIDIA Jetson AGX Orin turned in large improvements in energy efficiency compared to the last round, delivering up to a 50% efficiency improvement.

Left bar chart shows the queries per watt improvements in the Offline scenario. Right chart shows the energy per stream improvements in the Single Stream and Multi Stream scenarios in the Jetson AGX Orin MLPerf Inference 2.1 submission compared to the MLPerf Inference 2.0 submission.
Figure 2. Efficiency improvements in the NVIDIA Jetson AGX Orin MLPerf Inference 2.1 compared to the prior submission

Here’s a closer look at the software optimizations that made these results possible.

High-performance BERT inference using FP8

Diagram shows the steps to perform FP8 inference on BERT.
Figure 3. FP8 Inference on BERT using E4M3 offers increased stability for the forward pass

The NVIDIA Hopper Architecture incorporates new fourth-generation Tensor Cores with support for two new FP8 data types: E4M3 and E5M2. These new data types increase Tensor Core throughput by 2x and reduce memory requirements by 2x compared to 16-bit floating-point.

The E4M3 offers an additional mantissa bit, which leads to increased stability in the first step of the calculation process, known as the forward pass. The additional exponent bit of E5M2 is more helpful for preventing overflow/underflow during the backward pass. For our BERT FP8 submission, we used E4M3.
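
To make the range-versus-precision tradeoff concrete, here is a minimal sketch that decodes every 8-bit pattern for both formats. The bit-level conventions are assumptions not stated in this post: exponent biases of 7 and 15, IEEE-style infinities and NaNs for E5M2, and a single NaN mantissa pattern with no infinities for E4M3, following the publicly proposed FP8 formats as we understand them. The function names are hypothetical.

```python
import math


def decode_fp8(byte, exp_bits, man_bits, has_inf):
    """Decode one 8-bit pattern: 1 sign bit, exp_bits exponent, man_bits mantissa."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exp == (1 << exp_bits) - 1:
        if has_inf:  # assumed E5M2 behavior: IEEE-style inf and NaN
            return sign * math.inf if man == 0 else math.nan
        if man == (1 << man_bits) - 1:  # assumed E4M3 behavior: one NaN pattern
            return math.nan
    if exp == 0:  # subnormal values
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def finite_values(exp_bits, man_bits, has_inf):
    """Return the sorted set of finite values representable in the format."""
    vals = (decode_fp8(b, exp_bits, man_bits, has_inf) for b in range(256))
    return sorted({v for v in vals if math.isfinite(v)})


E4M3 = finite_values(4, 3, has_inf=False)
E5M2 = finite_values(5, 2, has_inf=True)

for name, vals in (("E4M3", E4M3), ("E5M2", E5M2)):
    steps = len([v for v in vals if 1.0 <= v < 2.0])
    print(name, "max finite:", max(vals), "values in [1, 2):", steps)
```

Under these assumptions, E4M3 has twice as many values between consecutive powers of two, while E5M2 reaches much larger magnitudes.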

Our experiments on NLP models like BERT showed that when quantizing the model from a higher precision (FP32) to a lower precision (such as FP8 or INT8), the drop in accuracy observed with FP8 is lower than that of INT8.

Although we can use quantization-aware training (QAT) to recover some of the model accuracy with INT8, the accuracy of INT8 under post-training quantization (PTQ) remains a challenge. This is where FP8 is beneficial: It can provide 99.9% accuracy of the FP32 model under PTQ without the additional cost and effort required to run QAT. As a result, FP8 can be used for the 99.9% high accuracy category of MLPerf where previously FP16 was required. In essence, FP8 delivered the performance of INT8 with the accuracy of FP16 for this workload.
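
One intuition for why a floating-point 8-bit format can tolerate PTQ better is dynamic range. The toy sketch below is not the calibration used in the submission: it assumes symmetric per-tensor INT8 scaling, unscaled round-to-nearest E4M3 (bias 7, maximum 448), and a made-up list of values spanning several orders of magnitude. All names are hypothetical.

```python
def e4m3_magnitudes():
    """Non-negative finite E4M3 values (assumed bias 7, max 448)."""
    vals = [m * 2.0 ** -9 for m in range(8)]  # zero and subnormals
    for e in range(1, 16):
        for m in range(8):
            if e == 15 and m == 7:
                continue  # assumed NaN pattern
            vals.append((1 + m / 8) * 2.0 ** (e - 7))
    return vals


E4M3 = e4m3_magnitudes()


def quantize_fp8(x):
    """Round to the nearest representable E4M3 magnitude, keeping the sign."""
    mag = min(E4M3, key=lambda v: abs(v - abs(x)))
    return mag if x >= 0 else -mag


def quantize_int8(x, scale):
    """Symmetric INT8: round to an integer step, clamp, then dequantize."""
    q = max(-127, min(127, round(x / scale)))
    return q * scale


def mean_relative_error(values, quantized):
    return sum(abs(q - v) / abs(v) for v, q in zip(values, quantized)) / len(values)


# Made-up values with a wide dynamic range, for illustration only.
values = [0.003, 0.02, 0.15, 1.0, 7.0, 90.0]
scale = max(abs(v) for v in values) / 127
int8_err = mean_relative_error(values, [quantize_int8(v, scale) for v in values])
fp8_err = mean_relative_error(values, [quantize_fp8(v) for v in values])
print("INT8 mean relative error:", int8_err)
print("FP8  mean relative error:", fp8_err)
```

With a single INT8 scale chosen for the largest value, the small values collapse to zero, whereas the floating-point format keeps a roughly constant relative precision across magnitudes. Real tensors and calibration schemes will behave differently from this toy case.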

In the NVIDIA BERT submission, all fully connected and matrix multiply layers in the encoder used FP8 precision. The implementation of these layers used cuBLASLt to perform the FP8 GEMMs on the H100 Tensor Cores.

Key BERT optimizations were extended to support FP8, including the following:

  • Removing padding: Inputs to BERT have variable sequence lengths, and are padded to a maximum sequence length. We strip the padding to avoid wasting compute on padding, and then reconstruct the padded sequences for the final output to match the input shape.
  • Fused multi-head attention: This is a fusion of four operations: transpose Q/K, Q*K, softmax, and QK*V to compute the attention. Fused multi-head attention improves memory efficiency and skips the padding to avoid wasted computation. Fused multi-head attention provides roughly a 2x end-to-end speedup.
  • Activation fusion: We fuse the GEMM with more operations, including bias and activation functions (GeLU). This fusion also helps enhance memory efficiency by removing extra memory transfers.
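
The padding-removal step in the list above can be sketched as follows. This is a minimal illustration on plain Python lists, not the submission's implementation; the function names and the use of a cumulative offsets array are assumptions.

```python
def remove_padding(batch, lengths):
    """Pack variable-length rows into one flat list plus cumulative offsets."""
    packed, offsets = [], [0]
    for row, n in zip(batch, lengths):
        packed.extend(row[:n])          # keep only the real tokens
        offsets.append(offsets[-1] + n)  # where the next sequence starts
    return packed, offsets


def restore_padding(packed, offsets, max_len, pad=0):
    """Rebuild the padded batch so the output matches the input shape."""
    batch = []
    for start, end in zip(offsets, offsets[1:]):
        row = packed[start:end]
        batch.append(row + [pad] * (max_len - len(row)))
    return batch


batch = [[5, 6, 0, 0], [7, 8, 9, 0]]
lengths = [2, 3]
packed, offsets = remove_padding(batch, lengths)
print(packed, offsets)
print(restore_padding(packed, offsets, max_len=4))
```

In this example, 5 of the 8 token slots carry real data, so the work between packing and unpacking touches 5 tokens instead of 8.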

RetinaNet for object detection

In MLPerf Inference 2.1, a new one-stage object detection model named RetinaNet was added. This replaced the ssd-resnet34 and ssd-mobilenet workloads of MLPerf Inference 2.0. This updated model architecture and its new inference dataset bring new challenges in delivering fast, accurate, and power-efficient inference.

NVIDIA submitted results for RetinaNet across all the platforms, demonstrating the breadth of our software support.

RetinaNet is trained and run for inference using the Open Images dataset, which contains an order of magnitude more object categories and object annotations than the COCO dataset used earlier. For RetinaNet, 264 unique classes are selected for training and inference tasks. This is significantly more than the 81 classes used for ssd-resnet34.

Photo from the Open Images dataset shows brightly colored bounding boxes around objects and people in a theater.
Figure 4. The Open Images dataset used for RetinaNet training and inference includes highly detailed object annotations

Although RetinaNet is also a single-shot object detection model, it has several key differences compared to ssd-resnet34:

  • RetinaNet uses Feature Pyramid Network (FPN) as its backbone on top of a feedforward ResNeXt architecture. ResNeXt uses group convolution in its computation blocks, and has different math characteristics from that of ResNet34.
  • For every image, 120,087 boxes with 264 class scores per box are fed into the non-maximum suppression (NMS) layer, and the top 1,000 scoring boxes are selected as outputs. In ssd-resnet34, the NMS workload was roughly 25x smaller: 15,130 boxes, 81 classes per box, and a top-K of 200.

The RetinaNet model architecture illustrated with ResNeXt feed-forward network, Feature Pyramid Network (FPN) and Class/Box subnet (K=264, A=9)
Figure 5. MLPerf Inf 2.1 RetinaNet model architecture
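
The box counts quoted above can be reproduced with a short calculation. The sketch assumes an 800x800 input and five FPN levels with strides 8 through 128, neither of which is stated in this post; the 9 anchors per location (A=9) and 264 classes (K=264) come from the figure.

```python
import math


def retinanet_box_count(image_size=800, strides=(8, 16, 32, 64, 128), anchors_per_cell=9):
    """Count anchor boxes over all FPN levels (input size and strides are assumptions)."""
    cells = sum(math.ceil(image_size / s) ** 2 for s in strides)
    return cells * anchors_per_cell


boxes = retinanet_box_count()
retinanet_scores = boxes * 264   # class scores entering NMS per image
ssd_scores = 15130 * 81          # ssd-resnet34 figures from the text
ratio = retinanet_scores / ssd_scores
print(boxes, retinanet_scores, ssd_scores, ratio)
```

Under these assumptions the count matches the 120,087 boxes in the text, and the ratio of box-class scores between the two models lands between 25x and 26x.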

NVIDIA used TensorRT as the backend for RetinaNet. TensorRT significantly accelerates inference throughput by automatically optimizing both the graph execution and layer execution:

  • TensorRT provides full support to execute the model inference in mixed FP32/INT8 precision, with minimal accuracy loss compared to FP16 and FP32 precision.
  • TensorRT automatically selects optimized kernels for group convolutions across all 16 ResNeXt blocks.
  • TensorRT provides fusion patterns for convolution, activation, and (optional) pooling layers, which optimize the memory movement for faster inference by merging the layer weights and reducing the number of operations.
  • For the post-processing NMS layer, NVIDIA leverages EfficientNMS, which is an open-sourced high-performance CUDA kernel specialized for NMS tasks, provided as a TensorRT plugin.
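
For readers unfamiliar with NMS, the sketch below shows the basic greedy algorithm for a single class. It is not the EfficientNMS kernel and does not reflect its interface; the box format `(x1, y1, x2, y2)`, the IoU threshold of 0.5, and the function names are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5, top_k=1000):
    """Greedy NMS: keep the best box, drop overlapping ones, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if len(keep) == top_k:
                break
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

The second box overlaps the first with an IoU above the threshold and is suppressed; the third is far away and is kept.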

NVIDIA Jetson AGX Orin optimizations

NVIDIA Jetson AGX Orin is the latest NVIDIA platform for edge AI and robotics applications. In this round of MLPerf Inference, Jetson AGX Orin demonstrated excellent performance and energy efficiency improvements across the breadth of MLPerf Inference 2.1 edge workloads. Improvements included a 45% reduction in ResNet-50 multi-stream latency and a 17% boost in BERT offline throughput over the previous round (v2.0). In the power submission, Orin achieved up to 52% power reduction and 48% perf-per-watt improvement on selected benchmarks. The submissions used the 22.08 Jetson CUDA-X AI Developer Preview software, which includes an optimized NVIDIA Jetson Linux (L4T) image, TensorRT 8.5.0, CUDA 11.4.14, and cuDNN 8.5.0, allowing customers to easily benefit from these same improvements. RetinaNet is fully supported and performant on Jetson AGX Orin with this software stack. This demonstrates the ability of the NVIDIA platform and software to support performant DL inference out-of-the-box.

NVIDIA Orin performance improvements

The significant improvement in MLPerf Inference v2.1 came from both the optimized system image and TensorRT 8.5 in the 22.08 Jetson CUDA-X AI Developer Preview. The optimized Jetson L4T image gives users access to the MaxN power mode, which boosts the frequencies of both the GPU and the DLA units. The image also offers the option of an enlarged 64K page size, which can reduce TLB misses when running certain inference workloads. Furthermore, the 3.10.1 DLA compiler included natively in the image incorporates a series of optimizations that increase the performance of workloads running on the Orin DLA by up to 53%.

TensorRT 8.5 includes two new optimizations that improve inference performance. The first is native support for cuDLA, which removes the need to insert copy nodes between DLA nodes and GPU nodes. We observed an approximately 1.8% end-to-end improvement for DLA engines when switching from NVMedia to cuDLA. The second is the addition of optimized kernels for small channel * filter size convolutions fused with a beta=1 residual connection. This improved BERT performance by 17% and ResNet-50 performance by 5% on the GPU in Orin.

NVIDIA Orin energy efficiency improvements

The NVIDIA Orin power submission benefited from all the above performance improvements and also focused on further power reduction. With the updated L4T image for Orin, power consumption is reduced by fine-tuning the CPU, GPU, and DLA frequencies per benchmark to achieve the optimum perf-per-watt. This image also enables new platform power-saving features like regulator auto phase shedding and low-power states in low-load conditions. The flexibility of USB-C support in Orin was leveraged to consolidate all I/O through Ethernet-over-USB communication. System power was further reduced by disabling I/O subsystems like Ethernet, WiFi, and DP that are not essential for inference and also by using off-the-shelf higher-efficiency GaN power adapters.

These platform and software optimizations reduced system power consumption by up to 52% and improved perf-per-watt by up to 48% over our previous submission in 2.0.

3D U-Net performance improvement

In MLPerf Inference v2.0, the 3D U-Net medical imaging workload switched to the KiTS19 dataset, which increased image sizes by up to 8x and raised the amount of compute processing required for a given sample up to 18x due to the sliding window inference. For more information about the NVIDIA MLPerf Inference v2.0 submission, see Getting the Best Performance on MLPerf Inference 2.0.

For MLPerf Inference 2.1, we further improved the performance of the first convolution layer with the TensorRT IPluginV2DDynamicExt plugin.

KiTS19 images are single-channel tensors, which challenges the performance of the very first 3D convolution in 3D U-Net. In 3D convolution, the channel dimension typically contributes to the GEMM K dimension. This is especially relevant because the overall performance of 3D U-Net is dominated by the first two and last two 3D convolutions. In MLPerf Inference v2.0, these four convolutions accounted for roughly 38% of the entire network runtime, with the very first layer responsible for 8%. A non-trivial factor behind this is the zero-padding needed to accommodate the NC/32DHW32 vectorized format, the layout in which the Tensor Cores can be utilized most efficiently.

With our updated plugin, we use an INT8 linear format to leverage efficient computation on this single-channel 3D input. The advantages of this are twofold:

  • Higher effective use of flops: by not performing unneeded computations
  • PCIe transfer B/W savings: avoids the overhead of either moving zero padded input tensor between host and GPU memory or zero-padding on GPU before sending the input tensor to TensorRT
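
The cost of channel padding can be estimated with a few lines of arithmetic. The sketch assumes a 128x128x128 sliding-window patch, which is not stated in this post, one byte per INT8 element, and a vector width of 32 channels as implied by the NC/32DHW32 layout name. The function names are hypothetical.

```python
def padded_channels(channels, vector=32):
    """Round the channel count up to a multiple of the vector width."""
    return -(-channels // vector) * vector


def input_bytes(channels, depth, height, width, bytes_per_elem=1):
    """Size of one input tensor in bytes (INT8 assumed by default)."""
    return channels * depth * height * width * bytes_per_elem


patch = (128, 128, 128)  # assumed patch size, for illustration only
linear = input_bytes(1, *patch)
vectorized = input_bytes(padded_channels(1), *patch)
print(linear, vectorized, vectorized // linear)
```

For a single-channel input, the padded layout carries 32 times as many bytes, 31 of every 32 being zeros, which is the data movement and compute the linear format avoids.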

This optimization improved the first layer performance by 2.7x. Additionally, the slicing kernel no longer needs to deal with zero-padding, and therefore its performance also improved by 2x. As a net result, the end-to-end performance of 3D U-Net improved by 5% in MLPerf Inference 2.1.
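
As a sanity check, the end-to-end figure is consistent with Amdahl's law applied to the first layer alone. The sketch assumes the first layer's 8% share of runtime from v2.0 still applies and ignores the slicing-kernel gain; `end_to_end_speedup` is a hypothetical helper.

```python
def end_to_end_speedup(fraction, local_speedup):
    """Amdahl's law: overall speedup when `fraction` of runtime gets `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


speedup = end_to_end_speedup(fraction=0.08, local_speedup=2.7)
print(f"{(speedup - 1) * 100:.1f}% end-to-end improvement")
```

Speeding up 8% of the runtime by 2.7x gives an overall improvement of a little over 5%, in line with the reported result.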

Breaking performance records across workloads

In MLPerf Inference 2.1, the first NVIDIA H100 submission set new per-accelerator performance records on all workloads in the Data Center category, and delivered up to 4.5x higher performance than the A100 and 36x higher performance than leading CPUs. This generational performance uplift was possible due to both the many breakthroughs of the NVIDIA Hopper Architecture and immense software optimizations that take advantage of those capabilities.

NVIDIA Jetson AGX Orin saw up to a 50% boost in energy efficiency in just one round, and it continues to deliver overall inference performance leadership for edge AI and robotics applications.

This latest round of MLPerf Inference showcases the leading performance and versatility of the NVIDIA AI platform for the full breadth of AI workloads and scenarios. With the H100 Tensor Core GPU, we are supercharging the NVIDIA AI platform for the most advanced models and providing users with new levels of performance and capabilities for the most demanding workloads.

For more information, see NVIDIA Hopper Architecture In-Depth.

Categories
Misc

Upcoming Event: Recommender Systems Sessions at GTC 2022

Learn about transformer-powered personalized online advertising, cross-framework model evaluation, the NVIDIA Merlin ecosystem, and more with these featured GTC 2022 sessions.

Categories
Misc

GeForce NOW Supports Over 1,400 Games Streaming Instantly

This GFN Thursday marks a milestone: With the addition of six new titles this week, more than 1,400 games are now available to stream from the GeForce NOW library. Plus, GeForce NOW members streaming to supported Smart TVs from Samsung and LG can get into their games faster with an improved user interface. Your Games,…

The post GeForce NOW Supports Over 1,400 Games Streaming Instantly appeared first on NVIDIA Blog.

Categories
Misc

Explainer: What Is a QPU?

A QPU, aka a quantum processor, is the brain of a quantum computer that uses the behavior of particles like electrons or photons to make certain kinds of calculations much faster than processors in today’s computers.

Categories
Misc

Upcoming Event: Deep Learning Sessions at GTC 2022

Join our deep learning sessions at GTC 2022 to learn about real-world use cases, new tools, and best practices for training and inference.

Categories
Offsites

Digitizing Smell: Using Molecular Maps to Understand Odor

Did you ever try to measure a smell? …Until you can measure their likenesses and differences you can have no science of odor. If you are ambitious to found a new science, measure a smell.
— Alexander Graham Bell, 1914.

How can we measure a smell? Smells are produced by molecules that waft through the air, enter our noses, and bind to sensory receptors. Potentially billions of molecules can produce a smell, so figuring out which ones produce which smells is difficult to catalog or predict. Sensory maps can help us solve this problem. Color vision has the most familiar examples of these maps, from the color wheel we each learn in primary school to more sophisticated variants used to perform color correction in video production. While these maps have existed for centuries, useful maps for smell have been missing, because smell is a harder problem to crack: molecules vary in many more ways than photons do; data collection requires physical proximity between the smeller and smell (we don’t have good smell “cameras” and smell “monitors”); and the human eye only has three types of sensory receptors for color while the human nose has > 300 for odor. As a result, previous efforts to produce odor maps have failed to gain traction.

In 2019, we developed a graph neural network (GNN) model that began to explore thousands of examples of distinct molecules paired with the smell labels that they evoke, e.g., “beefy”, “floral”, or “minty”, to learn the relationship between a molecule’s structure and the probability that such a molecule would have each smell label. The embedding space of this model contains a representation of each molecule as a fixed-length vector describing that molecule in terms of its odor, much as the RGB value of a visual stimulus describes its color.

Left: An example of a color map (CIE 1931) in which coordinates can be directly translated into values for hue and saturation. Similar colors lie near each other, and specific wavelengths of light (and combinations thereof) can be identified with positions on the map. Right: Odors in the Principal Odor Map operate similarly. Individual molecules correspond to points (grey), and the locations of these points reflect predictions of their odor character.

Today we introduce the “Principal Odor Map” (POM), which identifies the vector representation of each odorous molecule in the model’s embedding space as a single point in a high-dimensional space. The POM has the properties of a sensory map: first, pairs of perceptually similar odors correspond to two nearby points in the POM (by analogy, red is nearer to orange than to green on the color wheel). Second, the POM enables us to predict and discover new odors and the molecules that produce them. In a series of papers, we demonstrate that the map can be used to prospectively predict the odor properties of molecules, understand these properties in terms of fundamental biology, and tackle pressing global health problems. We discuss each of these promising applications of the POM and how we test them below.

Test 1: Challenging the Model with Molecules Never Smelled Before
First, we asked if the underlying model could correctly predict the odors of new molecules that no one had ever smelled before and that were very different from molecules used during model development. This is an important test — many models perform well on data that looks similar to what the model has seen before, but break down when tested on novel cases.

To test this, we collected the largest ever dataset of odor descriptions for novel molecules. Our partners at the Monell Center trained panelists to rate the smell of each of 400 molecules using 55 distinct labels (e.g., “minty”) that were selected to cover the space of possible smells while being neither redundant nor too sparse. Unsurprisingly, we found that different people had different characterizations of the same molecule. This is why sensory research typically uses panels of dozens or hundreds of people and highlights why smell is a hard problem to solve. Rather than see if the model could match any one person, we asked how close it was to the consensus: the average across all of the panelists. We found that the predictions of the model were closer to the consensus than the average panelist was. In other words, the model demonstrated an exceptional ability to predict odor from a molecule’s structure.

Predictions made by two models, our GNN model (orange) and a baseline chemoinformatic random forest (RF) model (blue), compared with the mean ratings given by trained panelists (green) for the molecule 2,3-dihydrobenzofuran-5-carboxaldehyde. Each bar corresponds to one odor character label (with only the top 17 of 55 shown for clarity). The top five are indicated in color; our model correctly identifies four of the top five, with high confidence, vs. only three of five, with low confidence, for the RF model. The correlation (R) to the full set of 55 labels is also higher in our model.
Unlike alternative benchmark models (RF and nearest-neighbor models trained on various sets of chemoinformatic features), our GNN model outperforms the median human panelist at predicting the panel mean rating. In other words, our GNN model better reflects the panel consensus than the typical panelist.
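
The comparison against the panel consensus can be sketched in a few lines. The exact evaluation protocol is not described in this post, so the sketch makes assumptions: each panelist is scored against the mean of the other panelists, the model is scored against the full panel mean, and Pearson correlation is the agreement measure. The ratings are made-up toy numbers, and all names are hypothetical.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def panel_mean(panel):
    """Average rating per odor label across panelists."""
    return [mean(col) for col in zip(*panel)]


def compare_to_panel(panel, model):
    """Return (model correlation with consensus, median panelist correlation)."""
    panelist_r = []
    for i, ratings in enumerate(panel):
        others = panel[:i] + panel[i + 1:]  # leave this panelist out
        panelist_r.append(pearson(ratings, panel_mean(others)))
    return pearson(model, panel_mean(panel)), median(panelist_r)


# Toy ratings for one molecule over five labels (illustrative only).
panel = [[4, 1, 0, 3, 2], [5, 0, 1, 2, 2], [3, 2, 0, 4, 1]]
model = [4, 1, 0, 3, 2]
model_r, median_r = compare_to_panel(panel, model)
print(model_r, median_r)
```

A model "outperforms the median panelist" in this framing when its correlation with the consensus exceeds the median panelist correlation.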

The POM also exhibited state-of-the-art performance on alternative human olfaction tasks like detecting the strength of a smell or the similarity of different smells. Thus, with the POM, it should be possible to predict the odor qualities of any of billions of as-yet-unknown odorous molecules, with broad applications to flavor and fragrance.

Test 2: Linking Odor Quality Back to Fundamental Biology
Because the Principal Odor Map was useful in predicting human odor perception, we asked whether it could also predict odor perception in animals, and the brain activity that underlies it. We found that the map could successfully predict the activity of sensory receptors, neurons, and behavior in most animals that olfactory neuroscientists have studied, including mice and insects.

What common feature of the natural world makes this map applicable to species separated by hundreds of millions of years of evolution? We realized that the common purpose of the ability to smell might be to detect and discriminate between metabolic states, i.e., to sense when something is ripe vs. rotten, nutritious vs. inert, or healthy vs. sick. We gathered data about metabolic reactions in dozens of species across the kingdoms of life and found that the map corresponds closely to metabolism itself. When two molecules are far apart in odor, according to the map, a long series of metabolic reactions is required to convert one to the other; by contrast, similarly smelling molecules are separated by just one or a few reactions. Even long reaction pathways containing many steps trace smooth paths through the map. And molecules that co-occur in the same natural substances (e.g., an orange) are often very tightly clustered on the map. The POM shows that olfaction is linked to our natural world through the structure of metabolism and, perhaps surprisingly, captures fundamental principles of biology.

Left: We aggregated metabolic reactions found in 17 species across 4 kingdoms to construct a metabolic graph. In this illustration, each circle is a distinct metabolite molecule and an arrow indicates that there is a metabolic reaction that converts one molecule to another. Some metabolites have an odor (color) and others do not (gray), and the metabolic distance between two odorous metabolites is the minimum number of reactions necessary to convert one into the other. In the path shown in bold, the distance is 3. Right: Metabolic distance was highly correlated with distance in the POM, an estimate of perceived odor dissimilarity.
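
The metabolic distance defined above is a shortest-path length in a directed graph, which a breadth-first search computes directly. The sketch below uses a tiny made-up reaction graph with placeholder metabolite names; it is an illustration of the definition, not the pipeline used in the study.

```python
from collections import deque


def metabolic_distance(reactions, source, target):
    """Minimum number of reactions converting source into target, or None."""
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in reactions.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # no chain of reactions connects the two metabolites


# Hypothetical graph: each key converts into the metabolites in its list.
reactions = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(metabolic_distance(reactions, "A", "E"))
print(metabolic_distance(reactions, "E", "A"))
```

Because reactions are directed, the distance from one metabolite to another can differ from the distance in the opposite direction, or not exist at all.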

Test 3: Extending the Model to Tackle a Global Health Challenge
A map of odor that is tightly connected to perception and biology across the animal kingdom opens new doors. Mosquitoes and other insect pests are drawn to humans in part by their odor perception. Since the POM can be used to predict animal olfaction generally, we retrained it to tackle one of humanity’s biggest problems, the scourge of diseases transmitted by mosquitoes and ticks, which kill hundreds of thousands of people each year.

For this purpose, we improved our original model with two new sources of data: (1) a long-forgotten set of experiments conducted by the USDA on human volunteers beginning 80 years ago and recently made discoverable by Google Books, which we subsequently made machine-readable; and (2) a new dataset collected by our partners at TropIQ, using their high-throughput laboratory mosquito assay. Both datasets measure how well a given molecule keeps mosquitoes away. Together, the resulting model can predict the mosquito repellency of nearly any molecule, enabling a virtual screen over huge swaths of molecular space. We validated this screen experimentally using entirely new molecules and found over a dozen of them with repellency at least as high as DEET, the active ingredient in most insect repellents. Less expensive, longer lasting, and safer repellents can reduce the worldwide incidence of diseases like malaria, potentially saving countless lives.

We digitized USDA mosquito repellency data for thousands of molecules previously scanned by Google Books, and used it to refine the learned representation (the map) at the heart of the model. We added additional layers, specifically to predict repellency in a mosquito feeder assay, and iteratively trained the model to improve assay predictions while running computational screens for candidate repellents.
Many molecules showing mosquito repellency in the laboratory assay also showed repellency when applied to humans. Several showed repellency greater than the most common repellents used today (DEET and picaridin).

The Road Ahead
We discovered that our modeling approach to smell prediction could be used to draw a Principal Odor Map for tackling odor-related problems more generally. This map was the key to measuring smell: it answered a range of questions about novel smells and the molecules that produce them, it connected smells back to their origins in evolution and the natural world, and it is helping us tackle important human-health challenges that affect millions of people. Going forward, we hope that this approach can be used to find new solutions to problems in food and fragrance formulation, environmental quality monitoring, and the detection of human and animal diseases.

Acknowledgements
This work was performed by the ML olfaction research team, including Benjamin Sanchez-Lengeling, Brian K. Lee, Jennifer N. Wei, Wesley W. Qian, and Jake Yasonik (the latter two were partly supported by the Google Student Researcher program) and our external partners including Emily Mayhew and Joel D. Mainland from the Monell Center, and Koen Dechering and Marnix Vlot from TropIQ. The Google Books team brought the USDA dataset online. Richard C. Gerkin was supported by the Google Visiting Faculty Researcher program and is also an Associate Research Professor at Arizona State University.

Categories
Misc

Upcoming Event: Manufacturing Workshops at GTC 2022

Join us for manufacturing sessions at GTC 2022, including an expert-led workshop on Computer Vision for Industrial Inspection.

Categories
Misc

Model Teachers: Startups Make Schools Smarter With Machine Learning

Like two valedictorians, SimInsights and Photomath tell stories worth hearing about how AI is advancing education. SimInsights in Irvine, Calif., uses NVIDIA conversational AI to make virtual and augmented reality classes lifelike for college students and employee training. Photomath, founded in Zagreb, Croatia and based in San Mateo, Calif., created an app using…

The post Model Teachers: Startups Make Schools Smarter With Machine Learning appeared first on NVIDIA Blog.

Categories
Misc

Ridiculously Realistic Renders Rule This Week ‘In the NVIDIA Studio’

Viral creator turned NVIDIA 3D artist Lorenzo Drago takes viewers on a jaw-dropping journey through Toyama, Japan’s Etchū-Daimon Station this week In the NVIDIA Studio.

The post Ridiculously Realistic Renders Rule This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.