Categories
Misc

NVIDIA GTC to Feature CEO Jensen Huang Keynote Announcing New AI and Metaverse Technologies, 200+ Sessions With Top Tech, Business Execs

Deep Learning Pioneers Yoshua Bengio, Geoff Hinton, Yann LeCun Among the Scores of Industry Experts to Present at World’s Premier AI Conference, Sept. 19-22. SANTA CLARA, Calif., Aug. 15, 2022 …

Categories
Misc

From Sapling to Forest: Five Sustainability and Employment Initiatives We’re Nurturing in India

For over a decade, NVIDIA has invested in social causes and communities in India as part of our commitment to corporate social responsibility. Bolstering those efforts, we’re unveiling this year’s investments in five projects that have been selected by the NVIDIA Foundation team, focused on the areas of environmental conservation, ecological restoration, social innovation and job …


Categories
Offsites

Rax: Composable Learning-to-Rank Using JAX

Ranking is a core problem across a variety of domains, such as search engines, recommendation systems, or question answering. As such, researchers often utilize learning-to-rank (LTR), a set of supervised machine learning techniques that optimize for the utility of an entire list of items (rather than a single item at a time). A noticeable recent focus is on combining LTR with deep learning. Existing libraries, most notably TF-Ranking, offer researchers and practitioners the necessary tools to use LTR in their work. However, none of the existing LTR libraries work natively with JAX, a new machine learning framework that provides an extensible system of function transformations that compose: automatic differentiation, JIT-compilation to GPU/TPU devices and more.

Today, we are excited to introduce Rax, a library for LTR in the JAX ecosystem. Rax brings decades of LTR research to the JAX ecosystem, making it possible to apply JAX to a variety of ranking problems and combine ranking techniques with recent advances in deep learning built upon JAX (e.g., T5X). Rax provides state-of-the-art ranking losses, a number of standard ranking metrics, and a set of function transformations to enable ranking metric optimization. All this functionality is provided with a well-documented and easy to use API that will look and feel familiar to JAX users. Please check out our paper for more technical details.

Learning-to-Rank Using Rax
Rax is designed to solve LTR problems. To this end, Rax provides loss and metric functions that operate on batches of lists, not batches of individual data points as is common in other machine learning problems. An example of such a list is the multiple potential results from a search engine query. The figure below illustrates how tools from Rax can be used to train neural networks on ranking tasks. In this example, the green items (B, F) are very relevant, the yellow items (C, E) are somewhat relevant and the red items (A, D) are not relevant. A neural network is used to predict a relevancy score for each item, then these items are sorted by these scores to produce a ranking. A Rax ranking loss incorporates the entire list of scores to optimize the neural network, improving the overall ranking of the items. After several iterations of stochastic gradient descent, the neural network learns to score the items such that the resulting ranking is optimal: relevant items are placed at the top of the list and non-relevant items at the bottom.

Using Rax to optimize a neural network for a ranking task. The green items (B, F) are very relevant, the yellow items (C, E) are somewhat relevant and the red items (A, D) are not relevant.

Approximate Metric Optimization
The quality of a ranking is commonly evaluated using ranking metrics, e.g., the normalized discounted cumulative gain (NDCG). An important objective of LTR is to optimize a neural network so that it scores highly on ranking metrics. However, ranking metrics like NDCG can present challenges because they are often discontinuous and flat, so stochastic gradient descent cannot directly be applied to these metrics. Rax provides state-of-the-art approximation techniques that make it possible to produce differentiable surrogates to ranking metrics that permit optimization via gradient descent. The figure below illustrates the use of rax.approx_t12n, a function transformation unique to Rax, which allows for the NDCG metric to be transformed into an approximate and differentiable form.

Using an approximation technique from Rax to transform the NDCG ranking metric into a differentiable and optimizable ranking loss (approx_t12n and gumbel_t12n).

First, notice how the NDCG metric (in green) is flat and discontinuous, making it hard to optimize using stochastic gradient descent. By applying the rax.approx_t12n transformation to the metric, we obtain ApproxNDCG, an approximate metric that is now differentiable with well-defined gradients (in red). However, it potentially has many local optima — points where the loss is locally optimal, but not globally optimal — in which the training process can get stuck. When the loss encounters such a local optimum, training procedures like stochastic gradient descent will have difficulty improving the neural network further.

To overcome this, we can obtain the gumbel-version of ApproxNDCG by using the rax.gumbel_t12n transformation. This gumbel version introduces noise in the ranking scores which causes the loss to sample many different rankings that may incur a non-zero cost (in blue). This stochastic treatment may help the loss escape local optima and often is a better choice when training a neural network on a ranking metric. Rax, by design, allows the approximate and gumbel transformations to be freely used with all metrics that are offered by the library, including metrics with a top-k cutoff value, like recall or precision. In fact, it is even possible to implement your own metrics and transform them to obtain gumbel-approximate versions that permit optimization without any extra effort.

Ranking in the JAX Ecosystem
Rax is designed to integrate well in the JAX ecosystem and we prioritize interoperability with other JAX-based libraries. For example, a common workflow for researchers that use JAX is to use TensorFlow Datasets to load a dataset, Flax to build a neural network, and Optax to optimize the parameters of the network. Each of these libraries composes well with the others and the composition of these tools is what makes working with JAX both flexible and powerful. For researchers and practitioners of ranking systems, the JAX ecosystem was previously missing LTR functionality, and Rax fills this gap by providing a collection of ranking losses and metrics. We have carefully constructed Rax to function natively with standard JAX transformations such as jax.jit and jax.grad and various libraries like Flax and Optax. This means that users can freely use their favorite JAX and Rax tools together.

Ranking with T5
While giant language models such as T5 have shown great performance on natural language tasks, how to leverage ranking losses to improve their performance on ranking tasks, such as search or question answering, is under-explored. With Rax, it is possible to tap this potential: Rax is written as a JAX-first library, making it easy to integrate with other JAX libraries. Since T5X is an implementation of T5 in the JAX ecosystem, Rax works with it seamlessly.

To this end, we have an example that demonstrates how Rax can be used in T5X. By incorporating ranking losses and metrics, it is now possible to fine-tune T5 for ranking problems, and our results indicate that enhancing T5 with ranking losses can offer significant performance improvements. For example, on the MS-MARCO QNA v2.1 benchmark we are able to achieve a +1.2% NDCG and +1.7% MRR by fine-tuning a T5-Base model using the Rax listwise softmax cross-entropy loss instead of a pointwise sigmoid cross-entropy loss.

Fine-tuning a T5-Base model on MS-MARCO QNA v2.1 with a ranking loss (softmax, in blue) versus a non-ranking loss (pointwise sigmoid, in red).

Conclusion
Overall, Rax is a new addition to the growing ecosystem of JAX libraries. Rax is entirely open source and available to everyone at github.com/google/rax. More technical details can also be found in our paper. We encourage everyone to explore the examples included in the GitHub repository: (1) optimizing a neural network with Flax and Optax, (2) comparing different approximate metric optimization techniques, and (3) integrating Rax with T5X.

Acknowledgements
Many collaborators within Google made this project possible: Xuanhui Wang, Zhen Qin, Le Yan, Rama Kumar Pasumarthi, Michael Bendersky, Marc Najork, Fernando Diaz, Ryan Doherty, Afroz Mohiuddin, and Samer Hassan.

Categories
Misc

Get Hands-On Training from NVIDIA Experts at GTC

What if you could spend 8 hours with an AI legend while getting hands-on experience using some of the most advanced GPU and DPU technology available?

As part of the upcoming GPU Technical Conference, the NVIDIA Deep Learning Institute (DLI) is offering 20 full-day workshops covering a range of deep learning, data science, and accelerated computing topics. In each workshop, you are given access to a fully configured, GPU-accelerated server in the cloud. You gain experience building and deploying an end-to-end project using industry-standard software, tools, and frameworks while learning from some of the most experienced AI practitioners in the industry.

DLI workshops are currently $99 until August 29, and $149 as of August 30. Register now!

All workshops are created and taught by NVIDIA experts. Here are three who are teaching DLI workshops at GTC:

  • Bob Crovella (USA)
  • Adam Grzywaczewski (England)
  • Gwangsoo Hong (Korea)

Bob Crovella, NVIDIA solution architect (USA)  

Photo of Bob Crovella
Bob Crovella

Bob has been a solution architect and field application engineer in the areas of scientific simulation, HPC, and deep learning for almost 25 years at NVIDIA. He and his teams have helped hundreds of customers and partners figure out how to leverage the capabilities of accelerated computing to solve some of the world’s most difficult problems.

After NVIDIA introduced CUDA in 2007, Bob was one of the first to train customers and partners on how to unlock the power of GPUs and has since become one of the leading experts on parallel computing architecture.   

“It’s breathtaking. When I first learned to program CUDA, I was amazed by what the machine is capable of and the power you can unlock with your code. You witness something speed up dramatically, like 10X or 100X faster. And suddenly you have this realization that this thing is everything they said it was. I know this is kind of geeky, right? But through my work and teaching DLI, I get to give that same opportunity to others to experience that kind of excitement—to program the most powerful piece of processing hardware on the planet. It’s not an experience that everyone gets.” 

Bob earned a BS in electrical and electronics engineering from the University at Buffalo and an MS in electrical engineering from Rensselaer Polytechnic Institute. He is currently certified to teach four DLI courses.

At GTC, Bob will be teaching Scaling CUDA C++ Applications to Multiple Nodes on Monday, September 19 from 9 AM to 5 PM PDT. “This is one of the most advanced CUDA programming classes that we offer. We help students take GPU programming to the next level: using multiple computers in a cluster to solve bigger and bigger problems.”

Adam Grzywaczewski, NVIDIA senior deep learning solution architect (England)

Photo of Adam Grzywaczewski
Adam Grzywaczewski

Adam is a senior deep learning solution architect at NVIDIA. Over the last 5 years, he has worked with hundreds of customers helping them design and implement AI solutions. He specializes in large training workloads and neural networks that require hundreds of GPUs to train and run.

“When I first started at NVIDIA, DGX was new. In fact, I have the first prototype of a DGX Station here under my desk. Over time, I have seen customers systematically start to migrate to intensive work on very large systems, very large training jobs, and a surprisingly large number of inference workloads. We are seeing customers have a lot of very serious conversations and engineering work around deployment to production.”

Adam has co-authored two DLI workshops, is certified in six workshops, and has taught the most workshops among the EMEA solution architects in the past year.    

“Our workshops are very focused, and they are designed with a very pragmatic attitude—to solve the problems that they are advertised to solve. We distill huge amounts of knowledge into each course, information that doesn’t exist in such a distilled format anywhere else. And you get direct access to fully configured GPUs. In a course that we just released, the student starts the training process of an extremely large language model and then deploys that model to production. And with just a couple of clicks, teaches that model how to translate and how to answer the questions. It’s actually quite empowering.”

Adam received his BS in information retrieval from Coventry University, his MS in computer science from the Silesian University of Technology, and his Ph. D. from Coventry University.

At GTC, Adam will be teaching Model Parallelism: Building and Deploying Large Neural Networks.

Gwangsoo Hong, NVIDIA solution architect (Korea)

Photo of Gwangsoo Hong
Gwangsoo Hong

Gwangsoo has been a solution architect with NVIDIA for almost 4 years. His current responsibilities include helping customers get the most value out of their NVIDIA full-stack platform. He specializes in computer vision and NLP with deep learning with expertise in GPU acceleration for large-scale models. He is certified in eight DLI workshops and is one of our most sought-after instructors in Korea.

“The part I love the most about being a DLI instructor is working with various students and teaching them about end-to-end deep learning workloads like training, inference, and services; helping them learn about different workloads and application domains; and materializing their ideas. It’s also rewarding to teach students of all backgrounds and ages and see them successfully complete the DLI course. I learn something from each of them. The reaction I get most often from my students is, ‘This can’t be.’”

At GTC, he will also be teaching Model Parallelism: Building and Deploying Large Neural Networks on Wednesday, September 21 from 9 AM to 6 PM KST.

Register now for early discounts

Don’t miss this unique opportunity to take your AI skills to the next level. Registration for the conference is free, and the DLI workshops are offered at a special price of $149, or $99 if you register by August 29 (normally $500 per seat).

For the complete list, see GTC Workshops & Training. Some workshops are available in Taiwanese, Korean, and Japanese and are scheduled in those respective time zones.

Categories
Misc

New NVIDIA Neural Graphics SDKs Make Metaverse Content Creation Available to All

A dozen tools and programs—including new releases NeuralVDB and Kaolin Wisp—make 3D content creation easy and fast for millions of designers and creators.

Categories
Misc

Upcoming Webinar: Designing Efficient Vision Transformer Networks for Autonomous Vehicles

Explore design principles for efficient transformers in production and how innovative model design can help achieve better accuracy in AV perception.

Categories
Misc

Top Israel Medical Center Partners with AI Startups to Help Detect Brain Bleeds, Other Critical Cases

Israel’s largest private medical center is working with startups and researchers to bring potentially life-saving AI solutions to real-world healthcare workflows. With more than 1.5 million patients across eight medical centers, Assuta Medical Centers conduct over 100,000 surgeries, 800,000 imaging tests and hundreds of thousands of other health diagnostics and treatments each year. These create …


Categories
Misc

GFN Thursday Brings Thunder to the Cloud With ‘Rumbleverse’ Arriving on GeForce NOW

It’s time to rumble in Grapital City with Rumbleverse launching today on GeForce NOW. Punch your way into the all-new, free-to-play Brawler Royale from Iron Galaxy Studios and Epic Games Publishing, streaming from the cloud to nearly all devices. That means gamers can tackle, uppercut, body slam and more from any GeForce NOW-compatible device, including …


Categories
Misc

New Releases of NVIDIA Nsight Systems and Nsight Graphics Debut at SIGGRAPH 2022

Graphics professionals and researchers have come together at SIGGRAPH 2022 to share their expertise and learn about recent innovations in the computer graphics industry.

NVIDIA Developer Tools is excited to be a part of this year’s event, hosting the hands-on lab Using Nsight to Optimize Ray-Tracing Applications, and announcing new releases for NVIDIA Nsight Systems and NVIDIA Nsight Graphics that are available for download now.

NVIDIA Nsight Systems 2022.3

The new 2022.3 release of Nsight Systems brings expanded Vulkan API support alongside improvements to the user experience.

Nsight Systems now supports Vulkan Video, the Vulkan solution for processing hardware-accelerated video files. In previous versions of Nsight Systems, a Vulkan Video workload would not be identified as a subset of the larger queue command it occupied. 

With full integration in Nsight Systems 2022.3, Vulkan Video coding ambiguity is removed and the process can be profiled in the timeline. 

Screenshot showing that Vulkan Video workload can be identified in the Nsight System timeline below the Vulkan tab
Figure 1. Vulkan Video workload can be identified in the Nsight System timeline below the Vulkan tab

With the new VK_KHR_graphics_pipeline_library extension, Vulkan applications can now precompile shaders and link them at runtime at a substantially reduced cost. This is a critical feature for shader-heavy applications such as games, making its full support an exciting addition to Nsight Systems 2022.3.

To round out the new version, visual improvements to multi-report viewing have been made for better clarity. For Linux machines, improved counters for the CPU, PMU, and OS make system-wide performance tracing more precise. A host of bug fixes accompany these updates.

Learn more about Nsight Systems 2022.3.

NVIDIA Nsight Graphics 2022.4

Nsight Graphics 2022.4 introduces a robust set of upgrades to its most powerful profiling tools.

In the 2022.4 release, the API inspector has been redesigned. The new design includes an improved display, search functions within API Inspector pages, significantly enhanced constant buffer views, and data export for data persistence and offline comparison.

Watch the updated demonstration video (below) from the Nsight Graphics team to learn about all the new features and improved interface:

Video 1. A demonstration of the new Nsight Graphics features and improved interface

Nsight Graphics GPU Trace is a detailed performance timeline that tracks GPU throughput, enabling meticulous profiling of hardware utilization. To aid the work of graphics development across all specifications, GPU Trace now supports generating trace and analysis for OpenGL applications on Windows and Linux.

Screenshot showing full GPU utilization timeline for an OpenGL application captured by NVIDIA Nsight Graphics
Figure 2. Full GPU utilization timeline for an OpenGL application captured by NVIDIA Nsight Graphics

Also new to GPU Trace, you can now identify subchannel switches with an event overlay. Subchannel switches occur when the GPU swaps between Compute or Graphics calls in the same hardware queue, causing the GPU to briefly idle. In the interest of performance, it is best to minimize subchannel switches, which can now be identified within the timeline.

The shader profiler summary has also been expanded, with new columns for per-shader register numbers as well as theoretical warp occupancy.

Image showing expanded shader profile summary section with new columns on the left that identify shader count and warp occupancy
Figure 3. Expanded shader profile summary section with new columns on the left that identify shader count and warp occupancy

Nsight Graphics 2022.4 is wrapped up with support for enhanced barriers that are available in recent DirectX 12 Agility SDKs. Applications that use either enhanced barriers or traditional barriers will now be equally supported. Learn more about all of the new additions to Nsight Graphics 2022.4.

Nsight Deep Learning Designer 2022.2

A new version of Nsight Deep Learning Designer is available now. The 2022.2 update features expanded support for importing PyTorch models as well as launching the PyTorch exporter from a virtual environment. Performance improvements have also been made to the Channel Inspector as well as path-finding to reduce overhead. 

Paired with this release, NVIDIA Feature Map Explorer 2022.1 is available now, offering measurable performance boosts to its feature map loading process alongside additional metrics for tracking tensor values. Learn more about Nsight Deep Learning Designer 2022.2 and NVIDIA Feature Map Explorer 2022.1.

Get the latest Nsight releases

Additional resources

Watch a guided walkthrough about using Nsight tools to work through real-life development scenarios.

For even more information, see:

Want to help us build better tools for you? Share your thoughts with the NVIDIA Nsight Graphics Survey that takes less than one minute to complete. 

Categories
Misc

Reimagining Drug Discovery with Computational Biology at GTC 2022

Take a deep dive into the latest advances in drug research with AI and accelerated computing at these GTC 2022 featured sessions.