
Get Hands-On Training from NVIDIA Experts at GTC

What if you could spend 8 hours with an AI legend while getting hands-on experience using some of the most advanced GPU and DPU technology available?

As part of the upcoming GPU Technology Conference, the NVIDIA Deep Learning Institute (DLI) is offering 20 full-day workshops covering a range of deep learning, data science, and accelerated computing topics. In each workshop, you are given access to a fully configured, GPU-accelerated server in the cloud. You gain experience building and deploying an end-to-end project using industry-standard software, tools, and frameworks while learning from some of the most experienced AI practitioners in the industry.

DLI workshops are $99 through August 29 and $149 as of August 30. Register now!

All workshops are created and taught by NVIDIA experts. Here are three who are teaching DLI workshops at GTC:

  • Bob Crovella (USA)
  • Adam Grzywaczewski (England)
  • Gwangsoo Hong (Korea)

Bob Crovella, NVIDIA solution architect (USA)  

Photo of Bob Crovella

Bob has been a solution architect and field application engineer in the areas of scientific simulation, HPC, and deep learning for almost 25 years at NVIDIA. He and his teams have helped hundreds of customers and partners figure out how to leverage the capabilities of accelerated computing to solve some of the world’s most difficult problems.

After NVIDIA introduced CUDA in 2007, Bob was one of the first to train customers and partners on how to unlock the power of GPUs and has since become one of the leading experts on parallel computing architecture.   

“It’s breathtaking. When I first learned to program CUDA, I was amazed by what the machine is capable of and the power you can unlock with your code. You witness something speed up dramatically, like 10X or 100X faster. And suddenly you have this realization that this thing is everything they said it was. I know this is kind of geeky, right? But through my work and teaching DLI, I get to give that same opportunity to others to experience that kind of excitement—to program the most powerful piece of processing hardware on the planet. It’s not an experience that everyone gets.” 

Bob earned a BS in electrical and electronics engineering from the University at Buffalo and an MS in electrical engineering from Rensselaer Polytechnic Institute. He is currently certified to teach four DLI courses.

At GTC, Bob will be teaching Scaling CUDA C++ Applications to Multiple Nodes on Monday, September 19 from 9 AM to 5 PM PDT. “This is one of the most advanced CUDA programming classes that we offer. We help students take GPU programming to the next level: using multiple computers in a cluster to solve bigger and bigger problems.”

Adam Grzywaczewski, NVIDIA senior deep learning solution architect (England)

Photo of Adam Grzywaczewski

Adam is a senior deep learning solution architect at NVIDIA. Over the last 5 years, he has worked with hundreds of customers helping them design and implement AI solutions. He specializes in large training workloads and neural networks that require hundreds of GPUs to train and run.

“When I first started at NVIDIA, DGX was new. In fact, I have the first prototype of a DGX Station here under my desk. Over time, I have seen customers systematically start to migrate to intensive work on very large systems, very large training jobs, and a surprisingly large number of inference workloads. We are seeing customers have a lot of very serious conversations and engineering work around deployment to production.”

Adam has co-authored two DLI workshops, is certified in six workshops, and has taught the most workshops among the EMEA solution architects in the past year.    

“Our workshops are very focused, and they are designed with a very pragmatic attitude—to solve the problems that they are advertised to solve. We distill huge amounts of knowledge into each course, information that doesn’t exist in such a distilled format anywhere else. And you get direct access to fully configured GPUs. In a course that we just released, the student starts the training process of an extremely large language model, deploys that model to production, and with just a couple of clicks teaches that model how to translate and how to answer questions. It’s actually quite empowering.”

Adam received his BS in information retrieval from Coventry University, his MS in computer science from the Silesian University of Technology, and his PhD from Coventry University.

At GTC, Adam will be teaching Model Parallelism: Building and Deploying Large Neural Networks.

Gwangsoo Hong, NVIDIA solution architect (Korea)

Photo of Gwangsoo Hong

Gwangsoo has been a solution architect with NVIDIA for almost 4 years. His current responsibilities include helping customers get the most value out of the NVIDIA full-stack platform. He specializes in deep learning for computer vision and NLP, with expertise in GPU acceleration for large-scale models. He is certified in eight DLI workshops and is one of our most sought-after instructors in Korea.

“The part I love the most about being a DLI instructor is working with various students and teaching them about end-to-end deep learning workloads like training, inference, and services; helping them learn about different workloads and application domains; and materializing their ideas. It’s also rewarding to teach students of all backgrounds and ages and see them successfully complete the DLI course. I learn something from each of them. The reaction I get most often from my students is, ‘This can’t be.’”

At GTC, he will also be teaching Model Parallelism: Building and Deploying Large Neural Networks on Wednesday, September 21 from 9 AM to 6 PM KST.

Register now for early discounts

Don’t miss this unique opportunity to take your AI skills to the next level. Registration for the conference is free, and the DLI workshops are offered at a special price of $149, or $99 if you register by August 29 (normally $500 per seat).

For the complete list, see GTC Workshops & Training. Some workshops are available in Taiwanese, Korean, and Japanese and are scheduled in those respective time zones.


New NVIDIA Neural Graphics SDKs Make Metaverse Content Creation Available to All

A dozen tools and programs—including new releases NeuralVDB and Kaolin Wisp—make 3D content creation easy and fast for millions of designers and creators.


Upcoming Webinar: Designing Efficient Vision Transformer Networks for Autonomous Vehicles

Explore design principles for efficient transformers in production and how innovative model design can help achieve better accuracy in AV perception.


Top Israel Medical Center Partners with AI Startups to Help Detect Brain Bleeds, Other Critical Cases

Israel’s largest private medical center is working with startups and researchers to bring potentially life-saving AI solutions to real-world healthcare workflows. With more than 1.5 million patients across eight medical centers, Assuta Medical Centers conduct over 100,000 surgeries, 800,000 imaging tests, and hundreds of thousands of other health diagnostics and treatments each year.


GFN Thursday Brings Thunder to the Cloud With ‘Rumbleverse’ Arriving on GeForce NOW

It’s time to rumble in Grapital City with Rumbleverse launching today on GeForce NOW. Punch your way into the all-new, free-to-play Brawler Royale from Iron Galaxy Studios and Epic Games Publishing, streaming from the cloud to nearly all devices. That means gamers can tackle, uppercut, body slam, and more from any GeForce NOW-compatible device.


New Releases of NVIDIA Nsight Systems and Nsight Graphics Debut at SIGGRAPH 2022

Graphics professionals and researchers have come together at SIGGRAPH 2022 to share their expertise and learn about recent innovations in the computer graphics industry. 

NVIDIA Developer Tools is excited to be part of this year’s event, hosting the hands-on lab Using Nsight to Optimize Ray-Tracing Applications and announcing new releases of NVIDIA Nsight Systems and NVIDIA Nsight Graphics, which are available for download now.

NVIDIA Nsight Systems 2022.3

The new 2022.3 release of Nsight Systems brings expanded Vulkan API support alongside improvements to the user experience.

Nsight Systems now supports Vulkan Video, the Vulkan API for hardware-accelerated video encoding and decoding. In previous versions of Nsight Systems, a Vulkan Video workload would not be identified as a subset of the larger queue command it occupied.

With full integration in Nsight Systems 2022.3, Vulkan Video coding ambiguity is removed and the process can be profiled in the timeline. 

Figure 1. Vulkan Video workloads can be identified in the Nsight Systems timeline below the Vulkan tab

With the new VK_KHR_graphics_pipeline_library extension, Vulkan applications can now precompile shaders and link them at runtime at a substantially reduced cost. This is a critical feature for shader-heavy applications such as games, making its full support an exciting addition to Nsight Systems 2022.3.

To round out the new version, visual improvements to multi-report viewing have been made for better clarity. For Linux machines, improved counters for the CPU, PMU, and OS make system-wide performance tracing more precise. A host of bug fixes accompany these updates.

Learn more about Nsight Systems 2022.3.

NVIDIA Nsight Graphics 2022.4

Nsight Graphics 2022.4 introduces a robust set of upgrades to its most powerful profiling tools.

In the 2022.4 release, the API Inspector has been redesigned. The new design includes an improved display, search functions within API Inspector pages, significantly enhanced constant buffer views, and data export for data persistence and offline comparison.

Watch the updated demonstration video (below) from the Nsight Graphics team to learn about all the new features and improved interface:

Video 1. A demonstration of the new Nsight Graphics features and improved interface

Nsight Graphics GPU Trace is a detailed performance timeline that tracks GPU throughput, enabling meticulous profiling of hardware utilization. To aid the work of graphics development across all specifications, GPU Trace now supports generating trace and analysis for OpenGL applications on Windows and Linux.

Figure 2. Full GPU utilization timeline for an OpenGL application captured by NVIDIA Nsight Graphics

Also new to GPU Trace, you can now identify subchannel switches with an event overlay. Subchannel switches occur when the GPU swaps between Compute or Graphics calls in the same hardware queue, causing the GPU to briefly idle. In the interest of performance, it is best to minimize subchannel switches, which can now be identified within the timeline.

The shader profiler summary has also been expanded, with new columns for per-shader register numbers as well as theoretical warp occupancy.

Figure 3. Expanded shader profiler summary section with new columns on the left that identify per-shader register counts and warp occupancy

Nsight Graphics 2022.4 rounds out with support for the enhanced barriers available in recent DirectX 12 Agility SDKs. Applications that use either enhanced barriers or traditional barriers are now equally supported. Learn more about all of the new additions to Nsight Graphics 2022.4.

Nsight Deep Learning Designer 2022.2

A new version of Nsight Deep Learning Designer is available now. The 2022.2 update features expanded support for importing PyTorch models, as well as the ability to launch the PyTorch exporter from a virtual environment. Performance improvements have also been made to the Channel Inspector and to path-finding to reduce overhead.
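Nsight Deep Learning Designer works with ONNX models, so a common first step for a PyTorch network is exporting it to ONNX. The sketch below shows that export step under stated assumptions: the TinyConvNet model, file name, and tensor names are placeholders and are not part of the Nsight release itself.

```python
# Hypothetical example: export a small PyTorch model to ONNX so it can be
# opened and profiled in Nsight Deep Learning Designer. Model, file, and
# tensor names are placeholders.
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, 10)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)

model = TinyConvNet().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# torch.onnx.export writes an .onnx file that the Designer can import.
torch.onnx.export(
    model,
    dummy_input,
    "tiny_convnet.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},
)
```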

Paired with this release, NVIDIA Feature Map Explorer 2022.1 is available now, offering measurable performance boosts to its feature map loading process alongside additional metrics for tracking tensor values. Learn more about Nsight Deep Learning Designer 2022.2 and NVIDIA Feature Map Explorer 2022.1.

Get the latest Nsight releases

Additional resources

Watch a guided walkthrough about using Nsight tools to work through real-life development scenarios.

Want to help us build better tools for you? Share your thoughts in the NVIDIA Nsight Graphics Survey; it takes less than one minute to complete.


Reimagining Drug Discovery with Computational Biology at GTC 2022

Take a deep dive into the latest advances in drug research with AI and accelerated computing at these GTC 2022 featured sessions.


Design in the Age of Digital Twins: A Conversation With Graphics-Pioneer Donald Greenberg

Asked about the future of design, Donald Greenberg holds up a model of a human aorta. “After my son became an intravascular heart surgeon at the Cleveland Clinic, he hired one of my students to use CAT scans and create digital 3D models of an aortic aneurysm,” said the computer graphics pioneer in a video.


Unlocking a Simple, Extensible, and Performant Video Pipeline at Fyma with NVIDIA DeepStream

Providing computer vision in the cloud and at scale is a complex task. Fyma, a computer vision company, is tackling this complexity with the help of NVIDIA DeepStream.

A relatively new company, Fyma turns video into data–more specifically, movement data in physical space. The Fyma platform consumes customers’ live video streams all day, every day, and produces movement events (someone walking through a doorway or down a store aisle, for example). 

One of the early lessons they learned is that their video-processing pipeline has to be simple, extensible, and performant all at the same time. With limited development resources, in the beginning they could only have one of those three. NVIDIA DeepStream has recently unlocked the ability to have all three simultaneously by shortening development times, increasing performance, and building on excellent software components such as GStreamer.

Challenges with live video streaming

Fyma is focused on consuming live video streams to ease implementation for their customers. Customers can be hesitant to implement sensors or any additional hardware on their premises, as they have already invested in security cameras. Since these cameras can be anywhere, Fyma can provide different object detection models to maximize accuracy in different environments.

Consuming live video streams is challenging in multiple aspects:

  • Cameras sometimes produce broken video (presentation/decoding timestamps jump, reported framerate is wrong)
  • Network issues cause video streams to freeze, stutter, jump, go offline
  • CPU/memory load distribution and planning isn’t straightforward
  • Live video stream is infinite

The infinite nature of live video streams means that Fyma’s platform must perform computer vision at least as quickly as frames arrive. Basically, the whole pipeline must work in real time. Otherwise, frames would accumulate endlessly.

Luckily, object detection has steadily improved in the last few years in terms of speed and accuracy. This means being able to detect objects in more than 1,000 images per second with mAP over 90%. Such advancements have enabled Fyma to provide computer vision at scale at a reasonable price to their customers.

Providing physical space analytics using computer vision (especially in real time) involves a lot more than just object detection. According to Kaarel Kivistik, Head of Software Development at Fyma, “To actually make something out of these objects we need to track them between frames and use some kind of component to analyze the behavior as well. Considering that each customer can choose their own model, set up their own analytics, and generate reports from gathered data, a simple video processing pipeline becomes a behemoth of a platform.”

Version 1: Hello world

Fyma began by coupling OpenCV and ffmpeg to a very simple Python application. Nothing was hardware-accelerated except their neural network; they were using YOLOv3 and Darknet at the time. Performance was poor, around 50-60 frames per second, despite their use of an AWS g4dn.xlarge instance with an NVIDIA Tesla T4 GPU (which they continue to use). The application functioned like this (a rough sketch follows the list):

  • OpenCV for capturing the video
  • Darknet with Python bindings to detect objects
  • Homemade IoU based multi-object tracker
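For context, a pipeline along the lines of version 1 might look like the minimal sketch below. It is an approximation, not Fyma's code: OpenCV's DNN module stands in for the Darknet Python bindings, and the stream URL and model files are placeholders.

```python
# Minimal sketch of a "version 1"-style pipeline: software decode with OpenCV,
# per-frame object detection, and a hook where an IoU tracker would go.
# Stream URL, model files, and thresholds are placeholders.
import cv2

STREAM_URL = "rtsp://example.local/camera1"   # placeholder camera stream

# OpenCV's DNN module standing in for the original Darknet Python bindings.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)  # the only GPU-accelerated part
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

cap = cv2.VideoCapture(STREAM_URL)            # software video decode on the CPU
while cap.isOpened():
    ok, frame = cap.read()                    # decoded frame lives in CPU memory
    if not ok:
        break                                 # stream froze, stuttered, or went offline

    # Copy to GPU, run the network, copy results back: the per-frame
    # host/device round trip that limited version 1 to roughly 50-60 fps.
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    # ... parse detections, run the homemade IoU multi-object tracker,
    # ... and emit movement events here.

cap.release()
```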

While the implementation was fairly simple, it was not enough to scale. The poor performance was caused by three factors: 

  • Software video decoding
  • Copying decoded video frames between processes and between CPU/GPU memory
  • Software encoding the output while drawing detections on it

They worked to improve the first version with hardware video decoding and encoding. At the time, that didn’t increase overall speed by much since they still copied decoded frames from GPU to CPU memory and then back to GPU memory.

Version 2: Custom ffmpeg encoder

A real breakthrough in terms of speed came with a custom ffmpeg encoder, which was basically a wrapper around Darknet turning video frames into detected objects. Frame rates increased tenfold since they were now decoding on hardware without copying video frames between host and device memory. 

But that increase in frame rate meant that part of their application was now written in C and came with the added complexity of ffmpeg and its intricate build system. Still, their new component didn’t need much changing and proved to be quite reliable.

One downside to this system was that they were now constrained to using Darknet.

Version 2.1: DeepSORT

To improve object tracking accuracy, Fyma replaced a homemade IoU-based tracker with DeepSORT. The results were good, but they needed to change their custom encoder to output visual features of objects in addition to bounding boxes, which DeepSORT requires for tracking.

Bringing in DeepSORT improved accuracy, but created another problem: depending on the video content it sometimes used a lot of CPU memory. To mitigate this problem, the team resorted to “asynchronous tracking.” Essentially a worker-based approach, it involved each worker consuming metadata consisting of bounding boxes, and producing events about object movement. While this resolved the problem of uneven CPU usage, once again it made the overall architecture more complex.
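That worker-based approach can be pictured roughly as in the sketch below; the queue payloads, worker count, and event logic are illustrative assumptions rather than Fyma's implementation.

```python
# Rough sketch of "asynchronous tracking": detection pipelines push per-frame
# metadata (bounding boxes) onto a queue, and separate worker processes consume
# it to produce movement events, spreading CPU load across workers.
from multiprocessing import Process, Queue

def tracking_worker(metadata_queue: Queue, event_queue: Queue) -> None:
    """Consume bounding-box metadata and emit movement events."""
    while True:
        item = metadata_queue.get()
        if item is None:                      # sentinel: shut this worker down
            break
        camera_id, frame_ts, boxes = item
        # ... associate boxes with existing tracks (IoU / appearance features),
        # ... and emit an event when a track crosses a configured line or zone.
        event_queue.put((camera_id, frame_ts, "object_movement", len(boxes)))

if __name__ == "__main__":
    metadata_q, event_q = Queue(), Queue()
    workers = [Process(target=tracking_worker, args=(metadata_q, event_q))
               for _ in range(4)]             # worker count sized to CPU budget
    for w in workers:
        w.start()

    # The video pipelines would call metadata_q.put(...) once per frame.
    metadata_q.put(("camera-1", 0.0, [(10, 20, 50, 80)]))
    for _ in workers:
        metadata_q.put(None)                  # stop all workers
    for w in workers:
        w.join()
```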

Version 3: Triton Inference Server

While previous versions performed well, Fyma found that they still couldn’t run enough cameras on each GPU. Each video stream on their platform had an individual copy of whatever model it was using. If they could reduce the memory footprint of a single camera, it would be possible to squeeze a lot more out of their GPU instances.

Fyma decided to rewrite the ffmpeg-related parts of their application. More specifically, the application now interfaces with ffmpeg libraries (libav) directly through custom Python bindings. 

This allowed Fyma to connect their application to NVIDIA Triton Inference Server, which enabled sharing neural networks between camera streams. To keep the core of their object detection code the same, they moved their custom ffmpeg encoder code to a custom Triton backend.
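As a simplified illustration of sharing one model across many streams, the snippet below uses the tritonclient HTTP API with placeholder model, tensor, and server names; Fyma's actual integration went through their libav bindings and a custom Triton backend rather than this exact client code.

```python
# Hedged sketch: several camera pipelines calling one detector model that is
# loaded only once inside Triton Inference Server. Model name, tensor names,
# shapes, and the server URL are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def detect(frame_chw: np.ndarray) -> np.ndarray:
    """Send one preprocessed frame (CHW float32) to the shared detector."""
    inp = httpclient.InferInput("images", [1, *frame_chw.shape], "FP32")
    inp.set_data_from_numpy(frame_chw[np.newaxis].astype(np.float32))
    out = httpclient.InferRequestedOutput("detections")
    result = client.infer(model_name="detector", inputs=[inp], outputs=[out])
    return result.as_numpy("detections")

# Every camera stream calls detect(); the model weights live once in Triton
# instead of once per stream, which is what reduced the memory footprint.
boxes = detect(np.zeros((3, 416, 416), dtype=np.float32))
```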

While this solved the memory issues, it increased the complexity of Fyma’s application by at least three times.

Version 4: DeepStream

The latest version of Fyma’s application is a complete rewrite based on GStreamer and NVIDIA DeepStream. 

“A pipeline-based approach with accelerated DeepStream components is what really kicked us into gear,” Kivistik said. “Also, the joy of throwing all the previous C-based stuff into the recycle bin while not compromising on performance, it’s really incredible. We took everything that DeepStream offers: decoding, encoding, inference, tracking and analytics. We were back to synchronous tracking with a steady CPU/GPU usage thanks to nvtracker.” 

This meant events were now arriving in their database in almost real time. Previously, this data would be delayed up to a few hours, depending on how many workers were present and the general “visual” load (how many objects the whole platform was seeing).

Fyma’s current implementation runs a master process for each GPU instance. This master process in turn runs a GStreamer pipeline for each video stream added to the platform. Memory overhead for each camera is low since everything runs in a single process.
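A per-stream pipeline of that shape can be sketched with GStreamer's Python bindings roughly as follows; the element chain, properties, and config file paths are illustrative placeholders, not Fyma's production pipeline.

```python
# Illustrative DeepStream pipeline for one camera stream, built from standard
# DeepStream GStreamer elements: decode -> batch -> inference -> tracking ->
# analytics. URIs, config files, and the tracker library path are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

PIPELINE = (
    "uridecodebin uri=rtsp://example.local/camera1 ! "
    "m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! "
    "nvinfer config-file-path=detector_config.txt ! "       # object detection
    "nvtracker ll-lib-file=libnvds_nvmultiobjecttracker.so ! "
    "nvdsanalytics config-file=analytics_config.txt ! "     # line crossing / zones
    "fakesink sync=false"
)

pipeline = Gst.parse_launch(PIPELINE)
pipeline.set_state(Gst.State.PLAYING)

# In a real application, a pad probe (via the pyds bindings) would read the
# nvdsanalytics metadata and write movement events to the database; here we
# simply run the main loop until interrupted.
loop = GLib.MainLoop()
try:
    loop.run()
finally:
    pipeline.set_state(Gst.State.NULL)
```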

Regarding end-to-end performance (decoding, inference, tracking, and analytics), Fyma is achieving frame rates up to 10x faster (around 500 fps for a single video stream), with accuracy improved up to 2-3x compared to their very first implementation. And Fyma was able to implement DeepStream in less than two months.

“I think we can finally say that we now have simplicity with a codebase that is not that large, and extensibility since we can easily switch out models and change the video pipeline and performance,” Kivistik said. 

“Using DeepStream really is a no-brainer for every software developer or data scientist who wants to create production-grade computer vision applications.” 

Summary

Using NVIDIA DeepStream, Fyma was able to unlock the power of its AI models and increase the performance of its vision AI applications while speeding up development time. If you would like to do the same and supercharge your development, visit the DeepStream SDK product page and DeepStream Getting Started.


AI Flying Off the Shelves: Restocking Robot Rolls Out to Hundreds of Japanese Convenience Stores

Tokyo-based startup Telexistence this week announced it will deploy NVIDIA AI-powered robots to restock shelves at hundreds of FamilyMart convenience stores in Japan. There are 56,000 convenience stores in Japan — the third-highest density worldwide. Around 16,000 of them are run by FamilyMart. Telexistence aims to save time for these stores by offloading repetitive tasks.
