NVIDIA Releases Updates to CUDA-X AI Software

NVIDIA CUDA-X AI is a deep learning software stack that researchers and software developers use to build high-performance GPU-accelerated applications for conversational AI, recommendation systems and computer vision.

Learn what’s new in the latest releases of the CUDA-X AI tools and libraries. For more information on NVIDIA’s developer tools, see the webinars, training, and Connect with the Experts sessions now available on GTC On-Demand.

Refer to each package’s release notes in the documentation for additional information.

NVIDIA Jarvis Open Beta 

At GTC, NVIDIA announced major new capabilities for Jarvis, its fully accelerated conversational AI framework. Jarvis now includes highly accurate automated speech recognition, real-time machine translation for multiple languages, and text-to-speech capabilities for creating expressive conversational AI agents.

Highlights include:

  • Speech recognition model trained on thousands of hours of audio, with greater than 90% accuracy
  • Real-time machine translation for five languages that runs in under 100 ms per sentence
  • Expressive TTS that delivers 30x higher throughput with FastPitch+HiFiGAN vs Tacotron2+WaveGlow

NVIDIA also announced BotMaker Early Access, which enables enterprises to easily integrate skills and deploy them as a bot on embedded and datacenter platforms, both offline and online.

Triton Inference Server 2.9

At GTC, NVIDIA announced Triton Inference Server 2.9. Triton is open-source inference serving software that maximizes performance and simplifies production deployment at scale. Release updates include:

  • Model Navigator (alpha), a new tool in Triton that automatically converts TensorFlow and PyTorch models to a TensorRT plan, validates accuracy, and sets up the deployment environment
  • Model Analyzer will now automatically determine optimal batch size and model instances to maximize performance, based on latency or throughput requirements
  • Support for OpenVINO backend (beta) for high performance inferencing on CPU, Windows Triton build (alpha), and integration with MLOps platforms: Seldon and Allegro
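
The selection step that Model Analyzer automates can be pictured as a small search: profile several batch size and instance count combinations, then keep the fastest one that still meets the latency requirement. The sketch below only illustrates that idea. It is not Model Analyzer's implementation or API; the `Measurement` type, the `pick_config` function and every number in `sweep` are hypothetical placeholders, not benchmark results.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Measurement:
    """One profiled configuration (hypothetical structure for illustration)."""
    batch_size: int
    instances: int
    latency_ms: float   # latency observed for this configuration
    throughput: float   # inferences per second


def pick_config(measurements: Iterable[Measurement],
                latency_budget_ms: float) -> Optional[Measurement]:
    """Return the highest-throughput configuration within the latency budget."""
    eligible = [m for m in measurements if m.latency_ms <= latency_budget_ms]
    if not eligible:
        return None  # no configuration satisfies the requirement
    return max(eligible, key=lambda m: m.throughput)


# Placeholder numbers for illustration only, not measured results.
sweep = [
    Measurement(batch_size=1, instances=1, latency_ms=4.0, throughput=250.0),
    Measurement(batch_size=8, instances=1, latency_ms=11.0, throughput=720.0),
    Measurement(batch_size=8, instances=2, latency_ms=14.0, throughput=1100.0),
    Measurement(batch_size=32, instances=2, latency_ms=38.0, throughput=1650.0),
]

best = pick_config(sweep, latency_budget_ms=15.0)
print(best)
```

With a looser budget the larger batch size would win, which is why the requirement (latency or throughput) has to be stated before the search runs.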

TensorRT 8.0

At GTC, NVIDIA announced TensorRT 8.0, the latest version of the high-performance deep learning inference SDK. This version includes:

  • Quantization-aware training (QAT) to achieve FP32-level accuracy with INT8 precision
  • Sparsity support on Ampere GPUs, delivering up to 50% higher throughput
  • Up to 2x faster inference for transformer-based networks like BERT with new compiler optimizations
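
Quantization-aware training generally works by simulating INT8 rounding during training, so the network learns weights that tolerate it. A minimal sketch of that "fake quantization" step follows, assuming a generic symmetric per-tensor scheme. The `fake_quantize` function is a hypothetical helper written for this illustration and is not part of the TensorRT API.

```python
def fake_quantize(values, num_bits=8):
    """Quantize floats to signed integers and map them back to floats.

    Assumes symmetric per-tensor scaling: the largest magnitude in
    `values` is mapped to the largest representable integer.
    """
    qmax = 2 ** (num_bits - 1) - 1                    # 127 for INT8
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return list(values), 1.0                      # nothing to scale
    scale = max_abs / qmax
    # Round to the nearest integer step and clamp to the INT8 range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    # Dequantize so later layers see the rounding error during training.
    return [q * scale for q in quantized], scale


weights = [0.5, -1.27, 0.013, 0.9]
dequantized, scale = fake_quantize(weights)
errors = [abs(w - d) for w, d in zip(weights, dequantized)]
print(dequantized, scale, max(errors))
```

Each value moves by at most half a quantization step, and training with this rounding in the forward pass is what lets the INT8 model stay close to FP32 accuracy.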

TensorRT 8.0 will be freely available to members of the NVIDIA Developer Program in Q2 2021.


NVIDIA NeMo

NVIDIA NeMo is an open-source toolkit for developing state-of-the-art conversational AI models.

Highlights include:

  • ASR collection: Added new state-of-the-art model architectures, CitriNet and Conformer-CTC. Also used the Mozilla Common Voice dataset and AIShell-2 corpus to add speech recognition support for multiple languages, including Mandarin, Spanish, German, French, Italian, Russian, Polish, and Catalan.
  • NLP collection: Added ten neural machine translation language models supporting bidirectional translation between English and Spanish, Russian, Mandarin, German and French.
  • TTS collection: Added support for HiFiGAN, MelGAN, GlowTTS, UniGlow, and SqueezeWave model architectures and pre-trained models.

This release includes 60 additional highly accurate models. Learn more from the NeMo collections in NGC.


NVIDIA Maxine

Maxine provides an accelerated SDK with state-of-the-art AI features for building virtual collaboration and content creation applications. At GTC, NVIDIA announced AI Face Codec, a novel AI-based method from NVIDIA Research that compresses video-conferencing streams by rendering human faces, delivering up to a 10x reduction in bandwidth versus H.264.

Maxine is available now to members of the NVIDIA Developer Program. Get Started with NVIDIA Maxine.

NGC Updates (Includes Framework Updates)

The NGC catalog is a hub of GPU-optimized containers, pre-trained models, SDKs and Helm charts designed to accelerate end-to-end AI workflows. Updates include:

  • Deep Learning Frameworks
    • 21.04 containers for TensorFlow, PyTorch (v.24) and Apache MXNet (v.1.8)
    • Includes support for CUDA 11.3, cuDNN 8.2, DALI 1.0 and Ubuntu 20.04
  • Brand-new UI that lets users navigate, find and download content faster than before, with improved search and filtering, tagged content, and direct links to all documentation on the home page.
  • Transfer Learning Toolkit (TLT) 3.0 provides a unified command-line tool to launch commands, supports multiple Docker setups, and integrates with the DeepStream and Jarvis application frameworks.
  • Magnum IO container unifies key NVIDIA technologies such as NCCL, NVSHMEM, UCX, and GDS in a single package, so developers can build applications and run them in a data center equipped with GPUs, storage and high-performance switching fabric.
  • New and Updated Partner Software
    • MATLAB: The latest release highlights simplified workflows for the development of deep learning, autonomous systems, and automotive solutions.
    • Brightics AI Accelerator: Samsung SDS’ simple, fast, and automated machine learning platform.
    • Determined AI Helm Chart: An open source deep learning training platform.
    • Plexus Satellite Container: Provides a rich set of tools for setting up and managing an isolated networked Kubernetes cluster on the Core Scientific Plexus software stack.

cuDNN 8.2 GA

The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for accelerating training and inference applications. This version includes:

  • BFloat16 support for CNNs on NVIDIA Ampere architecture GPUs
  • Faster CNNs through fusion of convolution operators, point-wise operations and runtime reductions
  • Faster out-of-box performance with new dynamic kernel selection infrastructure
  • Up to 2X higher RNN performance with new optimizations and heuristics
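
For readers unfamiliar with the BFloat16 format mentioned above: it keeps the sign bit and 8-bit exponent of a 32-bit float and shortens the mantissa to 7 bits, so it covers the same numeric range at lower precision. The sketch below illustrates the bit layout by keeping only the upper 16 bits of a float32. It assumes simple truncation for clarity (hardware typically rounds instead), and the helper names are hypothetical, not cuDNN functions.

```python
import struct


def to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x, using truncation."""
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign (1 bit), exponent (8 bits) and the top 7 mantissa bits.
    return bits32 >> 16


def from_bfloat16_bits(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    # The dropped low 16 mantissa bits are filled with zeros.
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def round_trip(x: float) -> float:
    """Show the value x becomes after being stored as bfloat16."""
    return from_bfloat16_bits(to_bfloat16_bits(x))


print(hex(to_bfloat16_bits(1.0)))   # bit pattern for 1.0
print(round_trip(3.14159))          # precision is reduced
print(round_trip(1e38))             # large magnitudes remain representable
```

The wide exponent is the reason BFloat16 is attractive for training: very large and very small values stay representable, with each value carrying fewer significant digits.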


DALI 1.0

The NVIDIA Data Loading Library (DALI) is an open-source GPU-accelerated library for fast pre-processing of images, videos and audio to accelerate deep learning workflows. This version includes:

  • New Functional API for simpler pipeline creation and ease-of-use
  • Easy integration with Triton Inference Server with DALI Backend
  • New GPU-accelerated operators for image, video and audio processing
