Categories
Misc

World’s Leading Electronics Manufacturers Adopt NVIDIA Generative AI and Omniverse to Digitalize State-of-the-Art Factories

NVIDIA today announced that electronics manufacturers worldwide are advancing their industrial digitalization efforts using a new, comprehensive reference workflow that combines NVIDIA technologies for generative AI, 3D collaboration, simulation and autonomous machines.

Live From Taipei: NVIDIA CEO Unveils Gen AI Platforms for Every Industry

In his first live keynote since the pandemic, NVIDIA founder and CEO Jensen Huang today kicked off the COMPUTEX conference in Taipei, announcing platforms that companies can use to ride a historic wave of generative AI that’s transforming industries from advertising to manufacturing to telecom. He spoke to a packed house of some 3,500 attendees.

MediaTek Partners With NVIDIA to Transform Automobiles With AI and Accelerated Computing

MediaTek, a leading innovator in connectivity and multimedia, is teaming with NVIDIA to bring drivers and passengers new experiences inside the car. The partnership was announced today at a COMPUTEX press conference with MediaTek CEO Rick Tsai and NVIDIA founder and CEO Jensen Huang. “NVIDIA is a world-renowned pioneer and industry leader in AI…”

NVIDIA RTX Transforming 14-Inch Laptops, Plus Simultaneous Screen Encoding and May Studio Driver Available Today

New 14-inch NVIDIA Studio laptops, equipped with GeForce RTX 40 Series Laptop GPUs, give creators peak portability with a significant increase in performance over the last generation.

Transferring Industrial Robot Assembly Tasks from Simulation to Reality

Simulation is an essential tool for robots learning new skills. These skills include perception (understanding the world from camera images), planning (formulating a sequence of actions to solve a problem), and control (generating motor commands to change a robot’s position and orientation). 

Robotic assembly is ubiquitous in the automotive, aerospace, electronics, and medical device industries. Setting up robots to perform an assembly task is a time-consuming and expensive process, requiring a team to engineer the robot’s trajectories and constrain its surroundings carefully. 

In other areas of robotics, simulation has become an indispensable tool, especially for the development of AI. However, robotic assembly involves high-precision contact between geometrically complex, tight-tolerance parts. Simulating these contact-rich interactions has long been viewed as computationally intractable.

With recent developments from NVIDIA advancing robotic assembly, faster-than-real-time simulation is now possible. These high-speed simulations enable the use of powerful, state-of-the-art techniques in reinforcement learning (RL). With RL, virtual robots explore simulated environments, gain years of experience, and learn useful skills through intelligent trial and error. Using RL for robotic assembly minimizes the need for human expertise, increases robustness to variation, and reduces hardware wear and tear. The transfer of skills from simulation to the real world is known as sim-to-real.

One of the biggest challenges in using RL for robotic assembly is that skills learned by robots in simulation do not typically transfer well to real-world robots. Subtle discrepancies in physics, motor signals, and sensor signals between the simulator and the real world cause this issue. Moreover, a real-world robot might encounter scenarios never seen in the simulator. These issues are collectively known as the reality gap.

What is IndustReal?

To use RL for challenging assembly tasks and address the reality gap, we developed IndustReal. IndustReal is a set of algorithms, systems, and tools for robots to solve assembly tasks in simulation and transfer these capabilities to the real world.  

IndustReal’s primary contributions include:

  • A set of algorithms for simulated robots to solve complex assembly tasks with RL.
  • A method that addresses the reality gap and stabilizes the learned skills when deployed in the real world.
  • A real-world robotic system that performs end-to-end sim-to-real transfer of simulation-trained assembly skills.
  • A hardware and software toolkit for researchers and engineers to reproduce the system.
    • IndustRealKit is a set of 3D-printable CAD models of assets inspired by NIST Task Board 1, the established benchmark for robotic assembly.
    • IndustRealLib is a lightweight Python library that deploys skills learned in the NVIDIA Isaac Gym simulator onto a real-world Franka Emika Panda robot arm.
Robot inserts pegs and assembles gears in simulation and the real world.
Figure 1. Robot executes simulation-based policies for inserting pegs and assembling gears (top row). Real-world deployments of these policies (bottom row)

Training algorithms and deployment method

In this work, we propose three algorithms to help learn assembly skills using RL in simulation. We also propose a deployment method for executing the skills on a real-world robot. 

Simulation-aware policy update 

Robotics simulators like NVIDIA Isaac Gym and NVIDIA Isaac Sim simulate real-world physics while satisfying many physical constraints, most importantly that objects cannot overlap with one another, or interpenetrate. In most simulators, small interpenetrations between objects are unavoidable, especially when executing in real time.

We introduce the simulation-aware policy update (SAPU) that provides the simulated robot with knowledge of when simulation predictions are reliable or unreliable. Specifically, in SAPU, we implement a GPU-based module in NVIDIA Warp that checks for interpenetrations as the robot is learning how to assemble parts using RL. 

We weight the robot’s simulated experience more when interpenetrations are small, and less when interpenetrations are large. This strategy prevents a simulated robot from exploiting inaccurate physics to solve tasks, which would cause it to learn skills that are unlikely to transfer to the real world.
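As an illustration of this weighting idea, here is a minimal NumPy sketch that down-weights experience by interpenetration depth. The tolerance, the exponential falloff, and all function names are assumptions for illustration, not the exact IndustReal implementation:

```python
import numpy as np

def sapu_weights(penetration_depths, tolerance=0.001, scale=100.0):
    """Down-weight simulated experience with large interpenetrations.

    penetration_depths: maximum interpenetration (meters) observed in
    each environment. Experience within tolerance gets full weight;
    deeper interpenetrations decay exponentially toward zero.
    """
    depths = np.asarray(penetration_depths, dtype=np.float64)
    excess = np.maximum(depths - tolerance, 0.0)
    return np.exp(-scale * excess)

def weighted_policy_loss(per_env_losses, penetration_depths):
    """Weight each environment's RL loss by simulation reliability."""
    w = sapu_weights(penetration_depths)
    return float(np.sum(w * per_env_losses) / np.sum(w))
```

The effect is that environments exploiting inaccurate contact physics contribute almost nothing to the policy update.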

Signed distance field reward

To solve tasks with RL, a reward signal must be defined: a measure of how much progress the robot has made toward solving the task. However, it is challenging to define a reward signal for the alignment of geometrically complex parts during an assembly process.

We introduce a signed distance field (SDF) reward to measure how closely simulated parts are aligned during the assembly process. An SDF is a mathematical function that can take points on one object and compute the shortest distances to the surface of another object. It provides a natural and general way to describe alignment between parts, even when they are highly symmetric or asymmetric. 

In the SDF reward, we define our reward signal as the SDF distance between the current position and the target position of a part during the assembly process.
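A minimal sketch of the idea follows, using an analytic sphere SDF in place of the mesh SDFs used for real parts. All shapes and names here are illustrative, not the paper's implementation:

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance from query points to a sphere's surface
    (negative inside). A stand-in for mesh-based SDFs of real parts."""
    return np.linalg.norm(points - center, axis=-1) - radius

def sdf_reward(part_points_current, sdf_at_target_pose):
    """Reward grows as surface points sampled on the moving part
    approach the zero level set of the part at its target pose.

    part_points_current: (N, 3) points sampled on the part's surface.
    sdf_at_target_pose: callable mapping (N, 3) points to distances.
    """
    d = np.abs(sdf_at_target_pose(part_points_current))
    return -float(d.mean())  # 0 when perfectly aligned, negative otherwise
```

Because the reward is computed from surface points rather than a single reference frame, it behaves sensibly for both symmetric and asymmetric parts.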

Figure 2. Visualization of 2D slices of an SDF for a round peg. The color represents the shortest distance from the given point to the surface of the peg

Sampling-based curriculum 

Curriculum learning is an established approach in RL for solving problems that involve many individual steps or motions; as the robot learns, the difficulty of the task is gradually increased.

In our assembly tasks, the robot begins by solving simple assembly problems (that is, where the parts are partially assembled), before progressing to harder problems (that is, where the parts are disassembled). 

As the initial engagement between parts is gradually reduced, there comes a point where the parts no longer begin in contact. This sudden increase in difficulty can lead to a performance collapse as the robot’s knowledge has become too specialized towards the partially assembled configurations.

We introduce a sampling-based curriculum (SBC) for a simulated robot to learn a complex assembly task gradually. We ask the robot to solve assembly problems sampled across the entire difficulty range during all stages of the curriculum. However, we gradually remove the easiest problems from the problem distribution. At the final stage of the curriculum, the parts never begin in contact. See the following visualization.
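A sketch of the sampling logic, using the plug's initial height above the socket as the difficulty axis (negative heights mean partially assembled; all numbers are illustrative assumptions):

```python
import numpy as np

def sample_initial_heights(rng, n, stage, n_stages,
                           min_height=-0.01, max_height=0.05):
    """Sampling-based curriculum (illustrative parameters).

    Initial plug heights are drawn across the full range at every
    stage, but the lower bound rises with the stage, gradually
    removing the easiest (partially assembled, height < 0) problems.
    At the final stage the lower bound reaches 0, so the parts never
    begin engaged.
    """
    frac = stage / (n_stages - 1)
    lower = min_height + frac * (0.0 - min_height)
    return rng.uniform(lower, max_height, size=n)
```

Because hard problems are present from the first stage, the policy never over-specializes to partially assembled configurations, avoiding the performance collapse described above.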

Points distributed inside and outside a receptacle.
Figure 3. Different stages of a sampling-based curriculum. From left to right, the difficulty of the task increases as the distribution of the initial positions of the plug (yellow spheres) shifts away from the receptacle (beige)

Policy-level action integrator 

In the most common applications of RL to robotics, the robot generates actions that are incremental adjustments to its pose (that is, its position and orientation). These increments are applied to the robot’s current pose to produce an instantaneous target pose. With real-world robots, this strategy can lead to discrepancies between the robot’s final pose and its final target pose due to the complexities of the physical robot. 

We also propose a policy-level action integrator (PLAI), a simple algorithm that reduces steady-state (that is, long-term) errors when deploying a learned skill on a real-world robot. We apply the incremental adjustments to the previous instantaneous target pose to produce the new instantaneous target pose. 

Mathematically (akin to the integral term of a classical PID controller), this strategy generates an instantaneous target pose that is the sum of the initial pose and the actions generated by the robot over time. This technique can minimize errors between the robot’s final pose and its final target pose, even in the presence of physical complexities.
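The difference between the nominal strategy and PLAI can be seen in a toy 1D reach task with a proportional controller and a constant disturbance standing in for imperfect gravity compensation. All parameters here are illustrative, not the paper's experimental setup:

```python
import numpy as np

def run(controller, goal=1.0, gain=0.5, disturbance=0.02,
        max_step=0.05, n_steps=200):
    """Toy 1D reach task: the robot tracks an instantaneous target
    with a proportional controller, while a constant disturbance
    pulls it away each step. Returns the final position error."""
    pos, target = 0.0, 0.0
    for _ in range(n_steps):
        action = np.clip(goal - pos, -max_step, max_step)  # policy increment
        if controller == "nominal":
            target = pos + action        # increment applied to current pose
        else:                            # "plai"
            target = target + action     # increment applied to previous target
        pos += gain * (target - pos) - disturbance
    return abs(goal - pos)
```

With the nominal strategy, the disturbance leaves a persistent steady-state error; with PLAI, the accumulated target acts like an integral term and drives the error toward zero.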

We compare the performance of a standard (nominal) strategy, our PLAI algorithm, and a classical PID controller on a reach task, where the robot is trying to move to a target position. See the following visualization.

Animations of a robot arm moving toward various target positions.
Figure 4. Comparison of a robot using nominal, PLAI algorithm, and PID strategies for moving its fingertips to the pink sphere target. (Top row: comparison in the presence of imperfect gravity compensation. Bottom row: comparison in the presence of unmodeled friction at the joints)

Systems and tools

The setup used for the real-world experiments conducted in IndustReal includes a Franka Emika Panda robot arm with an Intel RealSense D435 camera mounted on its hand and an assembly platform with parts.

A robot arm overlooking a set of mechanical parts.
Figure 5. Physical robot setup: A Franka Emika Panda robot arm with an Intel RealSense D435 camera on its hand, overlooking an assembly platform with parts

IndustReal provides hardware (IndustRealKit) and software (IndustRealLib) for reproducing the system presented in the paper.

IndustRealKit contains CAD models for all 20 3D-printable parts used in this work: six peg holders, six peg sockets, three gears, one gear base (with three gear shafts), and four holders for NEMA connectors and receptacles, the standard plugs and power outlets used in the United States.

The purchasing list includes 17 parts: six metal pegs (from the NIST benchmark), four NEMA connectors and receptacles, one optical platform, and fasteners.

An image of the IndustRealKit.
Figure 6. The IndustRealKit

IndustRealLib is a lightweight library containing code for deploying skills learned in simulation through RL onto a real-world robot arm. Specifically, we provide scripts for users to quickly deploy control policies (that is, neural networks that map sensor signals to robot actions) trained in the NVIDIA Isaac Gym simulator onto a Franka Emika Panda robot.

Future direction

IndustReal shows a path toward leveraging the full potential of simulation in robotic assembly tasks. As simulation becomes more accurate and efficient, and additional sim-to-real transfer techniques are developed, we foresee numerous possibilities of expanding this work to other tasks in manufacturing (such as screw fastening, cable routing, and welding). It is reasonable to believe that one day every advanced industrial manufacturing robot will be trained in simulation using such techniques, for seamless and scalable transfer to the real world.

Our next steps are to expand the system to include more objects, assembly tasks, and complex environments. We also aim to develop additional sim-to-real techniques for the smooth transfer of learned skills at lower cost, higher reliability, and guaranteed safety.

Get started with IndustReal

Paper authors Bingjie Tang, Michael A. Lin, Iretiayo Akinola, Ankur Handa, Gaurav S. Sukhatme, Fabio Ramos, Dieter Fox, and Yashraj Narang will present their research “IndustReal: Transferring Industrial Assembly Tasks from Simulation to Reality” at the Robotics: Science and Systems (RSS) conference in July 2023. 

Getting Started with NVIDIA NVUE API

Learn how NVIDIA NVUE API automates data center network operations with sample code for Curl commands, Python Code, and NVUE CLI.
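As a minimal sketch of what such automation looks like in Python, the following assumes the NVUE REST convention of serving the API on port 8765 under the /nvue_v1 prefix with HTTP basic authentication; the switch hostname and credentials are hypothetical, and you should verify the endpoint details against your Cumulus Linux release:

```python
import base64
import json
import ssl
import urllib.request

NVUE_BASE = "https://{switch}:8765/nvue_v1"  # assumed default NVUE REST endpoint

def nvue_url(switch, path):
    """Build an NVUE REST URL for an object path (e.g. /interface)."""
    return NVUE_BASE.format(switch=switch) + path

def nvue_get(switch, path, user, password):
    """GET an NVUE object tree, such as the state of all interfaces."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        nvue_url(switch, path),
        headers={"Authorization": f"Basic {token}"},
    )
    # Lab switches commonly use self-signed certificates.
    ctx = ssl._create_unverified_context()
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)

# Example (hypothetical hostname and credentials):
# interfaces = nvue_get("leaf01", "/interface", "cumulus", "password")
```

The same endpoints back the NVUE CLI, so configuration applied via the API stays consistent with `nv` commands on the switch.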

NVIDIA AX800 Delivers High-Performance 5G vRAN and AI Services on One Common Cloud Infrastructure

The pace of 5G investment and adoption is accelerating. According to the GSMA Mobile Economy 2023 report, nearly $1.4 trillion will be spent on 5G CAPEX between 2023 and 2030. The radio access network (RAN) may account for over 60% of that spend.

Increasingly, CAPEX spending is moving from the traditional approach with proprietary hardware to virtualized RAN (vRAN) and Open RAN architectures that can benefit from cloud economics and do not require dedicated hardware. Despite these advantages, Open RAN adoption is struggling: existing technology has yet to deliver cloud economics, and it cannot provide high performance and flexibility at the same time.

NVIDIA has overcome these challenges with the NVIDIA AX800 converged accelerator, delivering a truly cloud-native and high-performance accelerated 5G solution on commodity hardware that can run on any cloud (Figure 1).

To benefit from cloud economics, the future of the RAN is in the cloud (RAN-in-the-Cloud). The road to cloud economics aligns with Clayton Christensen’s characterization of disruptive innovation in traditional industries as presented in his book, The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail. That is, with progressive incremental improvements, new and seemingly inferior products are able to eventually capture market share. 

Existing Open RAN solutions cannot support non-5G workloads, deliver inferior 5G performance, and mostly still rely on single-use hardware accelerators. This limits their appeal to telecom executives, as the comparative performance of traditional solutions delivers a tried-and-tested deployment plan for 5G.

However, the NVIDIA Accelerated 5G RAN solution based on NVIDIA AX800 has overcome these limitations and is now delivering comparable performance to traditional 5G solutions. This paves the way to deploy 5G Open RAN on commercial-off-the-shelf (COTS) hardware at any public cloud or telco edge.

Graphic explaining Open RAN ecosystem: Proprietary Systems, Open RAN 2012 to 2022, NVIDIA Accelerated RAN 2023
Figure 1. NVIDIA AX800 converged accelerator and NVIDIA Aerial SDK deliver a compelling advantage for the Open RAN ecosystem

Solutions to support cloud-native RAN 

To drive broad adoption of cloud-native RAN, the industry needs solutions that are cloud native, deliver high RAN performance, and are built with AI capability.

Cloud native

This approach delivers better utilization, multi-use and multi-tenancy, lower TCO, and increased automation—with all the virtues of cloud computing and benefiting from cloud economics. 

A cloud-native network that benefits from cloud economics requires a complete rethink in approach to deliver a network that is 100% software-defined, deployed on general-purpose hardware and can support multi-tenancy. As such, it is not about building bespoke and dedicated systems in the public or telco cloud managed by Cloud Service Providers (CSPs). 

High RAN performance

High RAN performance is required to deliver new technology—such as massive MIMO with its improved spectral efficiency, cell density, and higher throughput—all with improved energy efficiency. Achieving performance on commodity hardware comparable to that of dedicated systems is proving a formidable challenge, due to the death of Moore’s Law and the relatively low performance of software running on CPUs.

As a result, RAN vendors are building fixed-function accelerators to improve the CPU performance. This approach leads to inflexible solutions and does not meet the flexibility and openness expectations for Open RAN. In addition, with fixed-function or single-use accelerators, the benefits of cloud economics cannot be achieved.

For example, software-defined 5G networks that are based on Open RAN specifications and COTS hardware are achieving typical peak throughput of ~10 Gbps compared to >30 Gbps peak throughput performance on 5G networks that are built in the traditional, vertically integrated, appliance approach using bespoke software and hardware. 

According to a recent survey of 52 telco executives reported in Telecom Networks: Tracking the Coming xRAN Revolution, “In terms of obstacles to xRAN deployment, 62% of operators voice concerns regarding xRAN performance today relative to traditional RAN.”

AI capability 

Solutions must evolve from the current application, based on proprietary implementation in the telecom network, toward an AI-capable infrastructure for hosting internal and external applications. AI plays a role in 5G (AI-for-5G) to automate and improve system performance. Likewise, AI plays a role, together with 5G (AI-on-5G), to enable new features in 5G and beyond. 

Achieving these goals requires a new architectural approach for cloud-native RAN, especially with a general-purpose COTS-based accelerated computing platform. This is the NVIDIA focus, as summarized in Figure 2. 

The emphasis is on delivering a general-purpose COTS server built with NVIDIA converged accelerators (such as the NVIDIA AX800) that can support high-performance 5G and AI workloads on the same platform. This will deliver cloud economics with better utilization and reduced TCO, and a platform that can efficiently run AI workloads to future proof the RAN for the 6G era.

Figure shows three focus areas where NVIDIA is delivering technological innovations to transform the RAN: efficiency, utilization, and performance.
Figure 2. NVIDIA technological innovations transforming the RAN

Run 5G and AI workloads on the same accelerator with NVIDIA AX800

The NVIDIA AX800 converged accelerator is a game changer for CSPs and telcos because it brings cloud economics into the operations and management of telecom networks. The AX800 supports multi-use and multi-tenancy of both 5G and AI workloads on commodity hardware that can run on any cloud by dynamically scaling the workloads. In doing so, it enables CSPs and telcos to use the same infrastructure for both 5G and AI with high utilization levels.

Dynamic scaling for multi-tenancy

The NVIDIA AX800 achieves dynamic scaling, both at the data center and at the server and card levels, enabling support of 5G and AI workloads. This scalable, flexible, energy-efficient, and cost-effective approach can deliver a variety of applications and services.

At the data center and server levels, the NVIDIA AX800 supports dynamic scaling. The Open RAN service and management orchestration (SMO) is able to allocate and reallocate computational resources in real time to support either 5G or AI workloads.

At the card level, NVIDIA AX800 supports dynamic scaling using NVIDIA Multi-Instance GPU (MIG), as shown in Figure 3. MIG enables concurrent processing of virtualized 5G base stations and edge AI applications on pooled GPU hardware resources. This enables each function to run on the same server and accelerator in a coherent and energy-conscious manner.

This novel approach provides increased radio capacity and processing power, contributing to better performance and enabling peak data throughput processing with room for future antenna technology advancements.

Diagram showing how the NVIDIA Multi-Instance GPU mechanism redistributes AI and 5G workloads on a GPU
Figure 3. NVIDIA Multi-Instance GPU redistribution based on workload and resource requirements

Commercial implications of dynamic scaling for multi-tenancy

The rationale for pooling 5G RAN in the cloud (RAN-in-the-Cloud) is straightforward. The RAN constitutes the largest CAPEX and OPEX spending for telcos (>60%). Yet the RAN is also the most underutilized resource, with most radio base stations typically operating below 50% utilization.

Moving RAN compute into the cloud brings all the benefits of cloud computing: pooling and higher utilization in a shared cloud infrastructure, resulting in the largest CAPEX and OPEX reduction for telcos. It also supports cloud-native scale-in/scale-out and dynamic resource management.

Dynamic scaling for multi-tenancy is commercially significant in three ways. First, it enables deployment of 5G and AI on general-purpose computing hardware, paving the way to run the 5G RAN on any cloud, whether on the public cloud or the telco edge cloud (telco base station). As all general computing workloads migrate to the cloud, it is clear that the future of the RAN will also be in the cloud. NVIDIA is a leading industry voice to realize this vision, as detailed in RAN-in-the-Cloud: Delivering Cloud Economics to 5G RAN.

Second, dynamic scaling leverages cloud economics to deliver ROI improvements to telecom networks. Instead of the typical TCO challenges with single-use solutions, multi-tenancy enables the same infrastructure to be used for multiple workloads, hence increasing utilization. 

Telcos and enterprises are already using the cloud for mixed workloads, which are spike-sensitive, expensive, and consist of many one-off “islands.” Likewise, telcos and enterprises are increasingly using NVIDIA GPU servers to accelerate edge AI applications. The NVIDIA AX800 provides an easy path to use the same GPU resources for accelerating the 5G RAN connectivity, in addition to edge AI applications. 

Third, the opportunity for dynamic scaling using NVIDIA AX800 provides marginal utility to telcos and CSPs who are already investing in NVIDIA systems and solutions to power their AI (especially generative AI) services. 

Current demand for NVIDIA compute, especially to support generative AI applications, is significantly high. As such, once the investment is made, deriving additional marginal utility from running 5G and generative AI applications together massively accelerates the ROI on NVIDIA accelerated compute. 

An image of the NVIDIA RAN-in-the-Cloud vision showing the architectural building blocks
Figure 4. NVIDIA RAN-in-the-Cloud building blocks

NVIDIA AX800 delivers performance improvements for software-defined 5G

The NVIDIA AX800 converged accelerator delivers 36 Gbps throughput on a 2U server, when running NVIDIA Aerial 5G vRAN, substantially improving the performance for a software-defined, commercially available Open RAN 5G solution. 

This is a significant performance improvement over the typical peak throughput of ~10 Gbps of existing Open RAN solutions. It compares favorably with the >30 Gbps peak throughput performance on 5G networks that are built in the traditional way. It achieves this today by accelerating the physical layer 1 (L1) stack in the NVIDIA Aerial 5G vRAN (Figure 5). Further performance breakthroughs are in the pipeline as the NVIDIA AX800 can be leveraged for the full 5G stack in the near future (Figure 6).

The NVIDIA AX800 converged accelerator combines NVIDIA Ampere architecture GPU technology with the NVIDIA BlueField-3 DPU. It has nearly 1 TB/s of GPU memory bandwidth and can be partitioned into as many as seven GPU instances. NVIDIA BlueField-3 supports 256 threads, making the NVIDIA AX800 capable of high performance on the most demanding I/O-intensive workloads, such as L1 5G vRAN.

Together, NVIDIA AX800 and NVIDIA Aerial deliver this performance for 10 peak 4T4R cells on TDD at 100 MHz, using four downlink (DL) and two uplink (UL) layers and 100% physical resource block (PRB) utilization. This enables the system to achieve 36.56 Gbps DL and 4.794 Gbps UL throughput.
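As a quick sanity check, dividing the quoted aggregates by the cell count gives the implied per-cell rates; the DL/UL asymmetry is consistent with the four-downlink versus two-uplink layer configuration and the TDD frame split:

```python
# Per-cell rates implied by the quoted aggregate throughput figures.
cells = 10
dl_total_gbps, ul_total_gbps = 36.56, 4.794

dl_per_cell = dl_total_gbps / cells   # downlink Gbps per 4T4R cell
ul_per_cell = ul_total_gbps / cells   # uplink Gbps per 4T4R cell

print(round(dl_per_cell, 4), round(ul_per_cell, 4))  # 3.656 0.4794
```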

The NVIDIA solution is also highly scalable and can support from 2T2R (sub 1 GHz macro deployments) to 64T64R (massive MIMO deployments) configurations. Massive MIMO workloads with high layer counts are dominated by the computational complexity of algorithms for estimating and responding to channel conditions (for example, sounding reference signal channel estimator, channel equalizer, beamforming, and more). 

The GPU, and specifically the AX800 (with the highest streaming multiprocessor count for NVIDIA Ampere architecture GPUs), offers the ideal solution to tackle the complexity of massive MIMO workloads at moderate power envelopes.

Diagram showing how the NVIDIA AX800 accelerates Layer 1 of the NVIDIA Aerial 5G vRAN stack.
Figure 5. The NVIDIA AX800 converged accelerator delivers performance breakthroughs by accelerating the physical layer (Layer 1) of the NVIDIA Aerial 5G vRAN stack
Diagram showing how acceleration of the full 5G stack will deliver improved performance for NVIDIA Aerial 5G vRAN.
Figure 6. The NVIDIA AX800 converged accelerator with 5G vRAN full-stack acceleration will extend the performance improvements of the NVIDIA Aerial 5G vRAN stack

Summary

The NVIDIA AX800 converged accelerator offers a new architectural approach to deploying 5G on commodity hardware on any cloud. It delivers 36 Gbps throughput for software-defined 5G using the NVIDIA Aerial 5G vRAN stack, a substantial performance improvement.

NVIDIA AX800 brings the vision of the RAN-in-the-Cloud closer to reality, offering telcos and CSPs a roadmap to move 5G RAN workloads into any cloud. There they can be dynamically combined with other AI workloads to improve infrastructure utilization, optimize TCO, and boost ROI. Likewise, the throughput improvement dramatically boosts the performance of Open RAN solutions, making them competitive with traditional 5G RAN options. 

NVIDIA is working with CSPs, telcos, and OEMs to deploy the NVIDIA AX800 in commercial 5G networks. For more information, visit AI Solutions for Telecom.

Generative AI Sparks Life into Virtual Characters with NVIDIA ACE for Games

Generative AI technologies are revolutionizing how games are conceived, produced, and played. Game developers are exploring how these technologies impact 2D and 3D content-creation pipelines during production. Part of the excitement comes from the ability to create gaming experiences at runtime that would have been impossible using earlier solutions.

The creation of non-playable characters (NPCs) has evolved as games have become more sophisticated. The number of pre-recorded lines has grown, the number of options a player has to interact with NPCs has increased, and facial animations have become more realistic. 

Yet player interactions with NPCs still tend to be transactional, scripted, and short-lived, as dialogue options exhaust quickly, serving only to push the story forward. Now, generative AI can make NPCs more intelligent by improving their conversational skills, creating persistent personalities that evolve over time, and enabling dynamic responses that are unique to the player.

At COMPUTEX 2023, NVIDIA announced the future of NPCs with the NVIDIA Avatar Cloud Engine (ACE) for Games. NVIDIA ACE for Games is a custom AI model foundry service that aims to transform games by bringing intelligence to NPCs through AI-powered natural language interactions. 

Developers of middleware, tools, and games can use NVIDIA ACE for Games to build and deploy customized speech, conversation, and animation AI models in software and games.

Generate NPCs with the latest breakthroughs in AI foundation models

Graphic showing modules of NVIDIA ACE for Games.
Figure 1. Use NVIDIA ACE for Games to customize and deploy LLMs through cloud or PC to generate intelligent NPCs 

The optimized AI foundation models include the following:

  • NVIDIA NeMo: Provides foundation language models and model customization tools so you can further tune the models for game characters. The models can be integrated end-to-end or in any combination, depending on need. This customizable large language model (LLM) enables specific character backstories and personalities that fit the game world.
  • NVIDIA Riva: Provides automatic speech recognition (ASR) and text-to-speech (TTS) capabilities to enable live speech conversation with NVIDIA NeMo.
  • NVIDIA Omniverse Audio2Face: Instantly creates expressive facial animation for game characters from just an audio source. Audio2Face features Omniverse connectors for Unreal Engine 5, so you can add facial animation directly to MetaHuman characters.

You can bring life to NPCs through NeMo model alignment techniques. First, employ behavior cloning to enable the base language model to perform role-playing tasks according to instructions. To further align the NPC’s behavior with expectations, in the future, you can apply reinforcement learning from human feedback (RLHF) to receive real-time feedback from designers during the development process.

After the NPC is fully aligned, the final step is to apply NeMo Guardrails, which adds programmable rules for NPCs. This toolkit assists you in building accurate, appropriate, on-topic, and secure game characters. NeMo Guardrails natively supports LangChain, a toolkit for developing LLM-powered applications.
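As a rough illustration of the kind of programmable rules such a toolkit supports, a Colang-style topical rail for a game character might look like the following. The character behavior, phrasing, and rule names are hypothetical; consult the NeMo Guardrails documentation for the exact syntax of your release:

```
define user ask about real world
  "What do you think about current politics?"
  "Can you recommend a stock to buy?"

define bot deflect to game world
  "Such talk is beyond these shop walls, traveler. Now, about your order..."

define flow stay in character
  user ask about real world
  bot deflect to game world
```

Rails like this keep the LLM-driven character on-topic without hand-scripting every possible dialogue line.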

NVIDIA offers flexible deployment methods for middleware, tools, and game developers of all sizes. The neural networks enabling NVIDIA ACE for Games are optimized for different capabilities, with various size, performance, and quality trade-offs.

The ACE for Games foundry service will help you fine-tune models for your games and then deploy them through NVIDIA DGX Cloud, GeForce RTX PCs, or on-premises for real-time inferencing. You can also validate the quality of the models in real time and test performance and latency to ensure that they meet specific standards before deployment.

Create end-to-end avatar solutions for games

To showcase how you can leverage ACE for Games to build NPCs, NVIDIA partnered with Convai, a startup building a platform for creating and deploying AI characters in games and virtual worlds, to help optimize and integrate ACE modules into their offering. 

“With NVIDIA ACE for Games, Convai’s tools can achieve the latency and quality needed to make AI non-playable characters available to nearly every developer in a cost-efficient way,” said Purnendu Mukherjee, founder and CEO at Convai.

Convai used NVIDIA Riva for speech-to-text and text-to-speech capabilities, NVIDIA NeMo for the LLM that drives the conversation, and Audio2Face for AI-powered facial animation from voice inputs.
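Conceptually, that module chain reads like a simple turn loop: transcribe the player’s speech, generate a reply, synthesize a voice, and drive the face from that voice. The sketch below illustrates only the data flow; the function names and return values are hypothetical placeholders, not the actual Riva, NeMo, or Audio2Face APIs.

```python
# Hypothetical sketch of an ACE-style NPC turn:
# speech-to-text -> LLM -> text-to-speech -> facial animation.
# All functions are illustrative stubs, NOT real Riva/NeMo/Audio2Face calls.

def speech_to_text(audio: bytes) -> str:      # stands in for Riva ASR
    return "what's on the menu today?"

def generate_reply(prompt: str) -> str:       # stands in for a NeMo LLM
    return "Today we have tonkotsu and spicy miso ramen."

def text_to_speech(text: str) -> bytes:       # stands in for Riva TTS
    return text.encode("utf-8")

def animate_face(voice: bytes) -> list:       # stands in for Audio2Face
    # Audio2Face derives per-frame facial poses from the voice track.
    return [f"blendshape_frame_{i}" for i in range(3)]

def npc_turn(player_audio: bytes):
    text = speech_to_text(player_audio)
    reply = generate_reply(text)
    voice = text_to_speech(reply)
    frames = animate_face(voice)
    return reply, voice, frames
```

In a real integration, each stub would become a streaming call to the corresponding service, and per-stage latency budgets would decide whether inference runs in the cloud or locally.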

Video 1. NVIDIA Kairos demo showcases Jin, an immersive NPC, and a ramen shop built with the latest NVIDIA RTX and NVIDIA DLSS technologies

As shown in Video 1, these modules were integrated seamlessly into the Convai services platform and fed into Unreal Engine 5 and MetaHuman to bring the immersive NPC Jin to life. The ramen shop scene, created by the NVIDIA Lightspeed Studios art team, runs in the NVIDIA RTX Branch of Unreal Engine 5 (NvRTX 5.1). The scene is rendered using RTX Direct Illumination (RTXDI) for ray-traced lighting and shadows alongside NVIDIA DLSS 3 for maximum performance.

Game developers are already using existing NVIDIA generative AI technologies for game development:

  • GSC Game World, one of Europe’s leading game developers, is adopting Audio2Face in its upcoming game, S.T.A.L.K.E.R. 2: Heart of Chornobyl.
  • Fallen Leaf, an indie game developer, is also using Audio2Face for character facial animation in Fort Solis, a third-person sci-fi thriller game that takes place on Mars.
  • Generative AI-focused companies such as Charisma.ai are leveraging Audio2Face to power the animation in their conversation engine.

Subscribe to learn more about NVIDIA ACE for Games, future developments, and early access programs. For more information about Convai’s features, use cases, and integrations, see Convai. For more information about integrating NVIDIA RTX and AI technologies into games, see NVIDIA Game Development Resources.

Categories
Misc

Announcing NVIDIA DGX GH200: The First 100 Terabyte GPU Memory System


At COMPUTEX 2023, NVIDIA announced NVIDIA DGX GH200, which marks another breakthrough in GPU-accelerated computing to power the most demanding giant AI workloads. In addition to describing critical aspects of the NVIDIA DGX GH200 architecture, this post discusses how NVIDIA Base Command enables rapid deployment, accelerates the onboarding of users, and simplifies system management.

The unified memory programming model of GPUs has been the cornerstone of various breakthroughs in complex accelerated computing applications over the last seven years. In 2016, NVIDIA introduced NVLink technology and the unified memory programming model with CUDA 6, designed to increase the memory available to GPU-accelerated workloads. 

Since then, the core of every DGX system has been a GPU complex on a baseboard interconnected with NVLink, in which each GPU can access any other’s memory at NVLink speed. Many such DGX systems are interconnected with high-speed networking to form larger supercomputers, such as the NVIDIA Selene supercomputer. Yet an emerging class of giant, trillion-parameter AI models either requires several months to train or cannot be solved at all, even on today’s best supercomputers. 

To empower the scientists in need of an advanced platform that can solve these extraordinary challenges, NVIDIA paired NVIDIA Grace Hopper Superchip with the NVLink Switch System, uniting up to 256 GPUs in an NVIDIA DGX GH200 system. In the DGX GH200 system, 144 terabytes of memory will be accessible to the GPU shared memory programming model at high speed over NVLink. 

Compared to a single NVIDIA DGX A100 320 GB system, NVIDIA DGX GH200 provides nearly 500x more memory to the GPU shared memory programming model over NVLink, forming a giant data center-sized GPU. NVIDIA DGX GH200 is the first supercomputer to break the 100-terabyte barrier for memory accessible to GPUs over NVLink.
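The headline numbers are easy to sanity-check: 256 superchips, each contributing 480 GB of CPU memory and 96 GB of HBM3 to the shared memory space. Treating the quoted 144 TB as binary terabytes is our assumption:

```python
# Back-of-the-envelope check of the DGX GH200 memory figures quoted above.

GPUS = 256
CPU_MEM_GB = 480      # LPDDR5 per Grace Hopper Superchip
GPU_MEM_GB = 96       # HBM3 per Grace Hopper Superchip

total_gb = GPUS * (CPU_MEM_GB + GPU_MEM_GB)
print(total_gb)              # 147456 GB
print(total_gb / 1024)       # 144.0 -> the quoted 144 TB (binary terabytes)

# "Nearly 500x" the 320 GB of a single DGX A100 320GB system:
print(round(total_gb / 320, 1))   # 460.8
```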

Figure 1. GPU memory gains as a result of NVLink technology progression 

NVIDIA DGX GH200 system architecture

NVIDIA Grace Hopper Superchip and NVLink Switch System are the building blocks of NVIDIA DGX GH200 architecture. NVIDIA Grace Hopper Superchip combines the Grace and Hopper architectures using NVIDIA NVLink-C2C to deliver a CPU + GPU coherent memory model. The NVLink Switch System, powered by the fourth generation of NVLink technology, extends NVLink connection across superchips to create a seamless, high-bandwidth, multi-GPU system.

Each NVIDIA Grace Hopper Superchip in NVIDIA DGX GH200 has 480 GB of LPDDR5 CPU memory, at an eighth of the power per GB of DDR5, plus 96 GB of fast HBM3. The NVIDIA Grace CPU and Hopper GPU are interconnected with NVLink-C2C, providing 7x more bandwidth than PCIe Gen5 at one-fifth the power. 

NVLink Switch System forms a two-level, non-blocking, fat-tree NVLink fabric to fully connect 256 Grace Hopper Superchips in a DGX GH200 system. Every GPU in DGX GH200 can access the memory of other GPUs and extended GPU memory of all NVIDIA Grace CPUs at 900 GBps. 

Compute baseboards hosting Grace Hopper Superchips are connected to the NVLink Switch System using a custom cable harness for the first layer of NVLink fabric. LinkX cables extend the connectivity in the second layer of NVLink fabric. 

Figure 2. Topology of a fully connected NVIDIA NVLink Switch System (36 NVLink switches) across NVIDIA DGX GH200, consisting of 256 GPUs

In the DGX GH200 system, GPU threads can address peer HBM3 and LPDDR5X memory from other Grace Hopper Superchips in the NVLink network using an NVLink page table. NVIDIA Magnum IO acceleration libraries optimize GPU communications for efficiency, enhancing application scaling with all 256 GPUs. 

Every Grace Hopper Superchip in DGX GH200 is paired with one NVIDIA ConnectX-7 network adapter and one NVIDIA BlueField-3 NIC. DGX GH200 has 128 TBps of bisection bandwidth and 230.4 TFLOPS of NVIDIA SHARP in-network computing to accelerate collective operations commonly used in AI. By reducing the communication overhead of collective operations, SHARP doubles the effective bandwidth of the NVLink Network System.
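A rough way to see why in-network reduction can double effective bandwidth: in a classic ring allreduce, each GPU injects about twice the payload onto the network (a reduce-scatter pass plus an all-gather pass), whereas switch-side reduction lets each GPU inject the payload only once. This is a simplified traffic model of the general technique, not a model of the actual DGX GH200 fabric:

```python
# Simplified model of why in-network reduction (as in NVIDIA SHARP)
# can roughly double the effective bandwidth of an allreduce.

def ring_allreduce_traffic(data_gb: float, n_gpus: int) -> float:
    """Data each GPU sends in a classic ring allreduce:
    2 * (N - 1) / N times the payload (reduce-scatter + all-gather)."""
    return 2 * (n_gpus - 1) / n_gpus * data_gb

def in_network_allreduce_traffic(data_gb: float) -> float:
    """With in-network reduction, each GPU sends its payload once;
    switches combine partial results on the way."""
    return data_gb

payload = 1.0   # GB of gradients per GPU
n = 256

ring = ring_allreduce_traffic(payload, n)
sharp = in_network_allreduce_traffic(payload)
print(f"traffic ratio: {ring / sharp:.2f}x")   # traffic ratio: 1.99x
```

Halving the traffic each GPU must inject is what shows up to the application as roughly doubled effective bandwidth.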

For scaling beyond 256 GPUs, ConnectX-7 adapters can interconnect multiple DGX GH200 systems to scale into an even larger solution. The power of BlueField-3 DPUs transforms any enterprise computing environment into a secure and accelerated virtual private cloud, enabling organizations to run application workloads in secure, multi-tenant environments.

Target use cases and performance benefits

The generational leap in GPU memory significantly improves the performance of AI and HPC applications bottlenecked by GPU memory size. Many mainstream AI and HPC workloads can reside entirely in the aggregate GPU memory of a single NVIDIA DGX H100. For such workloads, the DGX H100 is the most performance-efficient training solution.

Other workloads, such as a deep learning recommendation model (DLRM) with terabytes of embedding tables, a terabyte-scale graph neural network training model, or large data analytics workloads, see speedups of 4x to 7x with DGX GH200. This shows that DGX GH200 is a better solution for the most advanced AI and HPC models, which require massive memory for GPU shared memory programming.

The mechanics of speedup are described in detail in the NVIDIA Grace Hopper Superchip Architecture whitepaper.

Figure 3. Performance comparisons between an NVIDIA DGX H100 cluster with NVIDIA InfiniBand and NVIDIA DGX GH200 with NVLink Switch System for giant-memory AI workloads, including emerging NLP, larger recommender systems, graph neural networks, graph analytics, and data analytics

Purpose-designed for the most demanding workloads

Every component throughout DGX GH200 is selected to minimize bottlenecks while maximizing network performance for key workloads and fully utilizing all scale-up hardware capabilities. The result is linear scalability and high utilization of the massive, shared memory space. 

To get the most out of this advanced system, NVIDIA also architected an extremely high-speed storage fabric to run at peak capacity and to handle a variety of data types (text, tabular data, audio, and video)—in parallel and with unwavering performance. 

Full-stack NVIDIA solution

DGX GH200 comes with NVIDIA Base Command, which includes an OS optimized for AI workloads, a cluster manager, and libraries that accelerate compute, storage, and network infrastructure, all optimized for the DGX GH200 system architecture. 

DGX GH200 also includes NVIDIA AI Enterprise, providing a suite of software and frameworks optimized to streamline AI development and deployment. This full-stack solution enables customers to focus on innovation and worry less about managing their IT infrastructure.

Figure 4. The NVIDIA DGX GH200 AI supercomputer full stack includes NVIDIA Base Command and NVIDIA AI Enterprise

Supercharge giant AI and HPC workloads

NVIDIA is working to make DGX GH200 available at the end of this year. NVIDIA is eager to provide this incredible first-of-its-kind supercomputer and empower you to innovate and pursue your passions in solving today’s biggest AI and HPC challenges. Learn more.

Categories
Misc

Step into the Future of Industrial-Grade Edge AI with NVIDIA Jetson AGX Orin Industrial 


Embedded edge AI is transforming industrial environments by introducing intelligence and real-time processing to even the most challenging settings. Edge AI is increasingly being used in agriculture, construction, energy, aerospace, satellites, the public sector, and more. With the NVIDIA Jetson edge AI and robotics platform, you can deploy AI and compute for sensor fusion in these complex environments.

At COMPUTEX 2023, NVIDIA announced the new Jetson AGX Orin Industrial module, which brings the next level of computing to harsh environments. This new module extends the capabilities of the previous-generation NVIDIA Jetson AGX Xavier Industrial and the commercial Jetson AGX Orin modules, by bringing server-class performance to ruggedized systems.

Embedded edge in ruggedized applications

Many applications—including those designed for agriculture, industrial manufacturing, mining, construction, and transportation—must withstand extreme environments and extended shocks and vibrations.

For example, robust hardware is vital for a wide range of agriculture applications, as it enables machinery to endure heavy workloads, navigate challenging bumpy terrains, and operate continuously under varying temperatures. NVIDIA Jetson modules have transformed smart farming, powering autonomous tractors and intelligent systems for harvesting, weeding, and selective spraying.

In railway applications, trains generate vibrations when traveling at high speeds. The interaction between a train’s wheels and the rails also leads to additional intermittent vibrations and shocks. Transportation companies are using Jetson for object detection, accident prevention, and optimizing maintenance costs.

Mining is another space where industrial requirements come into play. For example, Tage IDriver has launched a Jetson AGX Xavier Industrial-based, vehicle-ground-cloud-coordinated, unmanned transportation solution for smart mines. The unmanned mining truck requires additional ruggedness for open-mine environments. The NVIDIA Jetson module processes data from sensors such as lidar, cameras, and radar to enable the accurate perception needed in the harsh mining environment.

In near or outer space, where radiation levels pose significant challenges, the deployment of durable and radiation-tolerant modules is essential to ensure reliable and efficient operation. Many satellite companies are looking to deploy AI at the edge, but face challenges finding the right compute module. 

Together, the European Space Agency and the Barcelona Supercomputing Center have studied the effects of radiation on Jetson AGX Xavier Industrial modules. For more information, see the Sources of Single Event Effects in the NVIDIA Xavier SoC Family under Proton Irradiation whitepaper. Their radiation data shows that the Jetson AGX Xavier Industrial module, combined with a rugged enclosure, is a good candidate for high-performance computation in thermally constrained satellites deployed in both low-earth and geosynchronous orbits.

The Jetson modules are transforming many of these space applications, and NVIDIA Jetson AGX Orin Industrial extends these capabilities.

Introducing NVIDIA Jetson AGX Orin Industrial

The Jetson AGX Orin Industrial module delivers up to 248 TOPS of AI performance, with power configurable between 15 and 75 W. It is form-factor and pin compatible with Jetson AGX Orin and delivers more than 8x the performance of Jetson AGX Xavier Industrial.

This compact system-on-module (SOM) supports multiple concurrent AI application pipelines with an NVIDIA Ampere architecture GPU, next-generation deep learning and vision accelerators, high-speed I/O, and fast memory bandwidth. It comes with an extended temperature range, operating lifetime, and shock and vibration specifications, as well as support for error correction code (ECC) memory.

Industrial applications under extreme heat or extreme cold require extended temperature support, along with underfill and corner bonding to protect the module in these harsh environments. Inline DRAM ECC is also required for data integrity and system reliability: industrial environments involve critical operations and sensitive data processing, and ECC helps ensure data integrity by detecting and correcting memory errors in real time. 
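To illustrate the idea behind ECC, here is a toy Hamming(7,4) single-error-correcting code in Python. Real inline DRAM ECC uses wider SECDED codes computed transparently by the memory controller; this sketch only demonstrates the detect-locate-flip principle:

```python
# Toy Hamming(7,4) code illustrating the single-error-correction idea
# behind ECC memory. Illustrative only; not how any Jetson module
# implements inline DRAM ECC.

def hamming74_encode(d):
    """d: list of 4 data bits -> 7-bit codeword [p1,p2,d1,p3,d2,d3,d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Locate and flip a single flipped bit, then return the data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4,5,6,7
    pos = s1 + 2 * s2 + 4 * s3       # 0 means no error detected
    if pos:
        c[pos - 1] ^= 1              # correct the flipped bit
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
code = hamming74_encode(word)
code[4] ^= 1                          # simulate a single-bit memory fault
assert hamming74_correct(code) == word
```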

The following table lists the key new industrial features of the new Jetson AGX Orin Industrial SOM compared with Jetson AGX Orin 64GB module. For more information about the NVIDIA Jetson Orin architecture, see the Jetson AGX Orin Technical Brief and the Jetson Embedded Download Center.

| Feature | Jetson AGX Orin 64GB | Jetson AGX Orin Industrial |
| --- | --- | --- |
| AI Performance | 275 TOPS | 248 TOPS |
| Module | 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores; 12-core Arm Cortex-A78AE CPU; 64-GB LPDDR5; 64-GB eMMC | 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores; 12-core Arm Cortex-A78AE CPU; 64-GB LPDDR5 with inline ECC; 64-GB eMMC |
| Operating Temperature | -25°C to 80°C at TTP | -40°C to 85°C at TTP |
| Module Power | 15–60 W | 15–75 W |
| Operating Lifetime | 5 years | 10 years (87K hours at 85°C) |
| Shock | Non-operational: 140G, 2 ms | Non-operational: 140G, 2 ms; Operational: 50G, 11 ms |
| Vibration | Non-operational: 3G | Non-operational: 3G; Operational: 5G |
| Humidity | Biased, 85°C, 85% RH, 168 hours | 85°C, 85% RH, 1,000 hours, power on |
| Temperature Endurance | -20°C, 24 hours; 45°C, 168 hours (operational) | -40°C, 72 hours; 85°C, 1,000 hours (operational) |
| Mechanical | 100 mm x 87 mm | 100 mm x 87 mm |
| Underfill | None | SoC corner bonding and component underfill |
| Production Lifecycle | 7 years (until 2030) | 10 years (until 2033) |

Table 1. Key features of the new Jetson AGX Orin Industrial SOM compared with the Jetson AGX Orin 64GB module

Robust software and ecosystem support

Jetson AGX Orin Industrial is powered by the NVIDIA AI software stack, with tools and SDKs to accelerate each step of the development journey for time to market. It combines platforms like NVIDIA Isaac for robotics and NVIDIA Isaac Replicator for synthetic data generation powered by NVIDIA Omniverse; frameworks like NVIDIA Metropolis with DeepStream for intelligent video analytics and NVIDIA TAO Toolkit; and a large collection of pretrained and production-ready models. These all accelerate the process of model development and help you create fully hardware-accelerated AI applications.

SDKs like NVIDIA JetPack provide all the accelerated libraries in a powerful yet easy-to-use development environment, so you can get started quickly with any Jetson module. NVIDIA JetPack also provides security features in the Jetson AGX Orin module for edge-to-cloud security to protect your deployments:

  • Hardware root of trust
  • Firmware TPM
  • Secure boot and measured boot
  • Hardware-accelerated cryptography
  • Trusted execution environment
  • Support for encrypted storage and memory
  • And more

Vibrant Jetson ecosystem partners in the NVIDIA Partner Network are integrating the industrial module into hardware and software solutions for industrial applications:

  • Partner cameras enable computer vision tasks such as quality inspection.
  • Connectivity partners facilitate data transfer with interfaces such as Ethernet.
  • The sensor and connectivity partners with upcoming Jetson AGX Orin Industrial solutions include Silex, Infineon, Basler, e-con Systems, Leopard Imaging, FRAMOS, and D3.
  • Hardware partners design carrier boards for industrial edge computing.

This collaborative effort enables you to deploy AI-enabled solutions for industrial automation, robotics, and more.

The following partners will have carrier boards and full systems with the Jetson AGX Orin Industrial module: Advantech, AVerMedia, Connect Tech, Forecr, Leetop, Realtimes, and Syslogic. For more information, see Jetson Ecosystem.

Figure 1. Jetson ecosystem partners collaborate with NVIDIA using hardware, software, cameras, connectivity, and other technologies

The new Jetson AGX Orin Industrial module will be available in July; reach out to a distributor in your region to place an order now. Because Jetson AGX Orin Industrial and Jetson AGX Orin 64GB are pin- and software-compatible, you can start building solutions today with the Jetson AGX Orin Developer Kit and the latest JetPack.

For more information, see the NVIDIA Jetson AGX Orin Industrial documentation available at the Jetson download center, the NVIDIA Embedded Developer page, and the Jetson forums.