The pace of 5G investment and adoption is accelerating. According to the GSMA Mobile Economy 2023 report, nearly $1.4 trillion will be spent on 5G CAPEX between 2023 and 2030. The radio access network (RAN) may account for over 60% of this spend.
Increasingly, CAPEX is shifting from the traditional approach built on proprietary hardware to virtualized RAN (vRAN) and Open RAN architectures, which can benefit from cloud economics and do not require dedicated hardware. Despite these benefits, Open RAN adoption has struggled because existing technology has yet to deliver on cloud economics and cannot provide high performance and flexibility at the same time.
NVIDIA has overcome these challenges with the NVIDIA AX800 converged accelerator, delivering a truly cloud-native and high-performance accelerated 5G solution on commodity hardware that can run on any cloud (Figure 1).
To benefit from cloud economics, the future of the RAN is in the cloud (RAN-in-the-Cloud). The road to cloud economics aligns with Clayton Christensen’s characterization of disruptive innovation in traditional industries as presented in his book, The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail. That is, with progressive incremental improvements, new and seemingly inferior products are able to eventually capture market share.
Existing Open RAN solutions cannot support non-5G workloads, and they still deliver inferior 5G performance because most rely on single-use hardware accelerators. This limits their appeal to telecom executives, for whom the proven performance of traditional solutions offers a tried-and-tested deployment path for 5G.
However, the NVIDIA Accelerated 5G RAN solution based on NVIDIA AX800 has overcome these limitations and is now delivering comparable performance to traditional 5G solutions. This paves the way to deploy 5G Open RAN on commercial-off-the-shelf (COTS) hardware at any public cloud or telco edge.
Figure 1. NVIDIA AX800 converged accelerator and NVIDIA Aerial SDK deliver a compelling advantage for the Open RAN ecosystem
Solutions to support cloud-native RAN
To drive broad adoption of cloud-native RAN, the industry needs solutions that are cloud native, deliver high RAN performance, and are built with AI capability.
Cloud native
This approach delivers better utilization, multi-use and multi-tenancy, lower TCO, and increased automation—with all the virtues of cloud computing and benefiting from cloud economics.
A cloud-native network that benefits from cloud economics requires a complete rethink: a network that is 100% software-defined, deployed on general-purpose hardware, and able to support multi-tenancy. It is not about building bespoke, dedicated systems in the public or telco cloud managed by cloud service providers (CSPs).
High RAN performance
High RAN performance is required to deliver new technology—such as massive MIMO with its improved spectral efficiency, cell density, and higher throughput—all with improved energy efficiency. Achieving high performance on commodity hardware that is comparable to the performance of dedicated systems is proving a formidable challenge. This is due to the death of Moore’s Law and the relatively low performance achieved by software running on CPUs.
As a result, RAN vendors are building fixed-function accelerators to improve the CPU performance. This approach leads to inflexible solutions and does not meet the flexibility and openness expectations for Open RAN. In addition, with fixed-function or single-use accelerators, the benefits of cloud economics cannot be achieved.
For example, software-defined 5G networks based on Open RAN specifications and COTS hardware typically achieve peak throughput of ~10 Gbps, compared with >30 Gbps peak throughput on 5G networks built in the traditional, vertically integrated appliance approach using bespoke software and hardware.
According to a recent survey of 52 telco executives reported in Telecom Networks: Tracking the Coming xRAN Revolution, “In terms of obstacles to xRAN deployment, 62% of operators voice concerns regarding xRAN performance today relative to traditional RAN.”
AI capability
Solutions must evolve from today's proprietary, single-purpose implementations in the telecom network toward an AI-capable infrastructure for hosting internal and external applications. AI plays a role in 5G (AI-for-5G) to automate and improve system performance. Likewise, AI works together with 5G (AI-on-5G) to enable new features in 5G and beyond.
Achieving these goals requires a new architectural approach for cloud-native RAN, especially with a general-purpose COTS-based accelerated computing platform. This is the NVIDIA focus, as summarized in Figure 2.
The emphasis is on delivering a general-purpose COTS server built with NVIDIA converged accelerators (such as the NVIDIA AX800) that can support high-performance 5G and AI workloads on the same platform. This will deliver cloud economics with better utilization and reduced TCO, and a platform that can efficiently run AI workloads to future proof the RAN for the 6G era.
Figure 2. NVIDIA technological innovations transforming the RAN
Run 5G and AI workloads on the same accelerator with NVIDIA AX800
The NVIDIA AX800 converged accelerator is a game changer for CSPs and telcos because it brings cloud economics into the operations and management of telecom networks. The AX800 supports multi-use and multi-tenancy of both 5G and AI workloads on commodity hardware that can run on any cloud, dynamically scaling the workloads. In doing so, it enables CSPs and telcos to use the same infrastructure for both 5G and AI with high utilization levels.
Dynamic scaling for multi-tenancy
The NVIDIA AX800 achieves dynamic scaling, both at the data center and at the server and card levels, enabling support of 5G and AI workloads. This scalable, flexible, energy-efficient, and cost-effective approach can deliver a variety of applications and services.
At the data center and server levels, the NVIDIA AX800 supports dynamic scaling: the Open RAN service management and orchestration (SMO) framework can allocate and reallocate computational resources in real time to support either 5G or AI workloads.
At the card level, NVIDIA AX800 supports dynamic scaling using NVIDIA Multi-Instance GPU (MIG), as shown in Figure 3. MIG enables concurrent processing of virtualized 5G base stations and edge AI applications on pooled GPU hardware resources. This enables each function to run on the same server and accelerator in a coherent and energy-conscious manner.
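Conceptually, this kind of reallocation treats the accelerator as a fixed pool of MIG slices that an orchestrator repartitions between tenants as load shifts. The following Python sketch is purely illustrative: the MigPool class, its proportional policy, and the tenant names are invented for this example and are not an NVIDIA API.

```python
# Illustrative sketch only: models orchestrator-style repartitioning of a
# fixed pool of MIG slices between a 5G vRAN tenant and edge-AI tenants.
# The class and policy are hypothetical, not an NVIDIA API.

class MigPool:
    def __init__(self, total_slices=7):  # AX800 can be split into up to 7 GPU instances
        self.total_slices = total_slices
        self.allocation = {"5g_vran": 0, "edge_ai": 0}

    def rebalance(self, ran_load):
        """Assign slices to the 5G workload in proportion to RAN load
        (0.0-1.0); whatever remains is freed for AI tenants."""
        ran_slices = max(1, round(ran_load * self.total_slices))
        self.allocation["5g_vran"] = ran_slices
        self.allocation["edge_ai"] = self.total_slices - ran_slices
        return dict(self.allocation)

pool = MigPool()
print(pool.rebalance(0.9))  # busy hour: most slices serve the RAN
print(pool.rebalance(0.2))  # off-peak: slices freed for AI workloads
```

A real SMO would of course use measured load, policy constraints, and the platform's actual partitioning interfaces; the point here is only the shape of the decision.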
This novel approach provides increased radio capacity and processing power, contributing to better performance and enabling peak data throughput processing with room for future antenna technology advancements.
Figure 3. NVIDIA Multi-Instance GPU redistribution based on workload and resource requirements
Commercial implications of dynamic scaling for multi-tenancy
The rationale for pooling 5G RAN in the cloud (RAN-in-the-Cloud) is straightforward. The RAN constitutes the largest CAPEX and OPEX spending for telcos (>60%). Yet the RAN is also the most underutilized resource, with most radio base stations typically operating below 50% utilization.
Moving RAN compute into the cloud brings all the benefits of cloud computing: pooling and higher utilization in a shared cloud infrastructure, resulting in the largest CAPEX and OPEX reduction for telcos. It also supports cloud-native scale-in/scale-out and dynamic resource management.
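The pooling argument can be made concrete with a toy calculation. In the sketch below, the site names and hourly load numbers are invented; the point is that sites peak at different times, so the peak of the combined load is far smaller than the sum of the individual peaks that dedicated per-site hardware must be provisioned for.

```python
# Illustrative sketch: why pooling underutilized RAN sites in a shared
# cloud cuts provisioned capacity. Load profiles are invented.

site_loads = {
    "downtown":    [80, 90, 70, 20, 10],   # peaks during business hours
    "residential": [10, 20, 30, 85, 95],   # peaks in the evening
    "stadium":     [5,  5, 100, 10,  5],   # peaks during events
}

# Dedicated hardware: each site is provisioned for its own peak.
dedicated = sum(max(load) for load in site_loads.values())

# Pooled RAN-in-the-Cloud: provision for the peak of the combined load.
combined = [sum(hour) for hour in zip(*site_loads.values())]
pooled = max(combined)

print(dedicated, pooled)  # pooled capacity needs are significantly lower
```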
Dynamic scaling for multi-tenancy is commercially significant in three ways. First, it enables deployment of 5G and AI on general-purpose computing hardware, paving the way to run the 5G RAN on any cloud, whether on the public cloud or the telco edge cloud (telco base station). As all general computing workloads migrate to the cloud, it is clear that the future of the RAN will also be in the cloud. NVIDIA is a leading industry voice to realize this vision, as detailed in RAN-in-the-Cloud: Delivering Cloud Economics to 5G RAN.
Second, dynamic scaling leverages cloud economics to deliver ROI improvements to telecom networks. Instead of the typical TCO challenges with single-use solutions, multi-tenancy enables the same infrastructure to be used for multiple workloads, hence increasing utilization.
Telcos and enterprises are already using the cloud for mixed workloads, which are spike-sensitive, expensive, and consist of many one-off “islands.” Likewise, telcos and enterprises are increasingly using NVIDIA GPU servers to accelerate edge AI applications. The NVIDIA AX800 provides an easy path to use the same GPU resources for accelerating the 5G RAN connectivity, in addition to edge AI applications.
Third, the opportunity for dynamic scaling using NVIDIA AX800 provides marginal utility to telcos and CSPs who are already investing in NVIDIA systems and solutions to power their AI (especially generative AI) services.
Current demand for NVIDIA compute, especially to support generative AI applications, is extremely high. As such, once the investment is made, deriving additional marginal utility from running 5G and generative AI applications together dramatically accelerates the ROI on NVIDIA accelerated compute.
Figure 4. NVIDIA RAN-in-the-Cloud building blocks
NVIDIA AX800 delivers performance improvements for software-defined 5G
The NVIDIA AX800 converged accelerator delivers 36 Gbps throughput on a 2U server when running NVIDIA Aerial 5G vRAN, substantially improving performance for a software-defined, commercially available Open RAN 5G solution.
This is a significant performance improvement over the typical peak throughput of ~10 Gbps of existing Open RAN solutions. It compares favorably with the >30 Gbps peak throughput performance on 5G networks that are built in the traditional way. It achieves this today by accelerating the physical layer 1 (L1) stack in the NVIDIA Aerial 5G vRAN (Figure 5). Further performance breakthroughs are in the pipeline as the NVIDIA AX800 can be leveraged for the full 5G stack in the near future (Figure 6).
The NVIDIA AX800 converged accelerator combines NVIDIA Ampere architecture GPU technology with the NVIDIA BlueField-3 DPU. It has nearly 1 TB/s of GPU memory bandwidth and can be partitioned into as many as seven GPU instances. NVIDIA BlueField-3 supports 256 threads, making the NVIDIA AX800 capable of high performance on the most demanding I/O-intensive workloads, such as L1 5G vRAN.
NVIDIA AX800 with NVIDIA Aerial together deliver this performance for 10 peak 4T4R cells on TDD at 100 MHz, using four downlink (DL) and two uplink (UL) layers and 100% physical resource block (PRB) utilization. This enables the system to achieve 36.56 Gbps DL and 4.794 Gbps UL throughput.
The NVIDIA solution is also highly scalable and can support from 2T2R (sub 1 GHz macro deployments) to 64T64R (massive MIMO deployments) configurations. Massive MIMO workloads with high layer counts are dominated by the computational complexity of algorithms for estimating and responding to channel conditions (for example, sounding reference signal channel estimator, channel equalizer, beamforming, and more).
The GPU, and specifically the AX800 (with the highest streaming multiprocessor count for NVIDIA Ampere architecture GPUs), offers the ideal solution to tackle the complexity of massive MIMO workloads at moderate power envelopes.
Figure 5. The NVIDIA AX800 converged accelerator delivers performance breakthroughs by accelerating the physical layer (Layer 1) of the NVIDIA Aerial 5G vRAN stack
Figure 6. The NVIDIA AX800 converged accelerator with 5G vRAN full-stack acceleration will extend the performance improvements of the NVIDIA Aerial 5G vRAN stack
Summary
The NVIDIA AX800 converged accelerator offers a new architectural approach to deploying 5G on commodity hardware on any cloud. It delivers 36 Gbps throughput for software-defined 5G using the NVIDIA Aerial 5G vRAN stack, a substantial performance improvement.
NVIDIA AX800 brings the vision of the RAN-in-the-Cloud closer to reality, offering telcos and CSPs a roadmap to move 5G RAN workloads into any cloud. There they can be dynamically combined with other AI workloads to improve infrastructure utilization, optimize TCO, and boost ROI. Likewise, the throughput improvement dramatically boosts the performance of Open RAN solutions, making them competitive with traditional 5G RAN options.
NVIDIA is working with CSPs, telcos, and OEMs to deploy the NVIDIA AX800 in commercial 5G networks. For more information, visit AI Solutions for Telecom.
Generative AI technologies are revolutionizing how games are conceived, produced, and played. Game developers are exploring how these technologies impact 2D and 3D content-creation pipelines during production. Part of the excitement comes from the ability to create gaming experiences at runtime that would have been impossible using earlier solutions.
The creation of non-playable characters (NPCs) has evolved as games have become more sophisticated. The number of pre-recorded lines has grown, the number of options a player has to interact with NPCs has increased, and facial animations have become more realistic.
Yet player interactions with NPCs still tend to be transactional, scripted, and short-lived, as dialogue options exhaust quickly, serving only to push the story forward. Now, generative AI can make NPCs more intelligent by improving their conversational skills, creating persistent personalities that evolve over time, and enabling dynamic responses that are unique to the player.
At COMPUTEX 2023, NVIDIA announced the future of NPCs with the NVIDIA Avatar Cloud Engine (ACE) for Games. NVIDIA ACE for Games is a custom AI model foundry service that aims to transform games by bringing intelligence to NPCs through AI-powered natural language interactions.
Developers of middleware, tools, and games can use NVIDIA ACE for Games to build and deploy customized speech, conversation, and animation AI models in software and games.
Generate NPCs with the latest breakthroughs in AI foundation models
Figure 1. Use NVIDIA ACE for Games to customize and deploy LLMs through cloud or PC to generate intelligent NPCs
The optimized AI foundation models include the following:
NVIDIA NeMo: Provides foundation language models and model customization tools so you can further tune the models for game characters. The models can be integrated end-to-end or in any combination, depending on need. This customizable large language model (LLM) enables specific character backstories and personalities that fit the game world.
NVIDIA Riva: Provides automatic speech recognition (ASR) and text-to-speech (TTS) capabilities to enable live speech conversation with NVIDIA NeMo.
NVIDIA Omniverse Audio2Face: Instantly creates expressive facial animation for game characters from just an audio source. Audio2Face features Omniverse connectors for Unreal Engine 5, so you can add facial animation directly to MetaHuman characters.
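Taken together, these components form a loop: player speech is transcribed, a persona-conditioned LLM generates a reply, the reply is synthesized to audio, and the audio drives facial animation. The sketch below shows only the shape of that composition; every function here is a hypothetical placeholder standing in for the Riva, NeMo, and Audio2Face services, not their actual APIs.

```python
# Conceptual sketch of the NPC interaction loop described above.
# Every function is a hypothetical placeholder, not the real
# Riva, NeMo, or Audio2Face API.

def speech_to_text(audio):          # Riva ASR would stand here
    return "What ramen do you recommend?"

def generate_reply(text, persona):  # NeMo LLM + character backstory
    return f"[{persona}] Try the spicy miso; it is our house specialty."

def text_to_speech(text):           # Riva TTS
    return b"audio-bytes"

def animate_face(audio):            # Audio2Face drives the facial rig
    return {"blendshapes": len(audio)}

def npc_turn(player_audio, persona="Jin"):
    text = speech_to_text(player_audio)
    reply = generate_reply(text, persona)
    audio = text_to_speech(reply)
    return reply, animate_face(audio)

reply, animation = npc_turn(b"...")
print(reply)
```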
You can bring life to NPCs through NeMo model alignment techniques. First, employ behavior cloning to enable the base language model to perform role-playing tasks according to instructions. To further align the NPC’s behavior with expectations, in the future, you can apply reinforcement learning from human feedback (RLHF) to receive real-time feedback from designers during the development process.
After the NPC is fully aligned, the final step is to apply NeMo Guardrails, which adds programmable rules for NPCs. This toolkit assists you in building accurate, appropriate, on-topic, and secure game characters. NeMo Guardrails natively supports LangChain, a toolkit for developing LLM-powered applications.
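As a sketch of what such a rule might look like, here is a hypothetical guardrail written in Colang, the modeling language used by NeMo Guardrails. The utterances and flow are invented for illustration and are not taken from any shipped game or sample:

```
define user ask off topic
  "What do you think about politics?"

define bot refuse off topic
  "Let's keep our talk to the ramen shop, traveler."

define flow
  user ask off topic
  bot refuse off topic
```

In a real project, rules like this would sit alongside the model configuration so the NPC stays in character and on topic regardless of what the player asks.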
NVIDIA offers flexible deployment methods for middleware, tools, and game developers of all sizes. The neural networks enabling NVIDIA ACE for Games are optimized for different capabilities, with various size, performance, and quality trade-offs.
The ACE for Games foundry service will help you fine-tune models for your games and then deploy them through NVIDIA DGX Cloud, GeForce RTX PCs, or on-premises for real-time inferencing. You can also validate the quality of the models in real time and test performance and latency to ensure that they meet specific standards before deployment.
Create end-to-end avatar solutions for games
To showcase how you can leverage ACE for Games to build NPCs, NVIDIA partnered with Convai, a startup building a platform for creating and deploying AI characters in games and virtual worlds, to help optimize and integrate ACE modules into their offering.
“With NVIDIA ACE for Games, Convai’s tools can achieve the latency and quality needed to make AI non-playable characters available to nearly every developer in a cost-efficient way,” said Purnendu Mukherjee, founder and CEO at Convai.
Convai used NVIDIA Riva for speech-to-text and text-to-speech capabilities, NVIDIA NeMo for the LLM that drives the conversation, and Audio2Face for AI-powered facial animation from voice inputs.
Video 1. NVIDIA Kairos demo showcases Jin, an immersive NPC, and a ramen shop built with the latest NVIDIA RTX and NVIDIA DLSS technologies
As shown in Video 1, these modules were integrated seamlessly into the Convai services platform and fed into Unreal Engine 5 and MetaHuman to bring the immersive NPC Jin to life. The ramen shop scene, created by the NVIDIA Lightspeed Studios art team, runs in the NVIDIA RTX Branch of Unreal Engine 5 (NvRTX 5.1). The scene is rendered using RTX Direct Illumination (RTXDI) for ray-traced lighting and shadows alongside NVIDIA DLSS 3 for maximum performance.
Game developers are already using existing NVIDIA generative AI technologies for game development:
GSC Game World, one of Europe’s leading game developers, is adopting Audio2Face in its upcoming game, S.T.A.L.K.E.R. 2: Heart of Chornobyl.
Fallen Leaf, an indie game developer, is also using Audio2Face for character facial animation in Fort Solis, a third-person sci-fi thriller game that takes place on Mars.
Generative AI-focused companies such as Charisma.ai are leveraging Audio2Face to power the animation in their conversation engine.
At COMPUTEX 2023, NVIDIA announced NVIDIA DGX GH200, which marks another breakthrough in GPU-accelerated computing to power the most demanding giant AI workloads. In addition to describing critical aspects of the NVIDIA DGX GH200 architecture, this post discusses how NVIDIA Base Command enables rapid deployment, accelerates the onboarding of users, and simplifies system management.
The unified memory programming model of GPUs has been the cornerstone of various breakthroughs in complex accelerated computing applications over the last seven years. In 2016, NVIDIA introduced NVLink technology along with the unified memory programming model of CUDA 6, designed to increase the memory available to GPU-accelerated workloads.
Since then, the core of every DGX system has been a GPU complex on a baseboard interconnected with NVLink, in which each GPU can access the others' memory at NVLink speed. Many such DGX systems are interconnected with high-speed networking to form larger supercomputers, such as the NVIDIA Selene supercomputer. Yet an emerging class of giant, trillion-parameter AI models either requires several months to train or cannot be solved at all, even on today's best supercomputers.
To empower the scientists in need of an advanced platform that can solve these extraordinary challenges, NVIDIA paired NVIDIA Grace Hopper Superchip with the NVLink Switch System, uniting up to 256 GPUs in an NVIDIA DGX GH200 system. In the DGX GH200 system, 144 terabytes of memory will be accessible to the GPU shared memory programming model at high speed over NVLink.
Compared to a single NVIDIA DGX A100 320 GB system, NVIDIA DGX GH200 provides nearly 500x more memory to the GPU shared memory programming model over NVLink, forming a giant data center-sized GPU. NVIDIA DGX GH200 is the first supercomputer to break the 100-terabyte barrier for memory accessible to GPUs over NVLink.
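The "nearly 500x" figure follows directly from the two memory sizes quoted in this post, as a quick arithmetic check shows:

```python
# Quick arithmetic check of the memory comparison cited above.
dgx_a100_gpu_memory_gb = 320       # DGX A100 320 GB system
dgx_gh200_memory_gb = 144_000      # 144 TB addressable over NVLink in DGX GH200

ratio = dgx_gh200_memory_gb / dgx_a100_gpu_memory_gb
print(ratio)  # 450.0, i.e. "nearly 500x"
```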
Figure 1. GPU memory gains as a result of NVLink progression
NVIDIA DGX GH200 system architecture
NVIDIA Grace Hopper Superchip and NVLink Switch System are the building blocks of NVIDIA DGX GH200 architecture. NVIDIA Grace Hopper Superchip combines the Grace and Hopper architectures using NVIDIA NVLink-C2C to deliver a CPU + GPU coherent memory model. The NVLink Switch System, powered by the fourth generation of NVLink technology, extends NVLink connection across superchips to create a seamless, high-bandwidth, multi-GPU system.
Each NVIDIA Grace Hopper Superchip in NVIDIA DGX GH200 has 480 GB of LPDDR5X CPU memory, at an eighth of the power per GB of DDR5, plus 96 GB of fast HBM3 GPU memory. The NVIDIA Grace CPU and Hopper GPU are interconnected with NVLink-C2C, providing 7x more bandwidth than PCIe Gen5 at one-fifth the power.
NVLink Switch System forms a two-level, non-blocking, fat-tree NVLink fabric to fully connect 256 Grace Hopper Superchips in a DGX GH200 system. Every GPU in DGX GH200 can access the memory of other GPUs and extended GPU memory of all NVIDIA Grace CPUs at 900 GBps.
Compute baseboards hosting Grace Hopper Superchips are connected to the NVLink Switch System using a custom cable harness for the first layer of NVLink fabric. LinkX cables extend the connectivity in the second layer of NVLink fabric.
Figure 2. Topology of a fully connected NVIDIA NVLink Switch System across NVIDIA DGX GH200 consisting of 256 GPUs
In the DGX GH200 system, GPU threads can address peer HBM3 and LPDDR5X memory from other Grace Hopper Superchips in the NVLink network using an NVLink page table. NVIDIA Magnum IO acceleration libraries optimize GPU communications for efficiency, enhancing application scaling with all 256 GPUs.
Every Grace Hopper Superchip in DGX GH200 is paired with one NVIDIA ConnectX-7 network adapter and one NVIDIA BlueField-3 NIC. The DGX GH200 has 128 TBps of bisection bandwidth and 230.4 TFLOPS of NVIDIA SHARP in-network computing to accelerate the collective operations commonly used in AI. By reducing the communication overhead of collective operations, SHARP doubles the effective bandwidth of the NVLink Network System.
For scaling beyond 256 GPUs, ConnectX-7 adapters can interconnect multiple DGX GH200 systems to scale into an even larger solution. The power of BlueField-3 DPUs transforms any enterprise computing environment into a secure and accelerated virtual private cloud, enabling organizations to run application workloads in secure, multi-tenant environments.
Target use cases and performance benefits
The generational leap in GPU memory significantly improves the performance of AI and HPC applications bottlenecked by GPU memory size. Many mainstream AI and HPC workloads can reside entirely in the aggregate GPU memory of a single NVIDIA DGX H100. For such workloads, the DGX H100 is the most performance-efficient training solution.
Other workloads, such as a deep learning recommendation model (DLRM) with terabytes of embedding tables, a terabyte-scale graph neural network training model, or large data analytics workloads, see speedups of 4x to 7x with DGX GH200. This makes DGX GH200 the better solution for advanced AI and HPC models that require massive memory for GPU shared memory programming.
Figure 3. Performance comparisons for giant memory AI workloads
Purpose-designed for the most demanding workloads
Every component throughout DGX GH200 is selected to minimize bottlenecks while maximizing network performance for key workloads and fully utilizing all scale-up hardware capabilities. The result is linear scalability and high utilization of the massive, shared memory space.
To get the most out of this advanced system, NVIDIA also architected an extremely high-speed storage fabric to run at peak capacity and to handle a variety of data types (text, tabular data, audio, and video)—in parallel and with unwavering performance.
Full-stack NVIDIA solution
DGX GH200 comes with NVIDIA Base Command, which includes an OS optimized for AI workloads, a cluster manager, and libraries that accelerate compute, storage, and network infrastructure, all tuned for the DGX GH200 system architecture.
DGX GH200 also includes NVIDIA AI Enterprise, providing a suite of software and frameworks optimized to streamline AI development and deployment. This full-stack solution enables customers to focus on innovation and worry less about managing their IT infrastructure.
Figure 4. The NVIDIA DGX GH200 AI supercomputer full stack includes NVIDIA Base Command and NVIDIA AI Enterprise
Supercharge giant AI and HPC workloads
NVIDIA is working to make DGX GH200 available at the end of this year. NVIDIA is eager to provide this incredible first-of-its-kind supercomputer and empower you to innovate and pursue your passions in solving today’s biggest AI and HPC challenges. Learn more.
Embedded edge AI is transforming industrial environments by introducing intelligence and real-time processing to even the most challenging settings. Edge AI is increasingly being used in agriculture, construction, energy, aerospace, satellites, the public sector, and more. With the NVIDIA Jetson edge AI and robotics platform, you can deploy AI and compute for sensor fusion in these complex environments.
At COMPUTEX 2023, NVIDIA announced the new Jetson AGX Orin Industrial module, which brings the next level of computing to harsh environments. This new module extends the capabilities of the previous-generation NVIDIA Jetson AGX Xavier Industrial and the commercial Jetson AGX Orin modules, by bringing server-class performance to ruggedized systems.
Embedded edge in ruggedized applications
Many applications—including those designed for agriculture, industrial manufacturing, mining, construction, and transportation—must withstand extreme environments and extended shocks and vibrations.
For example, robust hardware is vital for a wide range of agriculture applications, as it enables machinery to endure heavy workloads, navigate challenging bumpy terrains, and operate continuously under varying temperatures. NVIDIA Jetson modules have transformed smart farming, powering autonomous tractors and intelligent systems for harvesting, weeding, and selective spraying.
In railway applications, trains generate vibrations when traveling at high speeds. The interaction between a train’s wheels and the rails also leads to additional intermittent vibrations and shocks. Transportation companies are using Jetson for object detection, accident prevention, and optimizing maintenance costs.
Mining is another space where industrial requirements come into play. For example, Tage IDriver has launched a vehicle-ground-cloud-coordinated, unmanned transportation solution for smart mines based on Jetson AGX Xavier Industrial. Unmanned mining trucks require additional ruggedness for open-mine environments. The Jetson module processes data from sensors such as LiDAR, cameras, and radar, enabling the accurate perception needed in harsh mining environments.
In near or outer space, where radiation levels pose significant challenges, the deployment of durable and radiation-tolerant modules is essential to ensure reliable and efficient operation. Many satellite companies are looking to deploy AI at the edge, but face challenges finding the right compute module.
Together, the European Space Agency and Barcelona Supercomputing Center have studied the effects of radiation on the Jetson AGX Xavier Industrial modules. For more information, see the Sources of Single Event Effects in the NVIDIA Xavier SoC Family under Proton Irradiation whitepaper. Their radiation data showcases that the Jetson AGX Xavier Industrial module combined with a rugged enclosure is a good candidate for high-performance computation in thermally constrained satellites deployed in both low-earth and geosynchronous orbits.
The Jetson modules are transforming many of these space applications, and NVIDIA Jetson AGX Orin Industrial extends these capabilities.
Introducing NVIDIA Jetson AGX Orin Industrial
The Jetson AGX Orin Industrial module delivers up to 248 TOPS of AI performance, with power configurable between 15 W and 75 W. It is form-factor and pin-compatible with Jetson AGX Orin and gives you more than 8x the performance of Jetson AGX Xavier Industrial.
This compact system-on-module (SOM) supports multiple concurrent AI application pipelines with an NVIDIA Ampere architecture GPU, next-generation deep learning and vision accelerators, high-speed I/O, and fast memory bandwidth. It comes with an extended temperature range, operating lifetime, and shock and vibration specifications, as well as support for error correction code (ECC) memory.
Industrial applications under extreme heat or extreme cold require extended temperature support, along with underfill and corner bonding to protect the module in these harsh environments. These applications also require inline DRAM ECC for data integrity and system reliability: industrial environments involve critical operations and sensitive data processing, and ECC helps ensure data integrity by detecting and correcting memory errors in real time.
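To make the ECC principle concrete, here is a toy single-error-correcting Hamming(7,4) code in Python. This is only an illustration of how parity bits locate and repair a flipped bit; it is not the scheme the DRAM controller actually implements.

```python
# Toy Hamming(7,4) code: 4 data bits protected by 3 parity bits.
# Any single flipped bit in the 7-bit codeword can be located and corrected.

def hamming74_encode(d):
    """d: list of 4 bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_decode(c):
    """Recompute parity checks; the syndrome is the 1-based position
    of the flipped bit (0 means the codeword is clean)."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:                      # correct the single-bit error in place
        c = c.copy()
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]  # extract the data bits

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
codeword[4] ^= 1                      # simulate a single-bit memory error
print(hamming74_decode(codeword))    # original data is recovered
```

Real inline DRAM ECC uses wider SECDED codes over whole memory words, but the detect-locate-correct mechanism is the same idea.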
The following table lists the key new industrial features of the Jetson AGX Orin Industrial SOM compared with the Jetson AGX Orin 64GB module. For more information about the NVIDIA Jetson Orin architecture, see the Jetson AGX Orin Technical Brief and the Jetson Embedded Download Center.
| Feature | Jetson AGX Orin 64GB | Jetson AGX Orin Industrial |
| --- | --- | --- |
| AI performance | 275 TOPS | 248 TOPS |
| Module | 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores; 12-core Arm Cortex-A78AE CPU; 64 GB LPDDR5; 64 GB eMMC | 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores; 12-core Arm Cortex-A78AE CPU; 64 GB LPDDR5 with inline ECC; 64 GB eMMC |
| Operating temperature | -25°C to 80°C at TTP | -40°C to 85°C at TTP |
| Module power | 15–60 W | 15–75 W |
| Operating lifetime | 5 years | 10 years; 87K hours @ 85°C |
| Shock | Non-operational: 140G, 2 ms | Non-operational: 140G, 2 ms; operational: 50G, 11 ms |
| Vibration | Non-operational: 3G | Non-operational: 3G; operational: 5G |
| Humidity | Biased, 85°C, 85% RH, 168 hours | 85°C, 85% RH, 1,000 hours, power on |
| Temperature endurance | -20°C, 24 hours; 45°C, 168 hours (operational) | -40°C, 72 hours; 85°C, 1,000 hours (operational) |
| Mechanical | 100 mm x 87 mm | 100 mm x 87 mm |
| Underfill | – | SoC corner bonding and component underfill |
| Production lifecycle | 7 years (until 2030) | 10 years (until 2033) |
Table 1. Key features of the new Jetson AGX Orin Industrial SOM
Robust software and ecosystem support
Jetson AGX Orin Industrial is powered by the NVIDIA AI software stack, with tools and SDKs to accelerate each step of the development journey for time to market. It combines platforms like NVIDIA Isaac for robotics and NVIDIA Isaac Replicator for synthetic data generation powered by NVIDIA Omniverse; frameworks like NVIDIA Metropolis with DeepStream for intelligent video analytics and NVIDIA TAO Toolkit; and a large collection of pretrained and production-ready models. These all accelerate the process of model development and help you create fully hardware-accelerated AI applications.
SDKs like NVIDIA JetPack provide all the accelerated libraries in a powerful yet easy-to-use development environment to get started quickly with any Jetson module. NVIDIA JetPack also provides security features on the Jetson AGX Orin module to enable edge-to-cloud security and protect your deployments:
Hardware root of trust
Firmware TPM
Secure boot and measured boot
Hardware-accelerated cryptography
Trusted execution environment
Support for encrypted storage and memory
And more
Partners in the vibrant Jetson ecosystem, part of the NVIDIA Partner Network, are integrating the industrial module into hardware and software solutions for industrial applications:
Partner cameras enable computer vision tasks such as quality inspection.
Connectivity partners facilitate data transfer with interfaces such as Ethernet.
Figure 1. Jetson ecosystem partners collaborate with NVIDIA
The new Jetson AGX Orin Industrial module will be available in July, and you can reach out to a distributor in your region to place an order now. Because Jetson AGX Orin Industrial and Jetson AGX Orin 64GB are pin- and software-compatible, you can start building solutions today with the Jetson AGX Orin Developer Kit and the latest JetPack.
Large Language Models (LLMs) and AI applications such as ChatGPT and DALL-E have recently seen rapid growth. Thanks to GPUs, CPUs, DPUs, high-speed storage, and AI-optimized software innovations, AI is now widely accessible. You can even deploy AI in the cloud or on-premises.
Yet AI applications can be very taxing on the network, and this growth is burdening CPU and GPU servers, as well as the existing underlying network infrastructure that connects these systems together.
Traditional Ethernet, while sufficient for handling mainstream and enterprise applications such as web and video or audio streaming, is not optimized to support the new generation of AI workloads. It was designed for loosely coupled applications and low-bandwidth flows, and it tolerates high jitter. It might be sufficient for heterogeneous traffic (such as web, video, or audio streaming; file transfers; and gaming) but is not ideal when oversubscription occurs.
Designed from the ground up to meet the performance demands for AI applications, NVIDIA Spectrum-X networking platform is an end-to-end solution that is optimized for high-speed network performance, low latency, and scale.
NVIDIA Spectrum-X
NVIDIA Spectrum-X networking platform was developed to address traditional Ethernet network limitations. It is a network fabric designed to answer the needs of demanding AI applications, intended for tightly coupled processes.
This NVIDIA-certified and tested end-to-end solution combines the best-in-class, AI-optimized networking hardware and software to provide a predictable, consistent, and uncompromising level of performance required by AI workloads.
Figure 1. NVIDIA Spectrum-X networking platform combines the NVIDIA Spectrum-4 Ethernet switch with NVIDIA BlueField-3 DPU to provide optimal performance for AI workloads
NVIDIA Spectrum-X is a highly versatile technology that can be used with various AI applications, and it can significantly enhance the performance and efficiency of AI clusters.
NVIDIA Spectrum-4 Ethernet switch provides unprecedented application performance for AI clusters built on standards-based Ethernet. Realizing the full potential of NVIDIA Spectrum-4 requires an end-to-end, purpose-built network architecture. Only the NVIDIA Spectrum-X platform provides the hardware accelerators and offloads needed to power hyperscale AI.
NVIDIA Spectrum-4 Ethernet switches are built on the 51.2-Tbps Spectrum-4 ASIC, with 4x the bandwidth of the previous generation. Spectrum-4 is the world's first Ethernet AI switching platform, designed for AI workloads and combining a specialized high-performance architecture with standard Ethernet connectivity.
NVIDIA Spectrum-4 offers:
RoCE extensions: RoCE with unique enhancements
RoCE Adaptive Routing
RoCE Performance Isolation
Simplified, Automated Adaptive Routing and RoCE Configurations
Synchronized Collectives
Other RoCE for HPC enhancements
Highest effective bandwidth on Ethernet at scale
Low latency with low jitter and short tail
Deterministic performance and performance isolation
Full stack and end-to-end optimization
NVIDIA Cumulus Linux or SONiC
Figure 2. NVIDIA Spectrum-4 combines specialized high-performance architecture with standard Ethernet connectivity
Key benefits of NVIDIA Spectrum-X with NVIDIA Spectrum-4 include the following:
Using RoCE extensions for AI and adaptive routing (AR) to achieve maximum NVIDIA Collective Communication Library (NCCL) performance
Leveraging performance isolation so that, in a multi-tenant and multi-job environment, one job does not impact another
Ensuring that if a network component fails, the fabric continues to deliver the highest performance
Synchronizing with the BlueField-3 DPU to achieve optimal NCCL and AI performance
Maintaining consistent and steady performance under various AI workloads, which is vital for achieving SLAs
End-to-end optimal network performance
Building an effective AI compute fabric requires optimizing every part of the AI network, from DPUs to switches to networking software. Achieving the highest effective bandwidth at load and at scale demands techniques such as RoCE adaptive routing and advanced congestion control mechanisms. Incorporating capabilities that work synchronously on NVIDIA BlueField-3 DPUs and Spectrum-4 switches is crucial to achieve the highest performance and reliability from the AI fabric.
RoCE adaptive routing
AI workloads and applications are characterized by a small number of elephant flows responsible for the large data movement between GPUs, where the tail latency highly impacts the overall application performance. Catering to such traffic patterns with traditional network routing mechanisms can lead to inconsistent and underutilized GPU performance for AI workloads.
RoCE adaptive routing is a fine-grained load balancing technology. It dynamically reroutes RDMA data to avoid congestion and provide optimal load balancing to achieve the highest effective data bandwidth.
It is an end-to-end capability that includes Spectrum-4 switches and BlueField-3 DPUs. The Spectrum-4 switches are responsible for selecting the least-congested port for data transmission on a per-packet basis. As different packets of the same flow travel through different paths of the network, they may arrive at their destination out of order. The BlueField-3 DPU reorders any out-of-order data at the RoCE transport layer, transparently delivering in-order data to the application.
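The reordering step can be illustrated with a short sketch. This is a toy model of the idea only, not the BlueField-3 implementation: packets carry sequence numbers, arrive in any order after taking different paths, and are released to the application strictly in order.

```python
# Toy sketch of in-order delivery from out-of-order arrivals
# (illustrative only; not the BlueField-3 RoCE transport logic).
def reassemble(packets):
    """packets: iterable of (sequence_number, payload), possibly out of order."""
    buffer = {}
    next_seq, in_order = 0, []
    for seq, payload in packets:
        buffer[seq] = payload
        # Release every contiguous packet starting at next_seq.
        while next_seq in buffer:
            in_order.append(buffer.pop(next_seq))
            next_seq += 1
    return in_order
```

Packets 2, 0, 1, 3 arriving in that order are handed to the application as 0, 1, 2, 3, which is what makes per-packet path spreading transparent to RDMA consumers.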
Spectrum-4 evaluates congestion based on egress queue loads, ensuring all ports are well-balanced. For every network packet, the switch selects the port with the minimal load over its egress queue. Spectrum-4 also receives status notifications from neighboring switches, which influence the routing decision. The queues evaluated are matched with the quality-of-service level.
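The per-packet selection rule described above can be sketched in a few lines. This is a hedged toy model, not Spectrum-4 firmware; the port names and queue loads are invented, and real switches also factor in neighbor notifications and quality-of-service classes.

```python
# Toy per-packet adaptive routing: each packet goes out the egress
# port whose queue is currently least loaded (illustrative only).
def pick_port(queue_loads: dict[str, int]) -> str:
    """Return the port with the minimal egress queue load."""
    return min(queue_loads, key=queue_loads.get)

def route_packets(n_packets: int, queue_loads: dict[str, int]) -> list[str]:
    """Route n packets one at a time, updating queue loads as we go."""
    path = []
    for _ in range(n_packets):
        port = pick_port(queue_loads)
        queue_loads[port] += 1   # packet enqueued on the chosen port
        path.append(port)
    return path
```

Because the decision is made per packet rather than per flow, a single elephant flow is spread across all well-balanced ports instead of pinning one path.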
As a result, NVIDIA Spectrum-X enables up to 95% effective bandwidth across the hyperscale system at load, and at scale.
Figure 3. NVIDIA Spectrum-4 typical data center deployment structure
RoCE congestion control
Applications running concurrently on hyperscale cloud systems may suffer from degraded performance and unpredictable run times due to network-level congestion. This can be caused by the network traffic of the application itself, or background network traffic from other applications. The primary form of this congestion is known as many-to-one congestion, where there are multiple data senders and a single data receiver.
Such congestion cannot be solved using adaptive routing and actually requires data-flow metering per endpoint. Congestion control is an end-to-end technology, where Spectrum-4 switches provide network telemetry information representing real time congestion data. This telemetry information is processed by the BlueField DPUs, which manage and control the data sender’s data injection rate, resulting in maximum efficiency of network sharing.
Without congestion control, many-to-one scenarios will cause network back-pressure and congestion spreading or even packet-drop, which dramatically degrade network and application performance.
In the congestion control process, BlueField-3 DPUs execute the congestion control algorithm. They handle millions of congestion control events per second with microsecond reaction latency and apply fine-grained rate decisions.
Spectrum-4 switch in-band telemetry provides both queuing information for accurate congestion estimation and port-utilization indication for fast recovery. NVIDIA RoCE congestion control significantly improves congestion discovery and reaction time by enabling the telemetry data to bypass the congested flow queueing delay while still providing accurate and concurrent telemetry.
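The feedback loop can be sketched as follows. The actual NVIDIA congestion control algorithm is not public; the threshold, gains, and line rate below are invented for illustration. The shape is the classic one: back off multiplicatively when telemetry reports a deep queue, recover additively once it drains.

```python
# Toy DPU-side rate controller driven by switch telemetry
# (illustrative only; constants and policy are invented).
def adjust_rate(rate_gbps: float, queue_depth: int, threshold: int = 100,
                decrease: float = 0.5, increase: float = 5.0,
                line_rate: float = 400.0) -> float:
    """Return the sender's new injection rate given a telemetry sample."""
    if queue_depth > threshold:
        # Congestion signal: multiplicative decrease, never below 1 Gbps.
        return max(rate_gbps * decrease, 1.0)
    # Queue drained: additive recovery, capped at line rate.
    return min(rate_gbps + increase, line_rate)
```

Metering each sender this way is what resolves many-to-one congestion, which per-packet adaptive routing alone cannot fix.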
RoCE performance isolation
AI hyperscale and cloud infrastructures need to support a growing number of users (tenants) and parallel applications or workflows. These users and applications inadvertently compete for the infrastructure's shared resources, such as the network, and therefore may impact performance.
The NVIDIA Spectrum-X platform includes mechanisms that, combined, deliver performance isolation: one workload cannot create network congestion that impacts the data movement or performance of another. These mechanisms include quality-of-service isolation, RoCE adaptive routing for data-path spreading, and RoCE congestion control.
The NVIDIA Spectrum-X platform features tight integration of software and hardware, enabling deeper understanding of AI workloads and traffic patterns. Such an infrastructure provides the capabilities to test with large workloads using a dedicated Ethernet AI cluster. By leveraging telemetry from Spectrum Ethernet switches and BlueField-3 DPUs, NVIDIA NetQ can detect network issues proactively and troubleshoot network issues faster for optimal use of network capacity.
The NVIDIA NetQ network validation and ASIC monitoring tool set provides visibility into network health and behavior. The NetQ flow telemetry analysis shows the paths that data flows take as they traverse the network, providing network latency and performance insights.
Increased energy efficiency
Power capping has become a common practice in data centers due to the growing demand for computing resources and the need to control energy costs. The Spectrum-4 ASIC and optical innovations enable simplified network designs that improve performance per watt, achieving better efficiency and delivering faster AI insights, without exceeding network power budgets.
Summary
NVIDIA Spectrum-X networking platform is designed especially for demanding AI applications. With higher performance compared to traditional Ethernet, lower power consumption, lower TCO, full stack software-hardware integration, and massive scale, NVIDIA Spectrum-X is the ideal platform for running existing and future AI workloads.
Learn more
Looking for more information? Check out these resources:
The need for a high-fidelity multi-robot simulation environment is growing rapidly as more and more autonomous robots are being deployed in real-world scenarios. In this post, I will review what we used in the past at Cogniteam for simulating multiple robots, our current progress with NVIDIA Isaac Sim, and how Nimbus can speed up the development and maintenance of a multi-robot simulation with Isaac Sim.
Multi-robot simulation with Unreal Tournament game engine
About 20 years ago, my friends at Cogniteam and I started our robotic development careers with the idea of a robotic framework for multi-robot task allocation and teamwork. Originally called CogniTAO, a simplified version of this system was later published as ROS decision_making.
At the time, use cases for multiple robots were scarce, and 3D simulation for those robots was not possible. So I wrote a mod for the Unreal Tournament 2000-2004 game engine to enable simulation for four robots. It took our small team of four programmers about 3 years to develop a simulated environment that could reliably run for 15 minutes.
Figure 1. Simulation of four robots (left) and video from the robots (right)
This environment was able to simulate four robots with a camera, Hokuyo LiDAR, odometry, and mapping on five state-of-the-art desktops, and remotely receive video feeds from each. One of our engineers wrote a C++ TCP client that would stream the data on the local network directly from the game engine and display it in fullscreen. We had to run the code in strict order to make the robots spawn on time and in the correct place.
Multi-robot simulation with Gazebo
Fast forward 10 years to 2013 when we transitioned our work to Gazebo after it became the de facto platform for robotic simulation. It took three programmers about 2 years to simulate 10 robots on two Intel Xeon machines. They used the ROS move_base navigation stack and object detection using OpenCV Hough Circle Transform—what robotics teams used for demos before TensorFlow. Igor Makhtes, our colleague at the time, built the RQT plugin to control and show data streams from multiple robots (Figure 2). It took him 6 months to complete.
Figure 2. Video feed and map view for 10 robots with RQT plugin
These robots had to communicate with each other, but also needed to operate when a connection was unavailable. To make this possible, each had to run its own ROS master and sync through a ROS multimaster network.
Multi-robot simulation with NVIDIA Isaac Sim
A few months ago, I asked Saar Moseri, a computer science student on our algorithmic team at Cogniteam, to set up a multi-robot simulation scenario using the cloud robotics ecosystem Nimbus and NVIDIA Isaac Sim. Our internal test team and I hoped to use the Nimbus agent to control our robots and view the data they generate.
It took Saar about 2 weeks to familiarize himself with the environment and configure the system. Figure 3 shows the result of this effort, running on a standard (single) desktop machine with an NVIDIA GeForce RTX 3080 in the Cogniteam lab.
Figure 3. NVIDIA Isaac Sim multi-robot default setup
Saar used the Isaac Sim documentation available through NVIDIA NGC to install and set up the environment. Using Nimbus, he installed an agent on the simulation machine and created a gateway node to receive data from the simulation through ROS.
We then created the node configuration shown in Figure 5.
Figure 5. Nimbus simple mission configuration with move_base navigation
The two building blocks (already containerized) are a gateway node and a node for move_base navigation. This configuration was deployed to the agent running on the simulation desktop in the Cogniteam lab. Other more complex configurations are also available (with sources) in the Nimbus hub, including nodes for GMapping, path following, and more.
My team and I were stunned by the endless possibilities this approach enables. In the configuration described above, simulated sensory data arrives from Isaac Sim through the ROS gateway, which supports both ROS and ROS 2. View and control capabilities are enabled by Nimbus.
Out of the box, this setup enables our team to carry out basic simulation tasks and simulate the control of a robot fleet locally in our lab, along with many more capabilities. We can now record simulated runs and sensory data from robots, remotely SSH into a simulation machine, monitor simulation data globally, and even send email and SMS notifications about simulation progress to our validation team—all from a web browser.
Combining Isaac Sim with Nimbus results in a unified system that is similar in features to available cloud simulation offerings, but runs on a local machine and does not involve additional cloud simulation compute costs. Additionally, it opens new cutting-edge simulation flows, such as simulation with hardware in the loop. This is not possible when the simulation runs in the cloud. Figure 6 shows how the control, navigation, and mapping look in Nimbus.
Figure 6. Nimbus robot WebRTC video monitoring (left) and Nimbus map view and autonomy control (right)
To replicate the setup described, reference the Isaac Sim documentation. Then visit Nimbus to create a free account, log in, and follow the instructions to create a robot using a free license.
After the robot agent is installed on the same desktop where Isaac Sim is running headless, you can provision the simulation through remote SSH and monitor the simulation machine from the Nimbus website.
Video 1. Nimbus and NVIDIA Isaac Sim demo video
Visit the Nimbus hub to deploy the Isaac Sim configuration. Since everything is already containerized (including Isaac Sim) and control is browser-based, you do not need to install any applications. The agent on the machine will set up everything needed to execute.
Then, on the monitor page of that agent, add monitoring for any data that is relevant to your setup. In the agent settings, you can define notifications by adding conditions on ROS streams such as:
“if GoalStatus == ABORTED”
send sms/mail to simulation@your-company.com
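As a rough sketch of that rule's logic: the real condition is configured in the Nimbus agent settings in the browser, so the function and message format below are invented for illustration. Only the status constant is taken from ROS, where `actionlib_msgs/GoalStatus` defines `ABORTED = 4`.

```python
# Hypothetical mirror of the Nimbus notification rule quoted above
# (illustrative only; Nimbus configures this in its web UI).
ABORTED = 4  # actionlib_msgs/GoalStatus.ABORTED in ROS

def check_goal_status(status: int,
                      recipient: str = "simulation@your-company.com"):
    """Return the notification to send, or None if the goal is healthy."""
    if status == ABORTED:
        return f"sms/mail to {recipient}: navigation goal ABORTED"
    return None
```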
Cogniteam is happy to help you in the process. You can reach us at support@cogniteam.com.
Summary
For the successful deployment of autonomous robots, simulation is key. Running the same scenario multiple times is crucial for testing, but multi-robot simulations are far harder to build and maintain than single-robot ones. Developing a high-fidelity multi-robot simulated environment is complex and takes time, but it can be simplified with NVIDIA Isaac Sim and Nimbus, as described in this post.
My team and I will be attending ICRA 2023 in London, May 29 to June 2 (Booth C22), showcasing our browser interface to robots and simulations running remotely in Israel.
To learn more about Isaac Sim, check out the NVIDIA Developer Isaac ROS Forum.
The telecom sector is transforming how communication happens. Striving to provide reliable, uninterrupted service, businesses are tackling the challenge of delivering an optimal customer experience.
This optimal customer experience is something many long-time customers of large telecom service providers do not have. Take Jack, for example. His call was on hold for 10 minutes, which made him late for work. Jill, the third agent he spoke with, read the brief note provided by the previous agent but had trouble understanding it. So, she asked Jack a few questions to clarify. With no co-workers available, Jill consulted multiple policy documents to address Jack’s concerns. Several resources later, Jill located the necessary information, but sadly, Jack had already ended the call.
Long wait times, complex service requests, and a lack of personalization are some of the common issues faced by customers, leading to dissatisfaction and churn. To overcome these challenges, the telecom sector is turning to AI—specifically conversational AI, a technology that leverages speech, translation, and natural language processing (NLP) to facilitate human-like interactions.
This post explores why conversational AI systems are essential and why it is important to have a high level of transcription accuracy for optimal performance in downstream tasks. We explain the NVIDIA Riva speech recognition customization techniques Quantiphi has used to improve transcription accuracy.
In telco contact centers, highly accurate conversational AI systems are essential for several reasons. Conversational AI systems can help agents extract valuable information from call interactions and make informed decisions, leading to improved service quality and customer experience.
One key component in a conversational AI system is automatic speech recognition (ASR), also known as speech recognition or speech-to-text. Downstream tasks in telco contact centers heavily rely on accurate transcription provided by ASR systems. These tasks encompass a wide range of applications such as:
Customer insights
Sentiment analysis
Call classification
Call transcription
Quick and accurate responses are vital for efficient and effective customer service. That means reducing the overall latency of individual components, including ASR, is very important. By reducing the time required to complete a task, contact center agents can provide prompt solutions, leading to enhanced customer satisfaction and loyalty.
Moreover, accurate transcription that includes punctuation enhances readability. Clear and well-punctuated transcriptions help agents better understand customer queries, facilitating clear communication and problem solving. This, in turn, improves the overall efficiency and effectiveness of customer interactions.
NVIDIA Riva automatic speech recognition pipeline
Speech-to-text receives an audio stream as input, transcribes it, and produces the transcribed text as output (Figure 1). First, the audio stream goes to an audio feature extractor and preprocessor, which filter out noise and capture audio spectral features in a spectrogram or mel spectrogram. Then, an acoustic model, together with a language model, transcribes the speech into text. Punctuation is added to the transcribed text to improve readability.
Figure 1. Diagram of the end-to-end automatic speech recognition pipeline
Accuracy is fundamental, as it directly affects the quality and reliability of the transcriptions. By measuring accuracy through metrics like word error rate (WER), the system can be evaluated in terms of how well it transcribes spoken words. A low WER is vital in contact centers, as it ensures that customer queries and interactions are precisely captured, enabling agents to provide accurate and appropriate responses.
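WER is the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the hypothesis, divided by the number of reference words. A minimal, self-contained implementation using word-level edit distance:

```python
# Word error rate: WER = (S + D + I) / N, computed via Levenshtein
# distance over words rather than characters.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, against the reference "what is 5G", the hypothesis "what is five g" needs one substitution and one insertion over three reference words, a WER of about 0.67, which is why transcript-level errors like these dominate the metric.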
Latency is the time taken to generate a transcript for a segment of audio. To maintain an engaging experience, a transcription system must deliver captions with minimal delay, no more than a few hundred milliseconds. Low latency ensures a seamless and engaging customer experience, enhancing overall efficiency and customer satisfaction.
Cost to develop and run a transcription service on sufficient compute infrastructure is another important measure. Although AI-based transcription is inexpensive compared to human interpreters, cost must be weighed along with other factors.
In a contact center setting, a transcription system must excel in accuracy to provide reliable transcriptions, offer low latency for prompt customer interactions, and consider cost factors to ensure a cost-effective and feasible solution for the organization. By optimizing all three metrics, the transcription system can effectively support contact center operations and enhance delivery of customer service.
Methods to improve ASR accuracy
As shown in Figure 2, there are several techniques that can be used to achieve the best possible transcription accuracy for a specific domain, the easiest of which is word boosting. ASR word boosting involves passing to the model a list of important, possibly out-of-vocabulary, domain-specific words as additional input. This enables the ASR module to recognize such words during inference.
Figure 2. Customization across the ASR pipeline
In most cases, certain nouns (such as the names of companies or services) are either not in the vocabulary, or are frequently mistranscribed by the ASR model. These nouns were added to the list of words to be boosted. This strategy enabled us to easily improve recognition of specific words at request time.
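The effect of boosting can be illustrated with a toy rescoring function. This is a hedged sketch, not the Riva implementation (Riva applies boosting inside decoding at request time); the function, hypotheses, and scores below are invented to show why a bonus for domain terms lets them win against acoustically similar alternatives.

```python
# Toy rescoring: each candidate transcript gets a bonus for every
# boosted word it contains (illustrative only; not the Riva decoder).
def rescore(hypotheses, boosted, boost_score=2.0):
    """hypotheses: list of (text, acoustic_score). Returns the best text."""
    def score(item):
        text, base = item
        bonus = sum(boost_score for w in boosted if w in text.split())
        return base + bonus
    return max(hypotheses, key=score)[0]

best = rescore(
    [("multi axis edge computing also known as MEG", 0.0),
     ("multi access edge computing also known as MEC", -1.0)],
    boosted=["MEC", "mMTC"])
```

Even though the correct hypothesis scored worse acoustically, the boost for "MEC" flips the ranking in its favor.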
In addition, the Quantiphi team retrained the language model on a custom dataset to adapt the ASR engine to domain-specific terms and phrases.
Customized speech-assisted conversational AI systems
One of the most significant challenges faced by customer contact centers in the telecom industry is the long time it takes to resolve complex queries. Agents typically need to consult with multiple stakeholders and internal policy documents to respond to complex queries.
Conversational AI systems provide relevant documentation, insights, and recommendations, thereby enabling contact center agents to expedite the resolution of customer queries.
The Quantiphi solution architecture for customized speech-assisted conversational AI pipeline involves the following:
Speech recognition pipeline: Creates transcriptions by capturing spoken language and converting it into text
Intent slot model: Identifies user intent
Semantic search pipeline: Retrieves answers for the agent query through the dialog manager
Quantiphi built a semantic search engine and a question-answering solution (Figure 3). It retrieves the most relevant documents for a given query and generates a concise answer for telco contact center agents.
Figure 3. Quantiphi question-answering solution with semantic search engine
ASR, in conjunction with question-answering (QnA) systems, is also used in virtual agents and avatar-based chatbots. The accuracy of ASR transcripts has a significant impact on the accuracy of agent assist, virtual agents, and avatar-based chatbots, since the transcripts are input to responses generated by a retrieval-augmented generation (RAG) pipeline. Even a slight discrepancy in the way the query is transcribed can cause the generative model to provide incorrect responses.
The Quantiphi team tried off-the-shelf ASR models, which sometimes failed to correctly transcribe proper nouns. The quality of the ASR transcription is of paramount importance when it is used in conjunction with question-answering pipelines, as shown in the following example:
Query: What is 5G?
ASR transcript: What is five g.
Generator response: Five grand is the amount of money you can earn if you work in a factory for a month.
Correct response: 5G is the next generation of wireless technology. It will be faster, more reliable, and more secure than 4G LTE.
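This failure mode can be made concrete with a toy retriever. This is not the Quantiphi pipeline; the two documents and the keyword-overlap scoring are invented to show how a mistranscribed query can pull in the wrong document before generation even starts.

```python
# Toy keyword-overlap retriever (illustrative only; real RAG pipelines
# use embeddings). The invented corpus has a 5G doc and a billing doc.
DOCS = {
    "5g": "5G is the next generation of wireless technology.",
    "billing": "Your five dollar credit is applied to the monthly bill.",
}

def retrieve(query: str) -> str:
    """Return the document sharing the most words with the query."""
    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return max(DOCS.values(), key=overlap)
```

The clean query "What is 5G" retrieves the 5G document, while the mistranscribed "what is five g" matches the billing document instead, so the generator answers about money rather than wireless technology.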
Words (or acronyms) such as mMTC and MEC were often transcribed incorrectly. We have addressed this with the help of word boosting. Consider the following example:
Before word boosting
Multi axis edge computing, also known as MEG is a type of network architecture that provides cloud computing capabilities and an It service environment at the edge of the network.
MtcFis a service area that offers low bandwidth connectivity with deep coverage.
After word boosting
Multi access edge computing also known as MEC is a type of network architecture that provides cloud computing capabilities and an IT service environment at the edge of the network.
mMTC is a service area that offers low bandwidth connectivity with deep coverage.
The before and after show how responses change, even if there is a slight difference in the way an n-gram is represented. Through inverse text normalization, the ASR model transcribes words such as ‘five g’ as ‘5G’, thus improving the QnA pipeline’s performance in the process.
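A minimal sketch of what ITN does follows. Production systems typically implement this with weighted finite-state grammars rather than string lookup, and the rule table here is invented from the examples in this post.

```python
# Toy inverse text normalization: map spoken forms back to written
# forms (illustrative only; rules are invented for this sketch).
ITN_RULES = {
    "five g": "5G",
    "four g": "4G",
}

def inverse_normalize(text: str) -> str:
    """Apply each spoken-to-written rule to the transcript."""
    for spoken, written in ITN_RULES.items():
        text = text.replace(spoken, written)
    return text
```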
Adding customized vocabulary to ASR
Most use cases typically have certain domain-specific words and jargon associated with them. To include these words in the ASR output, we added them to the vocabulary file and rebuilt the ASR model. For more details, see the tutorial How to Customize Riva ASR Vocabulary and Pronunciation with Lexicon Mapping.
Training n-gram language models
The contexts present in QnA tasks typically form a good source of text corpus to train an n-gram language model. A customized language model results in ASR outputs that are more receptive to sequences of words that commonly appear in the domain. We used an NVIDIA NeMo script to train a KenLM model and integrated it with the ASR model at build time.
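The intuition can be shown with a toy bigram model. The actual solution trained a KenLM model via a NeMo script; this sketch only illustrates why domain text raises the probability of domain word sequences, which is what makes the decoder more receptive to them.

```python
# Toy bigram language model trained on a tiny domain corpus
# (illustrative only; not KenLM).
from collections import Counter

def train_bigram(corpus: list[str]):
    """Return P(w2 | w1) estimated from bigram counts in the corpus."""
    bigrams, unigrams = Counter(), Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        unigrams.update(words[:-1])            # context-word counts
        bigrams.update(zip(words, words[1:]))  # adjacent-pair counts
    def prob(w1: str, w2: str) -> float:
        return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0
    return prob
```

After training on a few domain sentences, "computing" becomes very likely after "edge", so the decoder prefers that sequence over acoustically similar alternatives that never appear in the corpus.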
Fine-tuning acoustic models
To further improve ASR performance, we fine-tuned an ASR acoustic model with 10-100 hours of small chunks (5-15 seconds) of audio data, with their corresponding ground-truth text. This helped the acoustic model to pick up regional accents. We used the Riva Jupyter notebook and NeMo for this fine-tuning. We further converted this checkpoint to Riva format using the nemo2riva tool and built it using the riva-build command.
Key takeaways
Question-answering and insights extraction make up conversational solutions that empower telecom customer service agents to provide personalized and efficient support. This improves customer satisfaction and reduces agent churn. To achieve highly accurate QnA and insights extraction solutions, it is necessary to provide high-accuracy transcriptions as an input to the rest of the pipeline.
Quantiphi achieved the highest possible accuracy by customizing speech recognition models with NVIDIA Riva: ASR word boosting, inverse text normalization, custom vocabulary, language model training, and acoustic model fine-tuning. This was not possible with off-the-shelf solutions.
What does that mean for Jack and Jill? Equipped with telco-customized speech-assisted conversational AI applications, Jill can quickly scan through the AI-generated summary of Jack’s previous conversations. Just as Jack finishes asking a question, her screen is already populated with the most relevant document to resolve Jack’s query. She swiftly conveys the information to Jack. He decides to answer the survey with positive feedback and still arrives at work on time.
According to Gartner®, “Nearly half of digital workers struggle to find the data they need to do their jobs, and close to one-third have made a wrong business decision due to lack of information awareness.”1 To address this challenge, more and more enterprises are deploying AI in customer service, as it helps to provide more efficient and information-based personalized services.
Technologies such as speech-to-text, text-to-speech, translation, deep learning, transformer models, and generative AI have changed how businesses interact with customers. These technologies enable:
Real-time analysis of customer feedback
Automation of customer interactions
Accurate and personalized AI-based recommendations that assist human agents in handling customer inquiries
AI algorithms can process and analyze vast amounts of data, identify customer needs and behavior patterns, and empower the creation of engaging and satisfying customer experiences. Overall, the use of AI in customer service has significantly improved the quality and efficiency of customer interactions, benefiting both businesses and customers.
In the global economy, businesses operate across countries and serve customers with diverse linguistic and cultural backgrounds. This global language diversity presents a unique challenge for contact centers.
Effective communication is critical to providing excellent customer service, and language barriers can lead to miscommunication, misunderstandings, and frustration. This can result in dissatisfied customers and missed business opportunities.
Traditional approaches to multilingual support, such as hiring native speakers, training agents in different languages, and providing language-specific scripts are not scalable, cost effective, or efficient.
However, advances in speech AI and translation AI technology are helping contact centers overcome language barriers through language neutralization. This innovation has been crucial for contact centers catering to diverse customers.
What is language neutralization?
In the context of contact centers, language neutralization refers to the process of using transcription, translation, and speech synthesis (TTS) technologies to convert communication from a customer's natural language into a language that an agent can understand. The agent then responds in their own language, and the response is converted back through transcription, translation, and speech synthesis, or a combination of these, depending on the scenario.
Language neutralization enables effective communication between parties who may not speak the same language, removing language barriers and facilitating smooth interaction. This technique involves advanced AI technologies to equip contact center agents with tools to help them understand customer queries and respond effectively.
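The round trip described above can be sketched as a small pipeline. The three stage functions below are hypothetical stand-ins for real speech AI services (such as Riva ASR, NMT, and TTS endpoints); they mock the behavior so the data flow is clear, and are not the actual Riva client API.

```python
# Minimal sketch of the language-neutralization round trip.
# transcribe/translate/synthesize are hypothetical stand-ins for real
# ASR, NMT, and TTS services, mocked here to show the data flow only.

def transcribe(audio: bytes, lang: str) -> str:
    """ASR stand-in: convert customer audio to text in its source language."""
    return audio.decode("utf-8")  # mock: treat the audio as UTF-8 text

def translate(text: str, src: str, dst: str) -> str:
    """NMT stand-in: translate between languages via a tiny mock lexicon."""
    lexicon = {("es", "en"): {"hola": "hello"}, ("en", "es"): {"hello": "hola"}}
    return " ".join(lexicon[(src, dst)].get(w, w) for w in text.split())

def synthesize(text: str, lang: str) -> bytes:
    """TTS stand-in: convert agent text back to audio."""
    return text.encode("utf-8")

def neutralize(customer_audio: bytes, customer_lang: str, agent_lang: str) -> str:
    """Customer speech -> text the agent can read (ASR, then NMT)."""
    source_text = transcribe(customer_audio, customer_lang)
    return translate(source_text, customer_lang, agent_lang)

def respond(agent_text: str, agent_lang: str, customer_lang: str) -> bytes:
    """Agent reply -> audio in the customer's language (NMT, then TTS)."""
    translated = translate(agent_text, agent_lang, customer_lang)
    return synthesize(translated, customer_lang)

print(neutralize(b"hola", "es", "en"))        # -> hello
print(respond("hello", "en", "es").decode())  # -> hola
```

In a production deployment, each stand-in would be replaced by a streaming call to the corresponding speech service, but the customer-to-agent and agent-to-customer flows keep this same shape.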
Overcoming language barriers
Language neutralization is particularly important for contact centers that provide support services to customers from diverse linguistic and cultural backgrounds. Using language neutralization techniques, contact centers can effectively communicate with non-native speakers and provide them with the same level of service as native speakers.
Infosys Cortex language neutralization powered by NVIDIA Riva
Infosys Cortex, an AI-driven customer engagement platform, transforms contact center operations through purposeful communication and smart decision-making capabilities. With greater brain power and continuous coaching, Infosys Cortex helps employees make better and faster decisions on their journey from new hire to experienced agent.
Infosys Cortex leverages NVIDIA Riva, a cutting-edge speech and translation AI SDK, to power its language neutralization capabilities. The world-class accuracy of Riva automatic speech recognition (ASR), neural machine translation (NMT), and engaging speech synthesis enables accurate and natural communication. Running on NVIDIA GPUs for model fine-tuning and inference, Riva provides a high-performance solution for contact centers.
Cortex platform features
The microservices-based architecture of Infosys Cortex includes five key modules that offer the following features (Figure 1):
Cortex Core: Sense, analyze, and generate actionable insights from data, and build new customer contexts along the way.
Learn: Enable agent training with simulated learning features based on historical call pipelines, training-bank creation, learn-and-practice models, and follow-up actions.
Empower: Provide proactive assistance to customers and agents using intelligent nudges based on transaction details, compliance, and real-time sentiment analysis to suggest the next best action.
Experience: Integrate with CTI/IVR to create contact flows for self-service, virtual assistance, and intelligent routing to enhance the customer experience.
Optimize: Generate insights through analyzing customer sentiment and interaction as well as agent behavior and performance.
Figure 1. Infosys Cortex, an AI-driven customer engagement platform, provides cloud-based open architecture, omnichannel integration, automation, and data-driven intelligence
Benefits and advantages
Riva services have been instrumental in addressing the key challenges Infosys has faced in relying on contact centers for customer service (Figure 2). The following are some key areas Riva addresses:
Accuracy: Customization for domain-specific language and product names, plus fine-tuning for different accents and pronunciations, enables a future-proof solution.
Language barrier: Support for 12 languages—Arabic, Chinese, English (US/UK), French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish (LATAM/Spain)—with consistent addition of support for new languages.
Data privacy: On-premises deployment enables mitigation of data privacy issues, helping to ensure that sensitive data is kept secure.
Cost reduction: High-performance, efficient Riva models, along with flexible licensing, enable creation of cost-effective solutions as volumes increase.
Control: The ability to further improve Riva models through phonetic boosting and transfer learning for specific domains.
Figure 2. Seamless language neutralization powered by NVIDIA Riva speech services transforms incoming audio into transcribed, translated, and agent-ready information
Overall, the advantages Riva models offer over managed services on the cloud include data privacy, predictable pricing, and better performance. In addition, the fine-tuning capabilities of Riva models enable further improvement of the model performance.
Language neutralization requires real-time integration with the CTI audio streams, and latency directly degrades the experience. The low latency of on-premises Riva models is therefore crucial, as every response passes through the transcription, translation, and synthesis flow at least once.
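The latency sensitivity can be made concrete with a simple per-turn budget. The stage latencies below are illustrative assumptions for the sketch, not measured Riva figures; the point is that each conversational turn stacks several stages, so per-stage latency compounds.

```python
# Illustrative per-turn latency budget for language neutralization.
# Stage latencies are assumed values for this sketch, not measured figures.
stage_latency_ms = {"asr": 120, "nmt": 40, "tts": 90}

# Customer -> agent needs ASR + NMT; the agent's reply needs NMT + TTS,
# so a full exchange traverses every stage at least once.
inbound = stage_latency_ms["asr"] + stage_latency_ms["nmt"]
outbound = stage_latency_ms["nmt"] + stage_latency_ms["tts"]
round_trip = inbound + outbound

print(f"inbound: {inbound} ms, outbound: {outbound} ms, "
      f"round trip: {round_trip} ms")
# Because stages are serialized, halving each stage's latency halves
# the total conversational delay added on top of the call itself.
```

Under these assumed numbers, the pipeline adds roughly 290 ms per exchange, which is why low-latency, on-premises inference matters for keeping the conversation natural.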
Key takeaways
Language neutralization is a transformative approach for contact centers, providing a scalable, cost-effective, and efficient solution for multilingual support.
The powerful language neutralization offered by Infosys Cortex and based on NVIDIA Riva speech and translation enables contact center agents to communicate effectively with customers and prevent misunderstandings and ambiguities.
Smoother customer-agent interaction leads to faster handling of issues and a reduction in wait time and backlog. Overall, the reduction in communication-based barriers results in contact centers reducing costs and increasing consistency, thus leading to greater customer satisfaction.
Developers can try Riva containers and pretrained models with a 90-day free trial through NGC. For production deployments, get unlimited usage on all clouds, enterprise-grade support, security, and API stability with the purchase of Riva, a premium edition of the NVIDIA AI Enterprise platform. Learn more.