Fast Track Data Center Workloads and AI Applications with NVIDIA DOCA 2.2

The NVIDIA DOCA SDK and acceleration framework gives developers extensive libraries, drivers, and APIs for creating high-performance applications and services for NVIDIA BlueField DPUs and ConnectX SmartNICs. It fuels data center innovation by enabling rapid application deployment.

With its comprehensive feature set, NVIDIA DOCA serves as a one-stop shop for BlueField developers looking to accelerate data center workloads and AI applications at scale.

With more than 10,000 developers already using it, NVIDIA DOCA is now generally available, giving a broader developer community access to the BlueField DPU platform for building innovative AI and cloud services.

New NVIDIA DOCA 2.2 features and enhancements

NVIDIA DOCA 2.2 introduces new features and enhancements for offloading, accelerating, and isolating network, storage, security, and management infrastructure within the data center.

Video 1. Watch an introduction to the NVIDIA DOCA software framework

Programmability

The NVIDIA BlueField-3 DPU, together with its onboard, purpose-built data path accelerator (DPA) and the DOCA SDK framework, is now available as a platform for developers to create high-performance, scalable network applications that demand high throughput and low latency.

Data path accelerator

NVIDIA DOCA 2.2 delivers several enhancements that leverage the BlueField-3 DPA programming subsystem. DOCA DPA, a new compute subsystem that is part of the DOCA SDK package, offers a programming model for offloading communication-centric user code to run on the DPA processor. DOCA DPA helps offload traffic processing from the CPU and increases performance through DPU acceleration.

Diagram of the internal infrastructure of a BlueField-3 DPU, highlighting the incoming and outgoing traffic moving between the GPU and CPU through the data path accelerator and accelerated programmable pipeline
Figure 1. NVIDIA BlueField-3 DPU incoming and outgoing traffic

DOCA DPA also offers significant development benefits, including greater flexibility when creating custom emulations and congestion controls. Customized congestion control is critical for AI workflows, enabling performance isolation, improving fairness, and preventing packet drop on lossy networks. 

The DOCA 2.2 release introduces the following SDKs: 

DOCA-FlexIO: A low-level SDK for DPA programming. Specifically, the DOCA FlexIO driver exposes the API for managing and running code on the DPA.

DOCA-PCC: An SDK for congestion-control development that enables cloud service provider (CSP) and enterprise customers to create their own congestion-control algorithms, improving network stability and efficiency through higher bandwidth and lower latency.

NVIDIA also supplies the necessary toolchains, examples, and collateral to expedite and support development efforts. Note that NVIDIA DOCA DPA is available in both DPU mode and NIC mode. 
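
To give a sense of the kind of logic a custom congestion-control algorithm contains, the following is a minimal, generic sketch of a rate-based additive-increase/multiplicative-decrease (AIMD) controller in Python. It is not the DOCA-PCC API and does not run on the DPA. The class name `AimdRateController`, its parameters, their default values, and the boolean congestion signal are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AimdRateController:
    """Generic AIMD sender-rate controller (illustrative only, not DOCA-PCC)."""

    line_rate_gbps: float = 100.0     # hypothetical link speed (upper bound)
    min_rate_gbps: float = 0.1        # floor so a flow never stalls completely
    increase_step_gbps: float = 1.0   # additive increase per congestion-free interval
    decrease_factor: float = 0.5      # multiplicative decrease on congestion
    rate_gbps: float = 100.0          # current sending rate, starts at line rate

    def on_interval(self, congestion_seen: bool) -> float:
        """Update the rate once per control interval and return the new rate."""
        if congestion_seen:
            # Back off quickly when the network signals congestion.
            self.rate_gbps = max(self.min_rate_gbps,
                                 self.rate_gbps * self.decrease_factor)
        else:
            # Probe gently for more bandwidth when the path is quiet.
            self.rate_gbps = min(self.line_rate_gbps,
                                 self.rate_gbps + self.increase_step_gbps)
        return self.rate_gbps


if __name__ == "__main__":
    controller = AimdRateController()
    # Hypothetical sequence of congestion signals, one per interval.
    for step, congested in enumerate([True, True, False, False, True, False]):
        rate = controller.on_interval(congested)
        print(f"interval {step}: congested={congested} rate={rate:.2f} Gb/s")
```

A custom algorithm would typically replace the fixed step and factor with policy tuned to the workload, for example to isolate flows from one another or to improve fairness between them.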

Two individual line graphs highlighting the 3x improvements to bandwidth and 2x improvements to latency when running an AI workload as a result of enabling DOCA-PCC
Figure 2. DOCA-PCC offers higher bandwidth and lower latency

Networking

NVIDIA DOCA and the BlueField-3 DPU together provide a comprehensive, open development platform for applications that deliver breakthrough networking performance. NVIDIA DOCA includes a range of drivers, libraries, tools, and example applications, and it continues to evolve. This release offers the following additional features to support the development of networking applications.

NVIDIA DOCA Flow

With NVIDIA DOCA Flow, you can define and control the flow of network traffic, implement network policies, and manage network resources programmatically. It offers network virtualization, telemetry, load balancing, security enforcement, and traffic monitoring. These capabilities are beneficial for processing workloads with high packet rates at low latency, conserving CPU resources, and reducing power usage.
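
As a rough mental model of programmatic traffic control, the sketch below implements a tiny software match-action table in Python: each rule matches on packet header fields and names an action, and the highest-priority matching rule wins. This is a conceptual illustration only. It is not the DOCA Flow API, and the names `Rule`, `FlowTable`, the field names, and the action strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    match: Dict[str, object]   # header fields that must all equal the packet's
    action: str                # hypothetical action label, e.g. "forward:port1"
    priority: int = 0          # higher value is evaluated first


class FlowTable:
    """Software match-action table (illustrative only, not DOCA Flow)."""

    def __init__(self, default_action: str = "drop") -> None:
        self._rules: List[Rule] = []
        self._default = default_action

    def add_rule(self, rule: Rule) -> None:
        self._rules.append(rule)
        # Keep rules ordered so lookup can stop at the first match.
        self._rules.sort(key=lambda r: r.priority, reverse=True)

    def lookup(self, packet: Dict[str, object]) -> str:
        for rule in self._rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return self._default


if __name__ == "__main__":
    table = FlowTable()
    table.add_rule(Rule({"dst_ip": "10.0.0.5"}, "forward:port1", priority=10))
    table.add_rule(Rule({"dst_ip": "10.0.0.5", "dst_port": 23}, "drop", priority=20))
    print(table.lookup({"dst_ip": "10.0.0.5", "dst_port": 443}))
    print(table.lookup({"dst_ip": "10.0.0.5", "dst_port": 23}))
```

In a hardware-offloaded design, rules like these are evaluated in the NIC or DPU pipeline rather than in a software loop, which is where the CPU and power savings come from.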

This release includes the following new features that offer immediate benefits to cloud deployments: 

Support for tunnel offloads – GENEVE and GRE: Offering enhanced security, visibility, scalability, flexibility, and extensibility, these tunnels are the building blocks for site communication, network isolation, and multi-tenancy. Specifically, GRE tunnels, which are used to connect separate networks and establish secure VPN communication, support overlay networks, offer protocol flexibility, and enable traffic engineering.

Support for per-flow meters with a bps/pps option: Essential in cloud environments to monitor and analyze traffic (measure bandwidth or packet rate), manage QoS (enforce limits), or enhance security (block denial-of-service attacks).

Enhanced mirror capability (FDB/switch domain): Used for monitoring, troubleshooting, security analysis, and performance optimization, this added functionality also provides better CPU utilization for mirrored workloads.
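
To make the difference between the bps and pps meter options concrete, here is a minimal token-bucket sketch in Python. In bps mode each packet costs its size in bits, while in pps mode each packet costs one token regardless of size. This is a generic software illustration, not the DOCA Flow meter implementation or API. The class `TokenBucketMeter` and its parameters are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
class TokenBucketMeter:
    """Per-flow token-bucket meter (illustrative only, not DOCA Flow)."""

    def __init__(self, rate: float, burst: float, unit: str = "bps") -> None:
        if unit not in ("bps", "pps"):
            raise ValueError("unit must be 'bps' or 'pps'")
        self.rate = rate        # tokens added per second (bits or packets)
        self.burst = burst      # bucket capacity, in the same unit as rate
        self.unit = unit
        self.tokens = burst     # start with a full bucket
        self.last_time = 0.0    # timestamp of the previous packet, in seconds

    def allow(self, now: float, packet_bytes: int) -> bool:
        """Return True if the packet conforms to the configured rate."""
        elapsed = max(0.0, now - self.last_time)
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last_time = now
        # bps meters charge per bit; pps meters charge one token per packet.
        cost = packet_bytes * 8 if self.unit == "bps" else 1
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    pps_meter = TokenBucketMeter(rate=2, burst=2, unit="pps")
    for t in (0.0, 0.0, 0.0, 0.5):
        print(f"t={t}: conforming={pps_meter.allow(t, packet_bytes=100)}")
```

A pps limit is a natural fit for guarding against floods of small packets, while a bps limit is the usual choice for enforcing a bandwidth allocation.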

OVS-DOCA (Beta) 

OVS-DOCA is a highly optimized virtual switch for NVIDIA Network Services. An extremely efficient design promotes next-generation performance and scale through an NVIDIA NIC or DPU. OVS-DOCA is now available in DOCA for DPU and DOCA for Host (binaries and source).

A block diagram highlighting OVS-DOCA and its position relative to OVS-DPDK, OVS-Kernel, OVS, CLI, and OpenFlow
Figure 3. OVS-DOCA optimized for NVIDIA network services

Based on Open vSwitch, OVS-DOCA offers the same northbound API, OpenFlow, CLI, and data interface, providing a drop-in replacement for OVS. Using OVS-DOCA enables faster adoption of future NVIDIA networking innovations.

BlueField-3 (enhanced) NIC mode (Beta)

This release benefits from an enhanced BlueField-3 NIC mode, currently in Beta. In contrast to BlueField-3 DPU mode, where offloading, acceleration, and isolation are all available, BlueField-3 NIC mode only offers acceleration features.

High-level overview of the internal infrastructure of a BlueField-3 DPU, highlighting the East-West traffic moving across the DPU using the data path accelerator and accelerated programmable pipeline
Figure 4. BlueField-3 (enhanced) NIC mode

While continuing to leverage the low-power, less compute-intensive BlueField SKUs, the enhanced BlueField-3 NIC mode offers many advantages over the current ConnectX BlueField-2 NIC mode, including:

  • Higher performance and lower latency at scale using local DPU memory
  • Performant RDMA with Programmable Congestion Control (PCC)
  • Programmability with DPA and additional BlueField accelerators 
  • Robust platform security with device attestation and on-card BMC

Note that BlueField-3 NIC mode will be productized as a software mode, not a separate SKU, to enable future DPU-mode usage. As such, BlueField-3 NIC mode is a fully supported software feature available on all BlueField-3 SKUs. DPA programmability for any BlueField-3 DPU operating in NIC mode mandates the installation of DOCA on the host and an active host-based service.

Services

NVIDIA DOCA services are containerized DOCA-based programs that provide an end-to-end solution for a given use case. These services are accessible through NVIDIA NGC, from which they can be easily deployed directly to the DPU. DOCA 2.2 gives you greater control and now enables offline installation of DOCA services.

NGC offline service installation

DOCA services installed from NGC require Internet connectivity. However, many customers operate in a secure production environment without Internet access. The option for ‘nonconnected’ deployment enables service installation in a fully secure production environment. It simplifies the process and removes the need for each server to have an Internet connection to complete the installation.

For example, consider the installation of DOCA Telemetry Service (DTS) in a production environment to support metrics collection. The full installation process is completed in just two steps:

  • Step 1: NGC download on connected server
  • Step 2: Offline installation using internal secure delivery  

Summary

NVIDIA DOCA 2.2 plays a pivotal role in driving data center innovation and transforming cloud and enterprise data center networks for AI applications. By providing a comprehensive SDK and acceleration framework for BlueField DPUs, DOCA gives developers powerful libraries, drivers, and APIs for creating high-performance applications and services.

The new features and enhancements in DOCA 2.2 bring a number of immediate gains. In addition to the performance gains realized through DPU acceleration, the DOCA-FlexIO and DOCA-PCC SDKs bring accelerated computing to AI-centric workloads. These SDKs enable the creation of custom emulations and algorithms, reducing time to market and significantly improving the overall development experience.

Additionally, networking-specific updates to NVIDIA DOCA Flow and OVS-DOCA offer simplified delivery pathways for software-defined networking and security solutions. These features increase efficiency and enhance visibility, scalability, and flexibility, which are essential for building sophisticated and secure infrastructures.

With wide-ranging contributions to data center innovation, AI application acceleration, and robust network infrastructure, DOCA is a crucial component of NVIDIA AI cloud services. As the industry moves towards more complex and demanding computing requirements, the continuous evolution of DOCA and integration with cutting-edge technologies will further solidify its position as a trailblazing platform for empowering the future of data centers and AI-driven solutions.

Download NVIDIA DOCA to begin your development journey with all the benefits DOCA has to offer.
