
Accelerating Cloud-Ready Infrastructure and Kubernetes with Red Hat OpenShift and the NVIDIA BlueField DPU

Take a deep dive into the integrated cloud-ready infrastructure solution from Red Hat and NVIDIA.

The IT world is moving to the cloud, and the cloud is built on containers managed with Kubernetes. We believe the next logical step is to accelerate this infrastructure with data processing units (DPUs) for greater performance, efficiency, and security.

Red Hat and NVIDIA are building an integrated cloud-ready infrastructure solution with the management and automation of Red Hat OpenShift combined with the acceleration, workload isolation, and security capabilities of NVIDIA BlueField DPUs.

Benefits of Red Hat OpenShift

Many popular cloud infrastructure projects use containers managed by Kubernetes. However, implementing Kubernetes can be a heavy lift, especially for organizations that cannot devote dedicated staff to becoming Kubernetes experts. 

Red Hat OpenShift provides a powerful set of capabilities for managing Kubernetes containers as well as application deployment, updates, and lifecycle management. OpenShift includes automation and security tools, as well as a supported open-source model to make cloud infrastructure more affordable, reliable, and scalable.

According to a 2021 Red Hat survey, Kubernetes is used for over 85% of container orchestration projects, and Red Hat OpenShift is the most popular choice for hybrid and multicloud Kubernetes deployments. OpenShift is the industry’s leading enterprise Kubernetes platform, used by more than 50% of commercial banks, telecommunications companies, and airlines on the Fortune 500.

It is clear that most enterprises want a supported Kubernetes model, and Red Hat OpenShift is one of the most popular choices.

How a DPU works

A DPU offloads, accelerates, and isolates infrastructure workloads from the server’s CPU. For example, the BlueField DPU can offload networking, network virtualization, data encryption, and time synchronization tasks from the CPU and run them on purpose-built silicon.

Other infrastructure software, such as remote management, firewall agents, network control plane, and storage virtualization, can run on BlueField’s Arm processor cores. Doing so frees up the server’s CPU cores that can instead run applications and tenant workloads.

This functionality also isolates infrastructure and security workloads in a separate domain. The result is a set of servers that run more applications with faster networking, increasing the efficiency and security of the data center. 

In a typical cloud infrastructure, the network traffic traverses both physical servers and containers running on these servers. This requires a packet switching solution within each server, and to gain maximum efficiency, the application containers need a way to talk to the accelerated networking offloads of the DPU.

The traditional way is to go through Kubernetes and Open Virtual Network (OVN) to access the Open Virtual Switch (Open vSwitch or OVS). OVN provides network abstraction and the default deployment strategy is to run both OVN and OVS on the host server’s CPU.

However, this method consumes a significant number of CPU cores as the network speeds increase beyond 10 Gbps. A solution is needed for Kubernetes to run the OVN and OVS functionality on the DPU so that all the packet switching, header rewrites, encapsulation/decapsulation, and packet filtering can be done on networking hardware instead of in software on the CPU. 
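To make the idea of moving packet handling from software to hardware more concrete, here is a minimal toy sketch of the general flow-offload pattern: the first packet of a flow misses the hardware table and takes the software slow path, which installs a rule so that later packets of the same flow are handled in the fast path. This is only an illustration under simplifying assumptions. The names `FlowKey`, `ToyVirtualSwitch`, and the action strings are hypothetical and are not OVS, OVN, or BlueField APIs.

```python
from dataclasses import dataclass, field

# Toy model only: every name here is hypothetical, not an OVS, OVN, or DPU API.


@dataclass(frozen=True)
class FlowKey:
    """Fields used to identify a flow in this simplified model."""
    src_ip: str
    dst_ip: str
    dst_port: int


@dataclass
class ToyVirtualSwitch:
    # Policy known to the software control path: flow -> action.
    policy: dict
    # Rules already installed in the simulated hardware flow table.
    hw_table: dict = field(default_factory=dict)
    slow_path_packets: int = 0
    fast_path_packets: int = 0

    def handle(self, key: FlowKey) -> str:
        """Return the action for a packet, counting which path handled it."""
        action = self.hw_table.get(key)
        if action is not None:
            # Hit: the simulated hardware applies the cached action.
            self.fast_path_packets += 1
            return action
        # Miss: software decides, then installs the rule for later packets.
        self.slow_path_packets += 1
        action = self.policy.get(key, "drop")
        self.hw_table[key] = action
        return action


web_flow = FlowKey("10.0.0.1", "10.0.0.2", 443)
switch = ToyVirtualSwitch(policy={web_flow: "encap_geneve_and_forward"})

for _ in range(1000):
    switch.handle(web_flow)

print(switch.slow_path_packets, switch.fast_path_packets)  # prints "1 999"
```

In this toy run only the first packet of the flow touches the software path. A real deployment involves much more (rule expiry, connection tracking, encapsulation state), so treat this as a mental model rather than a description of the actual implementation.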

Increasing networking integration between Red Hat and NVIDIA

Red Hat and NVIDIA have collaborated to integrate the management power of OpenShift with the acceleration capabilities of the DPU.

The first stage of integration started in 2018 with Red Hat Enterprise Linux offloading network traffic to the NVIDIA ConnectX SmartNIC. The networking data plane (using OVS or DPDK) ran on the SmartNIC ASIC, but the networking control plane still ran entirely in software on the x86 CPU.

This is a diagram of the OpenStack software-defined networking (SDN) components running in Red Hat Enterprise Linux and interacting via Open vSwitch (OVS) with the eSwitch in the NVIDIA ConnectX SmartNIC. This integration allows the eSwitch hardware to offload and accelerate the SDN data plane packet switching for virtual machines running in user space.
Figure 1. OpenStack SDN controller, running on Red Hat Enterprise Linux, offloads the networking data plane to the NVIDIA ConnectX SmartNIC through OVS while the control plane runs on the x86 CPU.

In 2021, the companies took the next step and deployed Red Hat OpenShift with the NVIDIA BlueField DPU and ran performance benchmark tests. At NVIDIA GTC 2021, we demonstrated the advantages of shifting networking to the DPU and published a post, Optimizing server utilization in data centers by offloading network functions to NVIDIA BlueField-2 DPUs.

In this solution, the networking data plane with overlay offload (OVS and Geneve offload) and the networking control plane (in the OVN Kubernetes pod) ran on the DPU with Red Hat Enterprise Linux. The major OpenShift components, including Red Hat Enterprise Linux CoreOS, remained on the x86 CPU.

This diagram shows Red Hat OpenShift with Kubernetes running on the x86 CPU and offloading both the open virtual networking (OVN) data plane and control plane to the BlueField-2 DPU. Red Hat Enterprise Linux CoreOS is running only on the x86 CPU as the DPU runs Red Hat Enterprise Linux. The tenant containers/pods on the x86 host offload their networking virtual functions to the DPU.
Figure 2. Red Hat OpenShift, running on Red Hat Enterprise Linux CoreOS, offloads both the networking data plane and control plane to the BlueField-2 DPU, via OVN and OVS. The DPU is running Red Hat Enterprise Linux on its Arm cores.

In the deployment scenario in Figure 2, the BlueField-2 does the heavy lifting in the following areas: 

  • Geneve (virtual overlay network) encapsulation/decapsulation 
  • IPsec encapsulation/decapsulation 
  • Encryption/decryption
  • Routing
  • Network address translation (NAT)

The host CPU and container see only simple unencapsulated, unencrypted packets and the CPU does not need to perform any of these tasks because they are offloaded to the DPU. This level of offload reduced CPU utilization by 70%, freeing up substantial CPU power on each server to run additional business/tenant workloads. 

Running OpenShift on the DPU

As presented at GTC 2022, Red Hat and NVIDIA have taken the next step: moving OpenShift, including Red Hat Enterprise Linux CoreOS, onto the Arm cores of the BlueField DPU. This supports the Red Hat OpenShift two-cluster design, which includes separate tenant and infrastructure clusters.

Red Hat Enterprise Linux CoreOS is the supported operating system for the OpenShift control plane (master) nodes and worker nodes. The control plane is the portion of OpenShift that performs scheduling, maintenance, upgrades, and cluster automation. Red Hat Enterprise Linux CoreOS includes container management tools and security hardening to make it more resistant to attack, and it now runs on both the host x86 CPU and the DPU Arm cores.

The BlueField DPUs in the various host servers, running Red Hat Enterprise Linux CoreOS and the OpenShift OVS and OVN containers, form an infrastructure worker cluster. Meanwhile, OpenShift running on the x86 CPUs manages the tenant pods and clusters.

Offloading the OpenShift infrastructure cluster software to run on the BlueField Arm cores instead of on the host x86 cores provides additional x86 CPU savings, higher performance, and stronger security isolation.
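The two-cluster layout can be pictured with a small data model: each physical server contributes its x86 host to the tenant cluster and its DPU to the infrastructure cluster. The sketch below is purely illustrative. The `Node` type, the naming scheme, and the helper functions are hypothetical and do not correspond to OpenShift objects or APIs.

```python
from dataclasses import dataclass

# Illustrative model only; these names are hypothetical, not OpenShift APIs.


@dataclass(frozen=True)
class Node:
    name: str
    arch: str      # "x86_64" for the host CPU, "aarch64" for the DPU Arm cores
    cluster: str   # "tenant" or "infrastructure"


def nodes_for_server(server: str) -> list[Node]:
    """Each physical server yields one node per cluster in this model."""
    return [
        Node(f"{server}-host", "x86_64", "tenant"),
        Node(f"{server}-dpu", "aarch64", "infrastructure"),
    ]


def build_clusters(servers: list[str]) -> dict[str, list[Node]]:
    """Group the host and DPU nodes of all servers by cluster."""
    clusters: dict[str, list[Node]] = {"tenant": [], "infrastructure": []}
    for server in servers:
        for node in nodes_for_server(server):
            clusters[node.cluster].append(node)
    return clusters


clusters = build_clusters(["server-a", "server-b"])
for cluster_name, nodes in clusters.items():
    print(cluster_name, [node.name for node in nodes])
```

The point of the separation is visible even in this toy form: tenant workloads and infrastructure services never share a node, which is what provides the isolation described above.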

Diagram shows that Red Hat OpenShift runs on both the host x86 CPUs and on the BlueField Arm cores. The x86 CPUs form an OpenShift tenant cluster while the DPUs on each server form an OpenShift infrastructure cluster.
Figure 3. Starting with Red Hat OpenShift 4.10, you can run OpenShift on both the x86 CPUs to manage the tenants and on the BlueField DPU Arm cores to manage the cluster infrastructure.

Cloud-native, software-defined networking is a good example of a BlueField DPU use case: in an OpenShift environment, OVN and OVS run on the BlueField DPU and are accelerated by it. Many other infrastructure services, such as network encryption, firewall agents, virtual routers, and telemetry agents, can also run on the DPU for an even greater benefit.

Significant cost savings from OpenShift offload to the DPU

To understand the impact of the DPU offloads on reducing the data center costs, NVIDIA and Red Hat put together a TCO model for a mid-sized data center with 51K servers. We considered this data center to be supporting 1M applications, each application needing 10K packets per second (PPS) of switching performance.

We considered two server deployment scenarios, with and without a DPU:

  • The server with no DPU, running the virtual switching entirely in software, achieved only 350K PPS.
  • The server that offloads OVN and OVS to the DPU achieved 18.7 million PPS, about 54x higher performance.

Offloading virtual switching to the DPU also saved eight CPU cores per server. Based on this testing, the TCO model yielded CapEx savings of $68.5M. The savings come from needing 10K fewer servers when they are DPU-enhanced, because each server delivers much higher networking performance and frees up CPU cores.
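As a quick sanity check on the quoted figures, the following sketch does some simple arithmetic with them. It does not reproduce the TCO model, which is not described in detail here. The "applications per server" values consider switching throughput only and ignore CPU, memory, and other limits, and the per-server CapEx figure is just the quoted savings divided by the quoted number of avoided servers. All variable names are our own.

```python
# Figures quoted in the article.
PPS_PER_APP = 10_000            # packets per second needed by each application
PPS_NO_DPU = 350_000            # per-server switching rate, software only
PPS_WITH_DPU = 18_700_000       # per-server switching rate with DPU offload
CAPEX_SAVINGS_USD = 68_500_000  # CapEx savings reported by the TCO model
SERVERS_AVOIDED = 10_000        # servers no longer needed with DPUs

# Ratio of per-server switching throughput with and without the DPU.
speedup = PPS_WITH_DPU / PPS_NO_DPU

# Applications each server could serve if switching were the only limit
# (an assumption made for this sketch, not a claim from the TCO model).
apps_no_dpu = PPS_NO_DPU // PPS_PER_APP
apps_with_dpu = PPS_WITH_DPU // PPS_PER_APP

# Implied average CapEx saving per avoided server.
capex_per_avoided_server = CAPEX_SAVINGS_USD / SERVERS_AVOIDED

print(f"Throughput ratio: {speedup:.1f}x")                       # 53.4x
print(f"Apps per server (switching-bound): {apps_no_dpu} vs {apps_with_dpu}")  # 35 vs 1870
print(f"Implied CapEx per avoided server: ${capex_per_avoided_server:,.0f}")   # $6,850
```

The computed ratio of about 53.4 is close to the roughly 54x cited above; the small gap is presumably due to rounding in the published PPS values.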

We see power savings due to the smaller server footprint, which ultimately results in a better TCO model with the DPU-based servers. These TCO savings will get even better as we offload additional functions such as load balancers, firewalls, encryption, web servers, and so on to the DPUs, ultimately achieving amazing efficiency for cloud-ready data centers.

Solution roadmap and deploying OpenShift on BlueField 

The two-cluster OpenShift architecture running OpenShift on BlueField is now available as a developer preview or early trial in OpenShift 4.10, and is expected to become generally available in 2022.

But the NVIDIA and Red Hat teams aren’t stopping here. We are planning to test the offloading of network traffic encryption/decryption as that is a CPU-intensive task.

  • BlueField-2 DPU can offload IPsec encryption/decryption at up to 100 Gbps and TLS encryption/decryption at up to 200 Gbps.
  • BlueField-3 is expected to support IPsec, TLS, and MACsec at even higher speeds.

Implementation of line-rate encryption offload from OpenShift to the DPU will improve data security for tenants and help you move closer to a zero-trust security stance.

Other potential integrations with the DPU include more sophisticated software-defined networking offloads, running a firewall agent on BlueField, precision time synchronization, video streaming with packet pacing, and using the DPU to collect telemetry data.

BlueField-2 DPUs are available now from NVIDIA and the BlueField-3 DPU will start sampling later in 2022. In addition, BlueField DPUs will soon be available for testing in the NVIDIA LaunchPad cloud service. 

If you would like to test or develop on Red Hat OpenShift running with the NVIDIA BlueField DPU, please indicate your interest.

Summary

If your organization seeks to embrace cloud-native computing in data centers, the combination of NVIDIA BlueField DPUs, Red Hat Enterprise Linux, and Red Hat OpenShift provides an efficient and innovative open, hybrid-cloud platform with new security features. This powerful platform delivers hardware acceleration capabilities to run critical software-defined networking, storage, and security functions.

Now more server resources can be allocated to run cloud-native workloads, as well as traditional business applications.

