Orchestrating Accelerated Virtual Machines with Kubernetes Using NVIDIA GPU Operator

Many organizations today run applications in containers to take advantage of the powerful orchestration and management provided by cloud-native platforms based on Kubernetes. However, virtual machines remain the predominant data center infrastructure platform for enterprises, and not all applications can be easily modified to run in containers. Applications that depend on older operating systems, custom kernel modules, or specialized hardware, for example, take more effort to containerize.

KubeVirt and OpenShift Virtualization are add-ons to Kubernetes that provide virtual machine (VM) management. These solutions eliminate the need to manage separate clusters for VM and container workloads. KubeVirt is a community-supported open source project, and it also serves as the upstream project for the OpenShift Virtualization feature from Red Hat.

NVIDIA GPUs have accelerated virtualized applications for many years, and NVIDIA has also created technology to support GPU acceleration for containers managed by Kubernetes. The latest release of the NVIDIA GPU Operator adds support for KubeVirt and OpenShift Virtualization. GPU-accelerated applications running as virtual machines can now be orchestrated by Kubernetes just like ordinary enterprise applications, enabling unified management.

GPUs in KubeVirt and OpenShift Virtualization

NVIDIA GPU Operator v22.9 enables GPU-accelerated containers and GPU-accelerated virtual machines, using either NVIDIA Virtual GPU (vGPU) or PCI passthrough, to run alongside each other in the same cluster. This version introduces new software components that support virtual machines. 

The operator also automates the deployment, configuration, and lifecycle of this software, easing the operational overhead on cluster administrators. The new components are described in more detail below.

The vfio-pci (Virtual Function I/O) driver is a secure user space driver required when a physical GPU is used for PCI passthrough. PCI passthrough presents the entire GPU as a PCI device to a virtual machine. A GPU used this way cannot be shared, but passthrough delivers the highest performance.

The NVIDIA vGPU Manager is the driver installed on the hypervisor that enables NVIDIA Virtual GPU technology. NVIDIA vGPU enables multiple virtual machines to have simultaneous, time-based shared access to a single physical GPU.  

The NVIDIA vGPU Device Manager is responsible for interacting with the vGPU Manager and creating vGPU devices on the worker node. 

The NVIDIA KubeVirt device plug-in discovers and advertises both physical and NVIDIA vGPU devices to kubelet so that they can be requested and assigned to VMs. Kubelet is an agent running on every node in the cluster, responsible for communication between the node and the Kubernetes control plane.
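Once the device plugin is running, you can inspect what it has advertised on a worker node. This is a minimal sketch; the node name is a placeholder, and the exact resource names (for example, a passthrough device such as nvidia.com/GA102GL_A10 or a vGPU profile such as nvidia.com/GRID_A10-12Q) depend on your hardware and configuration:

# List the GPU resources the KubeVirt device plugin has advertised on a worker node.
# "worker-01" is a placeholder node name.
kubectl describe node worker-01 | grep -i nvidia.com/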

Planning for deployment

Prior to deployment, it is important to be aware of some limitations. Currently, MIG-backed vGPU instances are not supported. Additionally, a given GPU worker node can run GPU workloads of only a single type: containers, VMs with PCI passthrough, or VMs with NVIDIA vGPU, but not a combination.

To enable this new functionality, set sandboxWorkloads.enabled to true in ClusterPolicy. When enabled, the GPU Operator will manage and deploy the new software components needed for supporting virtual machines. This option is disabled by default, meaning that the GPU Operator will only provision worker nodes for container workloads.
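For example, the option can be set at install time with Helm. This is a minimal sketch, assuming the NVIDIA Helm repository has been added under the alias nvidia; adjust the release name and namespace for your environment:

# Install the GPU Operator with support for VM (sandbox) workloads enabled.
# Assumes: helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set sandboxWorkloads.enabled=true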

Administrators can control where workloads are deployed through Kubernetes node labels. GPU Operator v22.9 introduces a new node label, nvidia.com/gpu.workload.config, which dictates which software components the GPU Operator deploys on a node and consequently which type of GPU workload that node supports. The label can take the values container, vm-passthrough, or vm-vgpu, which correspond to the different workload types now supported.
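As a sketch (the node names are placeholders), the labels can be applied with kubectl:

# Dedicate nodes to different GPU workload types.
kubectl label node worker-01 --overwrite nvidia.com/gpu.workload.config=container
kubectl label node worker-02 --overwrite nvidia.com/gpu.workload.config=vm-passthrough
kubectl label node worker-03 --overwrite nvidia.com/gpu.workload.config=vm-vgpu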

This concept allows administrators to have pools of machine types, each with different capabilities, and managed by a common control plane. If the nvidia.com/gpu.workload.config node label is not present on a GPU worker node, the GPU Operator will use the default workload type, which is configurable in ClusterPolicy through the sandboxWorkloads.defaultWorkload field.
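The default can also be changed on a running cluster, for example by patching ClusterPolicy. This sketch assumes the ClusterPolicy instance is named cluster-policy, the name used by the GPU Operator Helm chart:

# Make vm-vgpu the workload type for GPU nodes that carry no explicit label.
kubectl patch clusterpolicy/cluster-policy --type merge \
  -p '{"spec": {"sandboxWorkloads": {"defaultWorkload": "vm-vgpu"}}}'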

Conclusion

GPU Operator v22.9 brings with it additional capabilities required to run GPU-powered workloads on Kubernetes with KubeVirt and OpenShift Virtualization. VMs in Kubernetes can attach GPU devices using PCI passthrough or NVIDIA vGPU. This flexibility speeds the adoption of cloud-native platforms by removing the need to refactor GPU-accelerated applications to support containerization. Administrators can continue to run these applications in VMs alongside other container native applications, with Kubernetes performing the orchestration. 

Getting started

To get started with GPU accelerated virtual machines, see the official documentation on Running KubeVirt VMs with the GPU Operator. Submit feedback and bug reports through the gpu-operator/issues GitHub repository. Contributions to the kubernetes/gpu-operator GitLab repository are also encouraged. 
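As a starting point, the following is a minimal sketch of a KubeVirt VirtualMachine that requests a GPU through the devices.gpus field. The VM name, memory request, container disk image, and the nvidia.com/GA102GL_A10 resource name are illustrative; substitute a resource name that is actually advertised on your nodes (a vGPU profile such as nvidia.com/GRID_A10-12Q is requested the same way):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: gpu-vm-example                # illustrative name
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
          - name: containerdisk
            disk:
              bus: virtio
          gpus:
          - deviceName: nvidia.com/GA102GL_A10   # example resource name advertised by the device plugin
            name: gpu1
        resources:
          requests:
            memory: 4Gi
      volumes:
      - name: containerdisk
        containerDisk:
          image: quay.io/kubevirt/fedora-cloud-container-disk-demo   # public KubeVirt demo image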

Think Fast: Lotus Eletre Tops Charts in Driving and AI Compute Speeds, Powered by NVIDIA DRIVE Orin

One of the biggest names in racing is going even bigger. Performance automaker Lotus launched its first SUV, the Eletre, earlier this week. The fully electric vehicle sacrifices little in terms of speed and outperforms when it comes to technology. It features an immersive digital cockpit and a lengthy battery range of up to 370 miles…

GeForce RTX 40 Series Receives Massive Creator App Benefits This Week ‘In the NVIDIA Studio’

Artists deploying the critically acclaimed GeForce RTX 4090 GPUs are primed to receive significant performance boosts in key creative apps. Plus, a special spook-tober edition of In the NVIDIA Studio features two talented 3D artists and their Halloween-themed creations this week.

Getting to Know Autonomous Vehicles

The future is autonomous, and AI is already transforming the transportation industry. But what exactly is an autonomous vehicle and how does it work?

Autonomous vehicles are born in the data center. They require a combination of sensors, high-performance hardware, software, and high-definition mapping to operate without a human at the wheel. While the concept of this technology has existed for decades, production self-driving systems have just recently become possible due to breakthroughs in AI and compute.

Specifically, massive leaps in high-performance computing have opened new possibilities in developing, training, testing, validating, and operating autonomous vehicles. The Introduction to Autonomous Vehicles GTC session walks through these breakthroughs, how current self-driving technology works, and what’s on the horizon for intelligent transportation.

From the cloud

The deep neural networks that run in the vehicle are trained on massive amounts of driving data. They must learn how to identify and react to objects in the real world—an incredibly time-consuming and costly process.

A test fleet of 50 vehicles generates about 1.6 petabytes of data each day, which must be ingested, encoded, and stored before any further processing can be done.

Then, the data must be combed through to find scenarios useful for training, such as new situations or situations underrepresented in the current dataset. These useful frames typically amount to just 10% of the total collected data.

You must then label every object in the scene, including traffic lights, signs, vehicles, pedestrians, and animals, so that the DNNs can learn to identify them, and check the labels for accuracy.

NVIDIA DGX data center solutions have turned this onerous process into a streamlined operation by providing a veritable data factory for training and testing. With high-performance compute, you can automate the curation and labeling process, as well as run many DNN tests in parallel.

When a new model or set of models is ready to be deployed, you can then validate the networks by replaying the model against thousands of hours of driving scenarios in the data center. Simulation also provides the capability to test these models in the countless edge cases an autonomous vehicle could encounter in the real world.

NVIDIA DRIVE Sim is built on NVIDIA Omniverse to deliver a powerful, cloud-based simulation platform capable of generating a wide range of real-world scenarios for AV development and validation. It creates highly accurate digital twins of real-world environments using precision map data.

Figure 1. NVIDIA DRIVE Sim provides a physically accurate digital twin of the world for comprehensive AV validation

It can run just the AV software, known as software-in-the-loop testing, or run the software on the same compute hardware that would be in the vehicle, known as hardware-in-the-loop testing.

You can truly tailor situations to your specific needs using the NVIDIA DRIVE Replicator tool, which can generate entirely new data. These scenarios include physically based sensor data, along with the corresponding ground truth, to ​complement real-world driving data and reduce the time and cost of development.

To the car

Validated deep neural networks run in the vehicle on centralized, high-performance AI compute.

Redundant and diverse sensors, including camera, radar, lidar, and ultrasonics, collect data from the surrounding environment as the car drives. The DNNs use this data to detect objects and infer information to make driving decisions.

Processing this data while running multiple DNNs concurrently requires an incredibly high-performance AI platform.

NVIDIA DRIVE Orin is a highly advanced, software-defined compute platform for autonomous vehicles. It delivers 254 trillion operations per second, enough to handle these functions while meeting systematic safety standards for operation on public roads.

Figure 2. NVIDIA DRIVE Orin is the current-generation software-defined platform for centralized autonomous vehicle compute

In addition to DNNs for perception, AVs rely on maps with centimeter-level detail for accurate localization, which is the vehicle’s ability to locate itself in the world.

Proper localization requires constantly updated maps that reflect current road conditions, such as a work zone or a lane closure, so vehicles can accurately measure distances in the environment. These maps must efficiently scale across AV fleets, with fast processing and minimal data storage. Finally, they must be able to function worldwide, so AVs can operate at scale.

NVIDIA DRIVE Map is a multimodal mapping platform designed to enable the highest levels of autonomy while improving safety. It combines survey maps built by dedicated mapping vehicles with AI-based crowdsourced mapping from customer vehicles. DRIVE Map includes four localization layers (camera, lidar, radar, and GNSS), providing the redundancy and versatility required by the most advanced AI drivers.

Continuous improvement

The AV development process isn’t linear. As humans, we never stop learning, and AI operates in the same way.

Autonomous vehicles will continue to get smarter over time as the software is trained for new tasks, enhanced, tested, and validated, then updated to the vehicle over the air.

This pipeline is continuous: data from the vehicle is constantly collected to train and improve the networks, which are then fed back into the vehicle. AI is used at all stages of the real-time computing pipeline, from perception, mapping, and localization to planning and control.

This continuous cycle is what turns vehicles from their traditional fixed-function operation into software-defined devices. Traditionally, most vehicles are as advanced as they will ever be at the point of sale. With this new software-defined architecture, automakers can continually update vehicles throughout their lives with new features and functionality.

Neural NETA: Automaker Selects NVIDIA DRIVE Orin for AI-Powered Vehicles

One of China’s popular battery-electric startups now has the brains to boot. NETA Auto, a Zhejiang-based electric automaker, this week announced it will build its future electric vehicles on the NVIDIA DRIVE Orin platform. These EVs will be software defined, with automated driving and intelligent features that will be continuously upgraded via over-the-air updates.

Microsoft Experience Centers Display Scalable, Real-Time Graphics With NVIDIA RTX and Mosaic Technology

When customers walk into a Microsoft Experience Center in New York City, Sydney or London, they’re instantly met with stunning graphics displayed on multiple screens and high-definition video walls inside a multi-story building. Built to showcase the latest technologies, Microsoft Experience Centers surround customers with vibrant, immersive graphics as they explore new products, watch technical…

Upcoming Event: Guide to Minimizing Jetson Disk Usage

Learn the steps for reducing disk usage on NVIDIA Jetson in this webinar on November 1.

Make Gaming a Priority: Special Membership Discount Hits GeForce NOW for Limited Time

This spook-tacular Halloween edition of GFN Thursday features a special treat: 40% off a six-month GeForce NOW Priority Membership, just $29.99 for a limited time. Several sweet new games are also joining the GeForce NOW library. Creatures of the night can now stream vampire survival game V Rising from the cloud.

Upcoming Event: Improve Your Cybersecurity Posture with AI

Find out how federal agencies are adopting AI to improve cybersecurity in this November 16 webinar featuring Booz Allen Hamilton.

Explainer: What Is Edge AI and How Does It Work?

Edge AI is the deployment of AI applications in devices throughout the physical world. It’s called “edge AI” because the AI computation is done near the user at the edge of the network, close to where the data is located, rather than centrally in a cloud computing facility or private data center.