Categories
Misc

Adding MIG, Preinstalled Drivers, and More to NVIDIA GPU Operator

Learn about the latest GPU Operator releases, which include support for Multi-Instance GPU (MIG), preinstalled NVIDIA drivers, Red Hat OpenShift 4.7, and more.

Reliably provisioning servers with GPUs in Kubernetes can quickly become complex as multiple components must be installed and managed to use GPUs. The GPU Operator, based on the Operator Framework, simplifies the initial deployment and management of GPU servers. NVIDIA, Red Hat, and others in the community have collaborated on creating the GPU Operator.

To provision GPU worker nodes in a Kubernetes cluster, the following NVIDIA software components are required:

  • NVIDIA Driver
  • NVIDIA Container Toolkit
  • Kubernetes device plugin
  • Monitoring

These components should be provisioned before GPU resources are available to the cluster and managed during the cluster operation.

The GPU Operator simplifies both the initial deployment and management of the components by containerizing all components. It uses standard Kubernetes APIs for automating and managing these components, including versioning and upgrades. The GPU Operator is fully open source. It is available on NGC and as part of the NVIDIA EGX Stack and Red Hat OpenShift.

The latest GPU Operator releases, 1.6 and 1.7, include several new features:

  • Support for automatic configuration of MIG geometry with NVIDIA Ampere Architecture products
  • Support for preinstalled NVIDIA drivers and the NVIDIA Container Toolkit
  • Updated support for Red Hat OpenShift 4.7
  • Updated GPU Driver version to include support for NVIDIA A40, A30, and A10
  • Support for RuntimeClasses with Containerd

Multi-Instance GPU support

Multi-Instance GPU (MIG) expands the performance and value of each NVIDIA A100 Tensor Core GPU. MIG can partition an A100 GPU into as many as seven instances and an A30 GPU into as many as four, each fully isolated with its own high-bandwidth memory, cache, and compute cores.

Without MIG, different jobs running on the same GPU, such as different AI inference requests, compete for the same resources, such as memory bandwidth. With MIG, jobs run simultaneously on different instances, each with dedicated resources for compute, memory, and memory bandwidth. This results in predictable performance with quality of service and maximum GPU utilization. Because multiple jobs can run simultaneously on a single GPU, MIG is ideal for edge computing use cases.

GPU Operator 1.7 added a new component called NVIDIA MIG Manager for Kubernetes, which runs as a DaemonSet and manages MIG mode and MIG configuration changes on each node. You can apply MIG configuration on the node by adding a label that indicates the predefined configuration name to be applied. After applying MIG configuration, GPU Operator automatically validates that MIG changes are applied as expected. For more information, see GPU Operator with MIG.
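As a rough illustration of the label-driven workflow described above, the following Python sketch builds the `kubectl` command that labels a node with a MIG configuration name. The label key `nvidia.com/mig.config`, the node name `gpu-node-1`, and the configuration name `all-1g.5gb` are assumptions made for this example; check the GPU Operator MIG documentation for the values that apply to your release.

```python
from __future__ import annotations

import shlex

# Assumed label key watched by MIG Manager; verify it against the GPU Operator docs.
MIG_CONFIG_LABEL = "nvidia.com/mig.config"


def mig_label_command(node: str, config_name: str) -> list[str]:
    """Build the kubectl command that labels a node with a MIG configuration name."""
    if not node or not config_name:
        raise ValueError("node and config_name must be non-empty")
    return [
        "kubectl", "label", "nodes", node,
        f"{MIG_CONFIG_LABEL}={config_name}",
        "--overwrite",  # replace an earlier MIG configuration label if one is set
    ]


# "gpu-node-1" and "all-1g.5gb" are placeholder values for illustration only.
command = mig_label_command("gpu-node-1", "all-1g.5gb")
print(shlex.join(command))
```

The function only assembles the command. Running it against a cluster, for example with `subprocess.run(command, check=True)`, is left to the caller.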

Figure 1. MIG Manager for Kubernetes manages MIG configuration for GPU Operator

Preinstalled drivers and Container Toolkit

GPU Operator 1.7 now supports selectively installing NVIDIA Driver and Container Toolkit (container config) components. This new feature provides great flexibility for environments where the driver or nvidia-docker2 packages are preinstalled. These environments can now use GPU Operator for simplified management of other software components like Device Plugin, GPU Feature Discovery Plugin, DCGM Exporter for monitoring, or MIG Manager for Kubernetes.

Install command with only the drivers preinstalled:

 helm install --wait --generate-name \
      nvidia/gpu-operator \
      --set driver.enabled=false

Install command with both drivers and nvidia-docker2 preinstalled:

 helm install --wait --generate-name \
      nvidia/gpu-operator \
      --set driver.enabled=false \
      --set toolkit.enabled=false
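The two install commands differ only in which components are switched off. A minimal Python sketch of that selection logic follows. The function name `gpu_operator_install_args` is hypothetical, and only the Helm flags already shown above are used.

```python
from __future__ import annotations


def gpu_operator_install_args(driver_preinstalled: bool, toolkit_preinstalled: bool) -> list[str]:
    """Assemble the helm install arguments for the GPU Operator chart.

    Each preinstalled component is disabled so that the operator does not
    try to deploy its own containerized copy.
    """
    args = ["helm", "install", "--wait", "--generate-name", "nvidia/gpu-operator"]
    if driver_preinstalled:
        args += ["--set", "driver.enabled=false"]
    if toolkit_preinstalled:
        args += ["--set", "toolkit.enabled=false"]
    return args


# Drivers and nvidia-docker2 are both already present on the node.
print(" ".join(gpu_operator_install_args(driver_preinstalled=True, toolkit_preinstalled=True)))
```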

Added support for Red Hat OpenShift

We continue our line of support for Red Hat OpenShift:

  • GPU Operator 1.6 and 1.7 include support for the latest Red Hat OpenShift 4.7 version.
  • GPU Operator 1.5 supports Red Hat OpenShift 4.6.
  • GPU Operator 1.4 and 1.3 support Red Hat OpenShift 4.5 and 4.4, respectively.

GPU Operator is an OpenShift certified operator. Through the OpenShift web console, you can install and start using the GPU Operator with only a few mouse clicks. Being a certified operator makes it significantly easier for you to use NVIDIA GPUs with Red Hat OpenShift.

GPU Driver support for NVIDIA A40, A30, and A10

We updated the GPU Driver version to include support for NVIDIA A40, A30, and A10.

NVIDIA A40

The NVIDIA A40 delivers the data center-based solution that designers, engineers, artists, and scientists need for meeting today’s challenges. Built on the NVIDIA Ampere Architecture, the A40 combines the latest generation RT Cores, Tensor Cores, and CUDA Cores. It has 48 GB of graphics memory for unprecedented graphics, rendering, compute, and AI performance. From powerful virtual workstations accessible from anywhere, to dedicated render and compute nodes, the A40 is built to tackle the most demanding visual computing workloads from the data center.

For more information, see NVIDIA A40.

NVIDIA A30

The NVIDIA A30 Tensor Core GPU is the most versatile mainstream compute GPU for AI inference and enterprise workloads. Tensor Cores with MIG combine with fast memory bandwidth in a low 165W power envelope, all in a PCIe form factor ideal for mainstream servers.

Built for AI inference at scale, A30 can also rapidly retrain AI models with TF32 as well as accelerate HPC applications using FP64 Tensor Cores. The combination of the NVIDIA Ampere Architecture Tensor Cores and MIG delivers speedups securely across diverse workloads, all powered by a versatile GPU enabling an elastic data center. The versatile A30 compute capabilities deliver maximum value for mainstream enterprises.

For more information, see NVIDIA A30.

NVIDIA A10

The NVIDIA A10 Tensor Core GPU is the ideal GPU for mainstream media and graphics with AI. Second-generation RT Cores and third-generation Tensor Cores enrich graphics and video applications with powerful AI. NVIDIA A10 delivers a single-wide, full-height, full-length PCIe form factor and a 150W power envelope for dense servers.

Built for graphics, media, and cloud gaming applications with powerful AI capabilities, the NVIDIA A10 Tensor Core GPU can deliver rich media experiences. It delivers up to 4k for cloud gaming, with 2.5x the graphics and over 3x the inference performance compared to the NVIDIA T4 Tensor Core GPU.

For more information, see NVIDIA A10.

RuntimeClass support with Containerd

RuntimeClass gives you the flexibility to choose the container runtime configuration per Pod instead of applying the default runtime configuration to all Pods on each node. With this support, you can specify a dedicated runtime configuration for Pods running GPU-accelerated workloads and choose other runtimes for generic workloads.

GPU Operator v1.7.0 now supports automatic creation of the nvidia RuntimeClass when containerd is selected as the default runtime during installation. You can explicitly specify this RuntimeClass name when running applications that consume GPUs.

 apiVersion: node.k8s.io/v1beta1
 handler: nvidia
 kind: RuntimeClass
 metadata:
  labels:
    app.kubernetes.io/component: gpu-operator
  name: nvidia 
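To show how a workload might reference this RuntimeClass, here is a minimal Python sketch that generates a Pod manifest as JSON, which `kubectl apply -f -` accepts as well as YAML. The Pod name, the image `example.com/cuda-sample:latest`, and the helper name `gpu_pod_manifest` are placeholders invented for this example. `runtimeClassName` is the standard Pod spec field, and `nvidia.com/gpu` is the resource name commonly exposed by the NVIDIA device plugin.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Return a Pod manifest that requests GPUs and selects the nvidia RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Must match the RuntimeClass name created by the GPU Operator.
            "runtimeClassName": "nvidia",
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Placeholder name and image; substitute your own GPU workload.
manifest = gpu_pod_manifest("cuda-sample", "example.com/cuda-sample:latest")
print(json.dumps(manifest, indent=2))
```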

Summary

To start using NVIDIA GPU Operator today, see the following resources:

Looking Behind the Curtain of EVPN Traffic Flows

Is EVPN magic? As Arthur C Clarke said, any sufficiently advanced technology is indistinguishable from magic. On that premise, moving from a traditional layer 2 environment to VXLAN driven by EVPN has much of that same hocus-pocus feeling.

To help demystify the sorcery, I aim to help users new to EVPN understand how EVPN works and how the control plane converges. In this post, I focus on basic layer 2 (L2) building blocks then work my way up to layer 3 (L3) connectivity and the control plane.

I use the reference topology as the cable plan and foundation to build your understanding of the traffic flow. The infrastructure is a symmetric-mode EVPN environment using distributed gateways. All configurations are standardized using production-ready automation and are linked in the publicly available cumulus_ansible_modules GitLab repo.

To follow along, build your own Cumulus in the Cloud and deploy the following playbook:

~$ git clone https://gitlab.com/cumulus-consulting/goldenturtle/cumulus_ansible_modules.git
  
 Cloning into 'cumulus_ansible_modules'...
 remote: Enumerating objects: 822, done.
 remote: Counting objects: 100% (822/822), done.
 remote: Compressing objects: 100% (374/374), done.
 remote: Total 4777 (delta 416), reused 714 (delta 340), pack-reused 3955
 Receiving objects: 100% (4777/4777), 4.64 MiB | 22.64 MiB/s, done.
 Resolving deltas: 100% (2121/2121), done.
  
 ~$
 ~$ cd cumulus_ansible_modules/
 ~/cumulus_ansible_modules$ ansible-playbook -i inventories/evpn_symmetric/host playbooks/deploy.yml 

EVPN message types

Like any good protocol, EVPN has a robust process for exchanging information with its peers:  message types. If you already know OSPF and the LSA messages, you can think of EVPN message types as similar. Each EVPN message type can carry a different kind of information about the EVPN traffic flow.

There are about five different message types. In this post, I focus on the most common one, type 2, which comes in two forms: type 2 MAC routes and type 2 MAC/IP routes.

Digging into EVPN message types: Type 2

The easiest EVPN messages to understand are type 2. As mentioned earlier, type 2 routes contain MAC and MAC/IP mappings. To start off, inspect a type 2 entry at work. To do that, verify basic connectivity from leaf01 to server01.

First, look at the bridge table to make sure that the MAC address of the server is mapped to the correct port on the switch.

Get the Server01 MAC address:

cumulus@server01:~$ ip address show
 ...
 5: uplink:  mtu 9000 qdisc noqueue state UP group default qlen 1000
      link/ether 44:38:39:00:00:32 brd ff:ff:ff:ff:ff:ff
      inet 10.1.10.101/24 scope global uplink
      valid_lft forever preferred_lft forever
      inet6 fe80::4638:39ff:fe00:32/64 scope link
      valid_lft forever preferred_lft forever 

Look at Leaf01’s bridge table to make sure the MAC address is mapped to the port that you expect. Cross-reference it with LLDP:

cumulus@leaf01:mgmt:~$ net show bridge macs
  
 VLAN      Master  Interface  MAC                TunnelDest  State      Flags               LastSeen
 --------  ------  ---------  -----------------  ----------  ---------  ------------------  --------
 ...
 10        bridge  bond1      46:38:39:00:00:32

The LLDP neighbor entries on leaf01 show server01 connected to swp1, which is a bond member port:

 swp1       1G   BondMember  server01.simulation  44:38:39:00:00:32
 swp2       1G   BondMember  server02             44:38:39:00:00:34
 swp3       1G   BondMember  server03             44:38:39:00:00:36
 swp49      1G   BondMember  leaf02               swp49
 swp50      1G   BondMember  leaf02               swp50
 swp51      1G   Default     spine01              swp1
 swp52      1G   Default     spine02              swp1
 swp53      1G   Default     spine03              swp1
 swp54      1G   Default     spine04              swp1

Checking the ARP table, you can validate that the MAC and IP addresses are mapped correctly.

cumulus@leaf01:mgmt:~$ net show neighbor
 Neighbor                   MAC                Interface      AF    STATE
 -------------------------  -----------------  -------------  ----  ---------
 ...
 10.1.10.101                44:38:39:00:00:32  vlan10         IPv4  REACHABLE
 ...

Now that you’ve checked the basics, start looking at how this gets pulled into EVPN. Validate the local VNIs that are configured:

cumulus@leaf01:mgmt:~$ net show evpn vni
 VNI        Type VxLAN IF              # MACs   # ARPs   # Remote VTEPs  Tenant VRF
 20         L2   vni20                 9     2     1               RED
 30         L2   vni30                 10    2     1               BLUE
 10         L2   vni10                 11    4     1               RED
 4001       L3   vniRED                2     2     n/a             RED
 4002       L3   vniBLUE               1     1     n/a             BLUE 

Because you validated that server01 is mapped to VLAN 10 in the bridge MAC table, you can now check whether the IP neighbor entries are being pulled into the EVPN ARP cache. This cache describes the information that is being exchanged with the other EVPN speakers in the environment.

cumulus@leaf01:mgmt:~$ net show evpn arp-cache vni 10
 Number of ARPs (local and remote) known for this VNI: 4
 Flags: I=local-inactive, P=peer-active, X=peer-proxy
 Neighbor              Type    Flags  State   MAC                Remote ES/VTEP  Seq #'s
 ...
 10.1.10.101           local          active  44:38:39:00:00:32                  0/0
 10.1.10.104           remote         active  44:38:39:00:00:3e  10.0.1.34

Here’s what you’ve got so far. The L2 connectivity works correctly, as the L2 bridge table and L3 neighbor table are populated locally on leaf01. Next, you verified that the MAC and IP information is being properly pulled into EVPN through the EVPN ARP cache.

Using this information, you can check the RD and RT mapping so that you can learn more about the full VNI advertisement.

An RD is a route distinguisher. It’s used to disambiguate EVPN routes in different VNIs, as they may have the same MAC or IP address.

The RTs are route targets. They are used to describe the VPN membership for the route, specifically which VRFs are exporting and importing the different routes in the infrastructure.

 cumulus@leaf01:mgmt:~$ net show bgp l2vpn evpn vni
 Advertise Gateway Macip: Disabled
 Advertise SVI Macip: Disabled
 Advertise All VNI flag: Enabled
 BUM flooding: Head-end replication
 Number of L2 VNIs: 3
 Number of L3 VNIs: 2
 Flags: * - Kernel
   VNI      Type RD                    Import RT                  Export RT                  Tenant VRF
 * 20       L2   10.10.10.1:2          65101:20                   65101:20              RED
 * 30       L2   10.10.10.1:4          65101:30                   65101:30              BLUE
 * 10       L2   10.10.10.1:3          65101:10                   65101:10              RED
 * 4001     L3   10.10.10.1:5          65101:4001                 65101:4001            RED
 * 4002     L3   10.10.10.1:6          65101:4002                 65101:4002            BLUE 
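The RD and RT values in this table follow a visible pattern: each RD is the router ID plus a per-VNI index, and each RT pairs a number with the VNI. The Python sketch below checks that pattern against a few rows copied from the output. Treating 65101 as leaf01's BGP AS number is an assumption inferred from the table, and the helper names are invented for illustration.

```python
from typing import NamedTuple


class RouteDistinguisher(NamedTuple):
    router_id: str
    index: int


def parse_rd(rd: str) -> RouteDistinguisher:
    """Split an RD such as '10.10.10.1:3' into its router ID and per-VNI index."""
    router_id, _, index = rd.rpartition(":")
    if not router_id or not index.isdigit():
        raise ValueError(f"unexpected RD format: {rd!r}")
    return RouteDistinguisher(router_id, int(index))


def auto_route_target(asn: int, vni: int) -> str:
    """Build a route target in the ASN:VNI form seen in the table (an inferred pattern)."""
    return f"{asn}:{vni}"


# Rows copied from the table above: (VNI, RD, import/export RT).
rows = [
    (20, "10.10.10.1:2", "65101:20"),
    (10, "10.10.10.1:3", "65101:10"),
    (4001, "10.10.10.1:5", "65101:4001"),
]

for vni, rd, rt in rows:
    parsed = parse_rd(rd)
    # 65101 is assumed to be the local AS number of leaf01.
    matches = auto_route_target(65101, vni) == rt
    print(f"VNI {vni}: router ID {parsed.router_id}, index {parsed.index}, RT matches ASN:VNI = {matches}")
```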

The local L2 VNI 10 has RD 10.10.10.1:3. The RD is essentially an identifier for all routes that this node advertises for that VNI. When looking elsewhere in the fabric, you can use it to find all the routes advertised by leaf01.

 cumulus@leaf01:mgmt:~$ net show bgp l2vpn evpn route rd 10.10.10.1:3
 EVPN type-1 prefix: [1]:[ESI]:[EthTag]:[IPlen]:[VTEP-IP]
 EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]
 EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
 EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
 EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]         
                
 BGP routing table entry for 10.10.10.1:3:UNK prefix
 Paths: (1 available, best #1)
   Advertised to non peer-group peers:
   leaf02(peerlink.4094) spine01(swp51) spine02(swp52) spine03(swp53) spine04(swp54)
   Route [2]:[0]:[48]:[44:38:39:00:00:32] VNI 10/4001
   Local
      10.0.1.12 from 0.0.0.0 (10.10.10.1)
      Origin IGP, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
      Extended Community: ET:8 RT:65101:10 RT:65101:4001 Rmac:44:38:39:be:ef:aa
      Last update: Tue May 18 11:41:45 2021
 BGP routing table entry for 10.10.10.1:3:UNK prefix
 Paths: (1 available, best #1)
   Advertised to non peer-group peers:
   leaf02(peerlink.4094) spine01(swp51) spine02(swp52) spine03(swp53) spine04(swp54)
   Route [2]:[0]:[48]:[44:38:39:00:00:32]:[32]:[10.1.10.101] VNI 10/4001
   Local
      10.0.1.12 from 0.0.0.0 (10.10.10.1)
      Origin IGP, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
      Extended Community: ET:8 RT:65101:10 RT:65101:4001 Rmac:44:38:39:be:ef:aa
      Last update: Tue May 18 11:44:38 2021
  
 ....
  
 Displayed 8 prefixes (8 paths) with this RD 

Here’s an important piece of information. There are two different forms that a type 2 route can take. In this case, you’re sending each of the two types:

  • Type 2 MAC route: It includes only a 48-bit MAC entry. This entry is pulled in directly from the bridge table and has only L2 information in it. Anytime a MAC address is learned in the bridge table, that MAC address is pulled into EVPN as a type 2 MAC route.

  • Type 2 MAC/IP route: These entries are pulled into EVPN from the ARP table. Reading this entry, the first section includes the MAC address and the second one is a mapping for the IP address and mask. The mask for the IP address is a /32. Because these entries are pulled from the ARP table, they are always advertised as host routes.

 BGP routing table entry for 10.10.10.1:3:UNK prefix
 ...
   Route [2]:[0]:[48]:[44:38:39:00:00:32] VNI 10/4001
 ...
  
 BGP routing table entry for 10.10.10.1:3:UNK prefix
 ...
   Route [2]:[0]:[48]:[44:38:39:00:00:32]:[32]:[10.1.10.101] VNI 10/4001
 ...
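To make the two forms easier to tell apart, here is a small Python sketch that splits the bracketed route string from the output into its fields and reports whether it is a MAC or a MAC/IP route. The field layout follows the `EVPN type-2 prefix` legend printed by the command. The class and function names are invented for this example and are not part of any Cumulus or FRR tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Type2Route:
    eth_tag: int
    mac_len: int               # MAC length in bits (48)
    mac: str
    ip_len: Optional[int] = None  # host mask length, present only on MAC/IP routes
    ip: Optional[str] = None

    @property
    def kind(self) -> str:
        return "MAC/IP" if self.ip is not None else "MAC"


def parse_type2(prefix: str) -> Type2Route:
    """Parse '[2]:[EthTag]:[MAClen]:[MAC]' with an optional ':[IPlen]:[IP]' suffix."""
    fields = re.findall(r"\[([^\]]*)\]", prefix)
    if len(fields) not in (4, 6) or fields[0] != "2":
        raise ValueError(f"not a type 2 route: {prefix!r}")
    route = Type2Route(eth_tag=int(fields[1]), mac_len=int(fields[2]), mac=fields[3])
    if len(fields) == 6:
        route.ip_len = int(fields[4])
        route.ip = fields[5]
    return route


# The two route strings shown in the output above.
for text in (
    "[2]:[0]:[48]:[44:38:39:00:00:32]",
    "[2]:[0]:[48]:[44:38:39:00:00:32]:[32]:[10.1.10.101]",
):
    route = parse_type2(text)
    print(route.kind, route.mac, route.ip, route.ip_len)
```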

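Using this information, you can validate that a /32 host route learned through EVPN is installed in a remote leaf's routing table as a pure L3 route pointing out to the L3VNI. The following output from leaf01 shows this for the remote host 10.1.10.104, which appears as a /32 host route through vlan4001. The route for server01 appears the same way on leaf03.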

 cumulus@leaf01:mgmt:~$ net show route vrf RED
 show ip route vrf RED
 ======================
 Codes: K - kernel route, C - connected, S - static, R - RIP,
      O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
      T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
      F - PBR, f - OpenFabric,
      > - selected route, * - FIB route, q - queued, r - rejected, b - backup
      t - trapped, o - offload failure
  
 VRF RED:
 K>* 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 00:18:17
 C * 10.1.10.0/24 [0/1024] is directly connected, vlan10-v0, 00:18:17
 C>* 10.1.10.0/24 is directly connected, vlan10, 00:18:17
 B>* 10.1.10.104/32 [20/0] via 10.0.1.34, vlan4001 onlink, weight 1, 00:18:05
 C * 10.1.20.0/24 [0/1024] is directly connected, vlan20-v0, 00:18:17
 C>* 10.1.20.0/24 is directly connected, vlan20, 00:18:17
 B>* 10.1.30.0/24 [20/0] via 10.0.1.255, vlan4001 onlink, weight 1, 00:18:04 

Spend some time dissecting this output. The neighbor entry in Leaf01 for Server01 has made it all the way to Leaf03 as a /32 host route where the next hop is leaf01 but through the L3VNI.

To validate that the connection between the L2 VNI and the L3 VNI is established successfully, examine the L3 VNI:

 cumulus@leaf01:mgmt:~$ net show evpn vni 4001
 VNI: 4001
   Type: L3
   Tenant VRF: RED
   Local Vtep Ip: 10.0.1.12
   Vxlan-Intf: vniRED
   SVI-If: vlan4001
   State: Up
   VNI Filter: none
   System MAC: 44:38:39:be:ef:aa
   Router MAC: 44:38:39:be:ef:aa
   L2 VNIs: 10 20          

In this output, L3 VNI 4001 is mapped to VRF RED, which matches the Tenant VRF column in the earlier output of net show evpn vni. You can also see that L2 VNI 10 is associated with L3 VNI 4001, whose SVI is vlan4001. All the outputs that you’re seeing are lining up to indicate that you have a fully working EVPN Type 2 VXLAN infrastructure.

Summary

There you have it. From start to finish, you saw how EVPN works for Type 2–based routes. Specifically, I discussed the different EVPN message types and how control planes converge in an L2 extension environment. It’s not witchcraft, just good technology.

For more information about extending the EVPN control plane demystification and tackling the traffic flows around Type 5 messages and VXLAN routing, see [LINK]. If you haven’t already, I highly recommend trying this out for yourself with NVIDIA Cumulus in the Cloud. If you’d like to take a deeper dive, we’ve put together a hub of EVPN content, from whitepapers to videos.

Using VXLAN Routing with EVPN Through Asymmetric or Symmetric Models

This post compares the asymmetric and symmetric VXLAN routing models that use EVPN as the control plane. It describes the architectural differences and maps them to specific NOS CLI output for educational purposes.

We all know and love EVPN as a control plane for VXLAN tunnels over a layer 3 infrastructure. EVPN enables you to deploy VXLAN tunnels without controllers. Plus, it offers a range of other benefits, such as reduction of data center traffic through ARP suppression, quick convergence during mobility, one routing protocol for both underlay and overlay, and the inherent ability to support multitenancy.

So EVPN for VXLAN for all your layer 2 needs, right? Well, it’s a little more complicated than that. You might also have to communicate between VXLANs and between a VXLAN tunnel and the outside world, so VXLAN routing must also be enabled in the network, which I cover in this post.

VXLAN routing can be performed with one of two architectures:

  • Centralized routing performs all the VXLAN routing on one or two centralized routers, which can cause additional east-west traffic in the data center.
  • Distributed routing provides the VXLAN routing closest to the hosts on the directly connected leaf switches, which simplifies the traffic flow.

This is where VXLAN routing with EVPN comes in. BGP EVPN is used to communicate the VXLAN layer 3 routing information to the leaves.

Using the distributed architecture, the IETF defines two models to accomplish intersubnet routing with EVPN: asymmetric integrated routing and bridging (IRB) and symmetric IRB. Some vendors offer a symmetric model and others offer an asymmetric model.

At NVIDIA networking, we believe that you control your own network. Both models have value, depending on how your network is set up and who might have built your legacy network systems. We offer both solutions so that you can choose whichever method is right for your network.

Difference between asymmetric and symmetric models

The main difference between the asymmetric IRB model and symmetric IRB model is how and where the routing lookups are done. This results in differences concerning which VNI the packet travels on through the infrastructure. Because of these differences, there are variations in how they must be configured on the switch and how they are deployed in your network.

Asymmetric model

The asymmetric model enables routing and bridging on the VXLAN tunnel ingress, but only bridging on the egress. This results in bidirectional VXLAN traffic traveling on different VNIs in each direction (always the destination VNI) across the routed infrastructure.

Traffic is routed on encapsulation and switched on decapsulation.
Figure 1. Asymmetric VXLAN traffic flow

Consider the example in Figure 1. Host A wants to communicate with Host B, which is located on a different VLAN and a different rack, and is thus reachable through a different VNI.

  • As Host B is on a different subnet from Host A, Host A sends the frame to its default gateway, which is Leaf01. This is generally an Anycast Gateway.
  • Leaf01 recognizes that the destination MAC address is itself, looks up the routing table and routes the packet to the Green VNI while still on Leaf01.
  • Leaf01 then tunnels the frame in the Green VNI to Leaf02.
  • Leaf02 removes the VXLAN header from the frame, and bridges the frame to Host B.
  • The return traffic behaves similarly.
  • Host B sends a frame to Leaf02.
  • Leaf02 recognizes its own destination MAC address and routes the packet to the Orange VNI on Leaf02.
  • The packet is tunneled within the Orange VNI to Leaf01.
  • Leaf01 removes the VXLAN header from the frame and bridges it to Host A.
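The steps above boil down to one rule: in the asymmetric model, the tunnel always uses the VNI of the destination host. A minimal Python sketch of that rule follows, using the Orange and Green VNI membership from the walkthrough. The function name is invented for illustration.

```python
# VNI membership taken from the walkthrough: Host A sits in the Orange VNI,
# Host B in the Green VNI.
HOST_VNI = {"Host A": "orange", "Host B": "green"}


def asymmetric_tunnel_vni(src_host: str, dst_host: str) -> str:
    """Return the VNI that carries the VXLAN packet in the asymmetric model.

    The ingress leaf routes the packet into the destination VNI before
    encapsulation, and the egress leaf only bridges it, so the tunnel
    always uses the destination host's VNI.
    """
    if src_host not in HOST_VNI or dst_host not in HOST_VNI:
        raise KeyError("unknown host")
    return HOST_VNI[dst_host]


forward = asymmetric_tunnel_vni("Host A", "Host B")
reverse = asymmetric_tunnel_vni("Host B", "Host A")
print(f"A to B travels on the {forward} VNI, B to A on the {reverse} VNI")
```

Because the two directions use different VNIs, the ingress leaf must have the destination VNI configured locally, which is why every leaf needs all the VNIs it may route into.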

With the asymmetric model, all the required source and destination VNIs (for example, orange and green) must be present on each leaf, even if that leaf doesn’t have a host in that VLAN in its rack. This may increase the number of IP/MAC addresses that the leaf must hold, which results in somewhat limited scale. However, in many instances, all VNIs in the network are configured on all leaves anyway to allow VM mobility and to simplify configuration of the whole network. In this case, the asymmetric model is desirable.

While it is not hugely scalable, deployment with the asymmetric model is a simple solution, as no additional VNIs or VLANs must be configured. Additionally, fewer routing hops occur to communicate between VXLANs, which results in lower latency.

Where multitenancy is required, each set of VLANs can also be placed into separate VRFs and routed between the VLANs within a VRF.

Symmetric model

The symmetric model routes and bridges on both the ingress and the egress leaves. This results in bidirectional traffic being able to travel on the same VNI, hence the symmetric name.

However, a new specialty transit VNI is used for all routed VXLAN traffic, called the L3VNI. All traffic that must be routed is routed onto the L3VNI, tunneled across the layer 3 infrastructure, routed off the L3VNI to the appropriate VLAN, and ultimately bridged to the destination.

Traffic is routed on encapsulation and routed on decapsulation
Figure 2. Symmetric VXLAN traffic flow

Now consider the scenario with a symmetric model (Figure 2). Host A on VLAN A must communicate with Host B on VLAN B.

  • Because the destination is a different subnet from Host A, Host A sends the frame to its default gateway, which is Leaf01.
  • Leaf01 recognizes that the destination MAC address is itself and uses the routing table to route the packet to the L3VNI and the next hop Leaf02.
  • The VXLAN-encapsulated packet has the egress leaf’s MAC as the destination MAC address and this L3VNI as the VNI.
  • Leaf02 performs VXLAN decapsulation and recognizes that the destination MAC address is itself and routes the packet on to the destination VLAN, to reach the destination host.
  • The return traffic is routed similarly over the same L3VNI.
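The symmetric flow can be sketched the same way. In the Python example below, routed traffic between two subnets in the same VRF always rides the L3VNI of that VRF in both directions. The VNI numbers (10, 20, 4001) and the VRF name are hypothetical values chosen for illustration, not taken from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    l2vni: int
    vrf: str


# Hypothetical numbering: one L3VNI per tenant VRF.
L3VNI_BY_VRF = {"RED": 4001}


def symmetric_tunnel_vni(src: Host, dst: Host) -> int:
    """Return the VNI that carries the VXLAN packet in the symmetric model."""
    if src.vrf != dst.vrf:
        raise ValueError("routing between VRFs is outside the scope of this sketch")
    if src.l2vni == dst.l2vni:
        # Same subnet: the frame is bridged, so it stays on the L2 VNI.
        return dst.l2vni
    # Different subnets: the ingress leaf routes onto the L3VNI and the
    # egress leaf routes off it, so both directions share one transit VNI.
    return L3VNI_BY_VRF[src.vrf]


host_a = Host("Host A", l2vni=10, vrf="RED")
host_b = Host("Host B", l2vni=20, vrf="RED")
print(symmetric_tunnel_vni(host_a, host_b), symmetric_tunnel_vni(host_b, host_a))
```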

With the symmetric model, each leaf switch only needs to host the VLANs and corresponding VNIs that are located on its rack, as well as the L3VNI and its associated VLAN. This is because the ingress leaf switch doesn’t need to know the destination VNI.

The ability to host only the local VNIs (plus one extra) helps with scale. However, the configuration is more complex as an extra VXLAN tunnel and VLAN in your network are required. The data plane traffic is also more complex as an extra routing hop occurs and could cause extra latency.
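The scale difference can be illustrated with a short Python sketch that counts the VNIs each leaf must carry under the two models. The three-leaf layout and VNI names are hypothetical, and the asymmetric case assumes that every leaf may need to route into every VNI of the tenant.

```python
from __future__ import annotations


def vnis_per_leaf(local_vnis: dict[str, set[str]], model: str, l3vni: str = "l3vni") -> dict[str, set[str]]:
    """Return the set of VNIs each leaf must be configured with.

    local_vnis maps a leaf name to the VNIs that have hosts in its rack.
    """
    all_vnis = set().union(*local_vnis.values())
    if model == "asymmetric":
        # Every leaf needs every VNI it might route into.
        return {leaf: set(all_vnis) for leaf in local_vnis}
    if model == "symmetric":
        # Every leaf needs only its local VNIs plus the shared L3VNI.
        return {leaf: set(vnis) | {l3vni} for leaf, vnis in local_vnis.items()}
    raise ValueError(f"unknown model: {model!r}")


# Hypothetical fabric: each rack hosts a single VLAN/VNI.
racks = {"leaf01": {"orange"}, "leaf02": {"green"}, "leaf03": {"blue"}}

for model in ("asymmetric", "symmetric"):
    counts = {leaf: len(vnis) for leaf, vnis in vnis_per_leaf(racks, model).items()}
    print(model, counts)
```

With only three VNIs, the asymmetric model already needs three per leaf while the symmetric model needs two. The gap widens as the number of VNIs in the fabric grows while the number per rack stays small.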

Multitenancy requires one L3VNI per VRF, and all switches participating in that VRF must be configured with the same L3VNI. The L3VNI is used by the egress leaf to identify the VRF in which to route the packet.

Which IRB model is right?

The hardest part of choosing an IRB model is knowing the difference between symmetric and asymmetric methods. Now that you know the difference, you can make an informed decision regarding the best option for your network.

Generally, if you configure all VLANs, subnets, or VNIs on all leaves anyway (for mobility or ease of configuration), the asymmetric model is for you. It’s simpler to configure and doesn’t require extra VNIs to troubleshoot. It may even have slightly less latency.

The asymmetric model also works well if your data center can be broken down into Pods with VLANs and subnets contained in a Pod. Each leaf within the Pod is configured with all VLANs and subnets or VNIs in that local Pod. Other Pods and external networks are reachable through EVPN external routes. EVPN external routing with the asymmetric model is supported in Cumulus Linux 3.6 release, using the L3VNI for external routing only.

If your VLANs, subnets, or VNIs are widely dispersed or provisioned on the fly, choose the symmetric model. The symmetric model supports reachability to external networks with Cumulus Linux 3.5.

NVIDIA believes that you own and control your network, not a proprietary vendor, so we provide both solutions and enable you to choose.

Top 5 Ray Tracing Sessions for Graphics Developers from GTC 21

Engineers, product developers and designers around the world attended GTC to experience the latest NVIDIA solutions that are accelerating interactive rendering and simulation workflows in real time.

We showcased a wide variety of NVIDIA-powered ray tracing technologies and features that provide more realistic visualizations for artists and designers worldwide. Industry luminaries joined us at GTC to introduce the fundamentals of real-time ray tracing and how current developers such as Autodesk, Dassault, Chaos and ESI have integrated ray traced technologies into their most popular applications.

All of these GTC sessions are now available through NVIDIA On-Demand, so learn more about ray tracing and catch up on the latest advancements in professional content creation, from real-time ray traced shadows to real-time denoising.

The developer resources listed below are exclusively available to NVIDIA Developer Program members. Join today for free to get access to the tools and training necessary to build on NVIDIA’s technology platform.

On-Demand Sessions

Ray Tracing in One Weekend
Pete Shirley gets you started on the fundamentals of ray tracing.

Incorporating Real-Time Ray Tracing in Autodesk’s Next-Generation Viewport System
Learn how Autodesk radically improves the quality and performance of their viewport experience by leveraging DXR and Vulkan Ray Tracing.

Real-Time Ray-Traced Effects for CAD: A Developer Story
Hear from Dassault Systèmes on how real-time ray traced shadows enhance the design review workflow for CATIA CAD users.

From Production Rendering with V-Ray GPU to Real-Time Ray Tracing with Chaos Vantage
Get an exclusive peek at the latest advancements in V-Ray and Chaos Vantage.

Not Just for Games: Applying NVIDIA Real-Time Denoisers in Advanced Immersive Virtual Prototyping
See how ESI Group is computing physically correct, high-quality ambient occlusion and soft shadows for the most complex CAD models.

Check out all the ray tracing sessions from GTC, now available for free on NVIDIA On-Demand.

New on NGC: NVIDIA Maxine, NVIDIA TLT 3.0, Clara Train SDK 4.0, PyTorch Lightning and Vyasa Layar

The NVIDIA NGC catalog is a hub of GPU-optimized deep learning, machine learning and HPC applications. With highly performant software containers, pre-trained models, industry specific SDKs and Helm charts you can simplify and accelerate your end-to-end workflows. 

The NVIDIA NGC team works closely with our internal and external partners to update the content in the catalog on a regular basis. Below are some of the highlights: 

NVIDIA Maxine 

NVIDIA Maxine is a GPU-accelerated SDK with state-of-the-art AI features for developers to build virtual collaboration and content creation solutions, including video conferencing and streaming applications. You can add any of Maxine’s AI effects – Video, Audio, and Augmented Reality – into your existing application or develop a new pipeline from scratch.

Maxine’s Video Effects SDK and Audio Effects SDK are now available through the Maxine collection on the NGC catalog that includes a container for each SDK:

  • Video Effects SDK container enables video quality enhancement such as super resolution, reducing compression artifacts and video degradation caused by low light conditions or lower-quality cameras.
  • Audio Effects SDK container removes reverberations due to talking in low sound absorption spaces and reduces over 25 different unwanted background noise profiles such as keyboard typing, mouse-clicking, and fan noise.

Clara Train SDK 4.0 

Clara Train v4.0 is now powered by MONAI, a domain-specialized open-source PyTorch framework that accelerates deep learning in healthcare imaging.

The latest version also expands into digital pathology and introduces homomorphic encryption for server-side aggregation in federated learning.

Transfer Learning Toolkit (TLT)

The NVIDIA Transfer Learning Toolkit (TLT) is an AI toolkit that abstracts away AI/DL framework complexity and leverages high-quality pre-trained models, so you can build production-quality models faster with only a fraction of the data otherwise required.

Version 3.0 of TLT is now available for computer vision and conversational AI use cases. Get started today by exploring the TLT collections for these use cases on the NGC catalog.

Deep Learning and Inference 

Our most popular deep learning frameworks for training and inference have also been updated to the latest version, 21.02.

Partner Software

  • PyTorch Lightning, developed by Grid.AI, allows you to leverage multiple GPUs and state-of-the-art training features such as 16-bit precision, early stopping, logging, pruning and quantization, while enabling faster iteration and reproducibility for your AI research. 
  • Vyasa’s suite of biomedical analytics allows users to derive insights from analytical modules including question answering, named entity recognition, PDF table extraction and image classification, irrespective of where that data resides.
Categories
Misc

Experience the Latest Breakthroughs in Game Development with NVIDIA at GDC

The Game Developers Conference (GDC) is here, and NVIDIA will be showcasing how our latest technologies are driving the future of game development and graphics. Check out our list of sessions now.

From NVIDIA Deep Learning Super Sampling (DLSS) to RTX Global Illumination (RTXGI), our latest tools and technologies are helping game developers create realistic and stunning virtual worlds for gamers. Attendees will also get an exclusive look at how NVIDIA Omniverse, the open platform for virtual collaboration and simulation, is helping developers accelerate production workflows. 

And don’t miss our sessions at GDC:

Collaborative Game Development with NVIDIA Omniverse

Get an inside look at all the collaboration tools available in Omniverse. Explore the platform’s ability to connect popular tools and applications, including Epic Games’ Unreal Engine 4, Autodesk Maya and 3ds Max, and Substance by Adobe. 

NvRTX Artist Guide

Learn more about the NVIDIA RTX Unreal Engine Branch (NvRTX), and discover technologies such as RTXDI, RTXGI, new denoisers like Relax, ray-traced volumetrics and tools like the BVH viewer. See demonstrations on how complex settings like a jungle or museum can be built with NvRTX, and get a better understanding of how AAA real-time ray tracing visuals are created.

NVIDIA DLSS Overview & Game Integrations

This session will cover the technology that makes DLSS possible. Learn how to integrate DLSS into a new game engine. Graphics programmers, technical artists, and technical directors are encouraged to join this session so they can learn more about the engine requirements for DLSS and pick up general DLSS debugging tools.

The Technology Behind NvRTX

Game developers can dive into the NvRTX family of branches, and learn how to bring enhanced ray tracing support to Unreal Engine 4. This session will cover several challenges developers can encounter when working to deploy ray tracing in a game environment. Join us and explore how NVIDIA has crafted solutions for these challenges within the context of a curated branch of UE4.

DevTools for Harnessing Ray Tracing in Games

Students, technical artists, programmers and developers can experience ray tracing at interactive framerates with DXR and Vulkan Ray Tracing. Join this session to check out some of the available tools and features that developers can use to take advantage of NVIDIA GPUs and improve the graphics in games.

Enter for a Chance to Win Some Gems

Attendees can win a limited-edition hard copy of Ray Tracing Gems II, the follow-up to 2019’s Ray Tracing Gems.

Ray Tracing Gems II brings the community of rendering experts back together to share their knowledge. The book covers everything in ray tracing and rendering, from basic concepts geared toward beginners to full ray tracing deployment in shipping AAA games.

Learn more about the sweepstakes and enter for a chance to win.

Register for GDC today and join us to get the latest on NVIDIA technology in the gaming industry.

Categories
Misc

GFN Thursday Goes Full Steam Ahead: Over 700 Steam Summer Sale Games Streaming on GeForce NOW

Wake up, wake up, wake up, it’s the first of the month. This first of the month is a GFN Thursday celebration. The Steam Summer Sale now has over 700 PC games on sale that are playable on GeForce NOW. And since it’s also the first GFN Thursday of the month, it’s time to check …

The post GFN Thursday Goes Full Steam Ahead: Over 700 Steam Summer Sale Games Streaming on GeForce NOW appeared first on The Official NVIDIA Blog.

Categories
Misc

Issues when running test for Tensorflow object detection api with Docker

After pulling the TensorFlow Object Detection API image and running the container, I try to run the API package test, but it keeps failing and returns this error message:

    Traceback (most recent call last):
      File "object_detection/builders/model_builder_tf2_test.py", line 21, in <module>
        import tensorflow.compat.v1 as tf
      File "/home/tensorflow/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 444, in <module>
        _ll.load_library(_main_dir)
      File "/home/tensorflow/.local/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 154, in load_library
        py_tf.TF_LoadLibrary(lib)
    tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.6/dist-packages/tensorflow/core/kernels/libtfkernel_sobol_op.so: undefined symbol: _ZN10tensorflow8OpKernel11TraceStringEPNS_15OpKernelContextEb

submitted by /u/Tob_iee

Categories
Misc

Continuously Improving Recommender Systems for Competitive Advantage Using NVIDIA Merlin and MLOps

Recommendation systems must constantly evolve through the digestion of new data or algorithmic improvements of the model for their recommendations to stay effective and relevant. In this post, we focus on how NVIDIA Merlin components fit into a complete MLOps pipeline to operationalize a recommendation system, and continuously deliver improvements in production.

Recommender systems are a critical resource for enterprises that are relentlessly striving to improve customer engagement. They work by suggesting potentially relevant products and services amongst an overwhelmingly large and ever-increasing number of offerings. NVIDIA Merlin is an application framework that accelerates all phases of recommender system development on NVIDIA GPUs, from experimentation (data processing, data loading, and model training) to production deployment either on-premises or in-cloud.

The term recommender systems implies that they are not just a mere model but an entire pipeline. It is important that all pieces work together like a well-oiled machine. More importantly, these are dynamic systems that need to constantly evolve and adapt (through digestion of new data or algorithmic improvement of the model). The ability to quickly and continuously integrate and deliver these improvements into production is critical for the recommendation system to stay effective.

According to Google Cloud, MLOps is an ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operation (Ops). MLOps takes both its name as well as some of the core principles and tooling from DevOps. This makes sense as the goals of MLOps and DevOps are practically the same: to reduce the time and effort required to develop, deploy, and maintain high-quality ML software in production.

In this post, we focus on how Merlin components fit into a complete MLOps pipeline and demonstrate with a hands-on example deployed with KubeFlow Pipelines on Google Kubernetes Engine (GKE). When we use the term Merlin MLOps in this post, we mean the act of operationalizing Merlin with MLOps tools and practices.

Reference architecture: MLOps for Merlin 

Here’s a quick review of the Merlin components, as well as the different levels of MLOps. The Merlin application framework supports all phases of recommender system development on GPUs.

  • Data preprocessing and feature engineering: Merlin NVTabular is a high-performance library designed for processing terabyte-scale tabular datasets. It scales seamlessly from single to multi-GPU systems. 
  • Model training: Merlin HugeCTR is a recommender system framework for training state-of-the-art deep learning recommendation models such as DLRM, Wide and Deep, Deep Cross Network (DCN), and so on. It scales seamlessly on multiple GPUs and multi-GPU nodes.
  • Production inference: The NVIDIA Triton Inference Server coupled with a HugeCTR inference backend provides a robust high-throughput and low-latency production environment. NVIDIA Triton can be deployed either on-premises or in-cloud, and it is fully compatible with the Kubernetes ecosystem.

Given the capabilities of Merlin, we now review the three levels of MLOps according to Google Cloud’s definition:

  • Level 0: Manual process and pipeline.
  • Level 1: Pipeline with some automation, such as monitoring and triggers, automated retraining, and redeployment of ML models (continuous retraining).
  • Level 2: Fully automated pipeline with continuous integration and delivery (CI/CD).
Figure 1. A high-level overview of Merlin MLOps, covering data acquisition and validation, data preparation, training, model validation, deployment, monitoring, logging, and pipeline triggers.

Figure 1 shows a Level 1 Merlin MLOps workflow, with an automated pipeline and continuous retraining. Here is a closer look at this architecture:

  • Data pipeline: Every recommender system starts with data about users, items, and their interactions. Data is collected and stored in a data lake. From the data lake, a subset of data (based on time range and number of features) is extracted and prepared for model training (preprocessing, feature engineering).  A data validation module ensures that the test data is as expected while also detecting data drift.
  • Continuous re-training: At first, the recommendation model is trained on a large amount of available data and deployed. Continuous incremental retraining ensures that the model stays up-to-date and captures the latest trends and user preferences. A model validation module ensures that the model meets a specified quality threshold. 
  • Deployment and serving: An automated redeployment pipeline puts the new qualified model into production in a seamless manner. The number of GPU inference servers automatically scales up and down as needed.
  • Logging and monitoring: Monitoring modules continuously monitor the quality of the recommendation in real-time through a range of KPIs, such as hit rate and conversion rate. The modules trigger full retraining should model drift happen, that is, if certain KPIs fall below known established baselines.
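
The data drift check in the data pipeline can be pictured with a small sketch. It compares the relative frequencies of one categorical feature between a baseline sample and a newer sample, using the largest per-category difference (an L-infinity distance). This is a simplified stand-in for a real data validation module, not the implementation used in the pipeline; the function names and the 0.1 threshold are assumptions made for illustration.

```python
from collections import Counter


def category_frequencies(values):
    """Return the relative frequency of each category in `values`."""
    if not values:
        raise ValueError("cannot compute frequencies of an empty sample")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def linf_distance(baseline_values, current_values):
    """Largest absolute difference in relative frequency across all categories."""
    baseline = category_frequencies(baseline_values)
    current = category_frequencies(current_values)
    categories = set(baseline) | set(current)
    return max(abs(baseline.get(c, 0.0) - current.get(c, 0.0)) for c in categories)


def has_drifted(baseline_values, current_values, threshold=0.1):
    """Flag drift when the distance exceeds a (hypothetical) threshold."""
    return linf_distance(baseline_values, current_values) > threshold
```

A category that appears in only one of the two samples contributes its full frequency to the distance, so previously unseen values are flagged as well.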

Merlin MLOps with Kubeflow Pipelines on Google Kubernetes Engine

In this section, we walk through a concrete example of realizing the workflow with Kubeflow pipelines and GKE.

GKE provides a managed environment for deploying, managing, and scaling containerized applications using Google Cloud infrastructure. Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. With an existing GKE cluster, Kubeflow Pipelines can be installed easily with a push of a button. We selected Kubeflow Pipelines as the orchestrator that welds together the components of a Merlin MLOps pipeline.

In the Kubernetes world, applications are containerized. Merlin Docker containers are available on NGC, including the Merlin training and inference containers. These containers can be pulled, and then pushed to Google Cloud Container Registry, ready to be used with GKE.

Figure 2. Merlin Kubeflow pipelines architecture on GCP and GKE. Kubeflow orchestrates the Merlin pipeline components on Google Kubernetes Engine (GKE).

In Figure 2, we mapped the conceptual workflow components in Figure 1 to concrete GCP and GKE components:

  • Data pipeline: Data is collected and stored in a data store, which in this case is a Google Cloud Storage (GCS) bucket. A data extraction module extracts and copies the relevant data to a high-speed active working space. In this example, it is a GKE-persistent volume for preprocessing and model training. A data validation module based on TensorFlow Data Validation analyzes the training data to detect data drift.
  • Continuous re-training: A Merlin training pod is used for data preprocessing and model training.
    • NVTabular is responsible for data preprocessing, feature engineering, and persisting the preprocessed dataset into the pipeline-shared persistent volume.
    • Next, HugeCTR picks up the preprocessed data and trains a DCN model. The model can be updated either using incremental data or trained from scratch using all or a large amount of available data. 
  • Deployment and serving: The deployment module prepares the HugeCTR trained model for production. Prepared models are then stored in a model store in GCS. Depending on the application domains, model serving can involve two steps:
    • Candidate generation reduces the number of candidates from a space potentially as large as millions of items to a computationally manageable amount, for example, thousands of items.
    • The Merlin inference pod picks up and serves the latest HugeCTR trained model from the model store. This inference container contains the Triton Inference Server with a HugeCTR inference backend. The model re-ranks the generated candidates and serves the top scoring ones.
  • Logging and monitoring: The monitoring pod continuously monitors the quality of the recommendation in real-time (hit rate, conversion rate) and automatically triggers full retraining upon detecting significant model drift. NVIDIA Triton and the monitoring module log statistics into Prometheus and Grafana.
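
The two serving steps described above (candidate generation followed by re-ranking) can be sketched in a few lines. This is a minimal illustration, not the pipeline's serving code: `cheap_score` and `model_score` are hypothetical callables standing in for a lightweight retrieval heuristic and for the trained model served by the inference pod.

```python
import heapq


def generate_candidates(item_ids, cheap_score, num_candidates):
    """Stage 1: keep the `num_candidates` items with the highest cheap score."""
    return heapq.nlargest(num_candidates, item_ids, key=cheap_score)


def rerank(candidates, model_score, top_k):
    """Stage 2: score only the candidates with the expensive model, return the top_k."""
    scored = [(model_score(item), item) for item in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]
```

The point of the split is cost: the expensive model only scores the reduced candidate set rather than the full item catalog.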

Criteo Terabyte click log dataset case study

In this example, we demonstrate the Merlin MLOps pipeline on Kubeflow pipelines and GKE using the Criteo Terabyte click log dataset, which is one of the largest public datasets in the recommendation domain. It contains ~1.3 TB of uncompressed click logs containing over four billion samples spanning 24 days, and can be used to train recommender system models that predict the ad clickthrough rate. Features are anonymized and categorical values are hashed to ensure privacy. Each record in this dataset contains 40 values:

  • A label indicating a click (value 1) or no click (value 0)
  • 13 values for numerical features
  • 26 values for categorical features
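
As a quick illustration of this record layout, the sketch below splits one record into its label, numerical, and categorical parts. It assumes a tab-separated text line with the fields in the order listed above and empty strings for missing values; the pipeline itself reads preprocessed files, so treat this parser and its names as hypothetical.

```python
NUM_NUMERICAL = 13
NUM_CATEGORICAL = 26


def parse_record(line, sep="\t"):
    """Split one record into (label, numerical values, categorical values)."""
    fields = line.rstrip("\n").split(sep)
    expected = 1 + NUM_NUMERICAL + NUM_CATEGORICAL  # 40 values per record
    if len(fields) != expected:
        raise ValueError(f"expected {expected} values, got {len(fields)}")
    label = int(fields[0])  # 1 = click, 0 = no click
    # Missing values (empty strings) are mapped to None.
    numerical = [float(v) if v else None for v in fields[1:1 + NUM_NUMERICAL]]
    categorical = [v if v else None for v in fields[1 + NUM_NUMERICAL:]]
    return label, numerical, categorical
```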

Because this dataset contains only interaction data and no data on users, items, and their attributes, we skipped the candidate generation and final ranking parts and only implemented the deep learning scoring model to predict whether users will click on the ad.

Technical highlights

In this section, we discuss some of the major highlights pertaining to our implementation.

Multi-instance GPU on GKE

To maximize GPU usage, NVIDIA Triton is deployed on a GKE A100 MIG instance. NVIDIA Multi-instance GPU (MIG) technology partitions a single NVIDIA A100 GPU into as many as seven independent GPU instances. They run simultaneously, each with its own memory, cache, and streaming multiprocessors. That enables the A100 GPU to deliver guaranteed quality-of-service (QoS) at up to 7x higher utilization compared to prior GPUs. Small recommendation models that fit into the memory of a MIG instance can be deployed onto a GKE MIG instance of the appropriate size. That being said, we are working on relaxing this memory requirement through embedding table caching. Stay tuned!

GPU autoscaling

NVIDIA Triton deployment can be scaled using default metrics like CPU/GPU utilization, memory usage, and so on, and also using custom metrics. For this example, we use a custom metric exported to the Prometheus operator based on the average time spent by the incoming request in the inference queue. If the inference load on NVIDIA Triton increases, then the time spent by the incoming requests in the inference queue goes up as well.

To balance the increase in load, the Horizontal Pod Autoscaler (HPA) can add another NVIDIA Triton Pod, which is scheduled on a freely available GPU node. If no nodes are available in the GPU node pool, the pending Pod prompts the GKE cluster autoscaler to add a new GPU node to the pool. After the new node is available in the cluster, the Kubernetes scheduler places the new NVIDIA Triton Pod on that GPU node. The load balancer can then route the pending incoming requests in the queue to the newly created NVIDIA Triton Pod. Subsequently, if the load decreases, the autoscaler can scale down the nodes.
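
To make the scaling decision concrete, here is a rough sketch of the arithmetic involved. The replica calculation follows the rule given in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue); the tolerance band and stabilization window that the real HPA applies are left out. The counter arguments, the target value, and the function names are assumptions for illustration, not the metric definitions used in the example repo.

```python
import math


def average_queue_time_us(prev_queue_us, prev_requests, curr_queue_us, curr_requests):
    """Average queue time per request between two samples of cumulative counters."""
    delta_requests = curr_requests - prev_requests
    if delta_requests <= 0:
        return 0.0  # no new requests in the interval
    return (curr_queue_us - prev_queue_us) / delta_requests


def desired_replicas(current_replicas, current_metric, target_metric, max_replicas):
    """HPA scaling rule, clamped to the range [1, max_replicas]."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(1, min(desired, max_replicas))
```

For example, with two Triton Pods, an observed average queue time of 3000 µs, and a hypothetical target of 2000 µs, the rule asks for three Pods.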

Sending inference requests

An end user interacts with the inference server indirectly through a client application or recommendation API, which translates user requests and responses to inference requests. To this end, we include a test inference client app that can be used to read Criteo .parquet files and send inference gRPC requests to the NVIDIA Triton endpoint.

Monitoring

In an ML system, the relationship between the independent and the target variables can change over time. As a result, the model predictions can gradually become erroneous. In this example pipeline, we have a monitoring module that is tasked with tracking the performance (in this case, AUC score) and triggering another run of the pipeline if AUC drifts below a certain threshold. The monitoring module runs as a separate pod in the GKE cluster.
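
The check performed by the monitoring module can be sketched as follows. AUC is computed here directly from its pairwise definition (the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half), which is fine for small batches but quadratic in the batch size. The function names and any threshold value are assumptions for illustration, not the code in the example repo.

```python
def auc_score(labels, scores):
    """Area under the ROC curve from pairwise comparisons of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def should_retrain(labels, scores, auc_threshold):
    """Return True when the observed AUC has fallen below the threshold."""
    return auc_score(labels, scores) < auc_threshold
```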

How does it get access to the request data? In the reference design, the test inference client is responsible for logging the inference requests using Cloud Pub/Sub: the inference client publishes the requests and corresponding inference results to the Pub/Sub broker, and the monitoring module subscribes to it. Using this asynchronous mechanism, monitoring can assess the performance and take appropriate action, like triggering the Kubeflow pipeline for retraining if required. It also writes these requests periodically to a volume, which a daemon job pushes to the GCS bucket for use in the next round of continuous training. This data collection closes the loop in the system, and allows the new incoming requests to serve as fresh data that the pipeline can use for incremental training from the previous checkpoint.
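
The publish/subscribe decoupling described above can be illustrated with an in-memory stand-in. The `InMemoryBroker` class below is purely hypothetical and is not the Cloud Pub/Sub client API; it only shows the shape of the interaction, in which the client publishes each request with its result and the monitoring side pulls batches at its own pace. The message fields are also assumptions.

```python
import json
import queue


class InMemoryBroker:
    """Stand-in for a message broker (not the Cloud Pub/Sub API)."""

    def __init__(self):
        self._messages = queue.Queue()

    def publish(self, payload):
        # Messages are serialized, as they would be when sent to a real broker.
        self._messages.put(json.dumps(payload))

    def pull(self, max_messages):
        """Return up to `max_messages` pending messages without blocking."""
        batch = []
        while len(batch) < max_messages:
            try:
                batch.append(json.loads(self._messages.get_nowait()))
            except queue.Empty:
                break
        return batch


def log_inference(broker, features, prediction, label):
    """Client side: publish one request together with its inference result."""
    broker.publish({"features": features, "prediction": prediction, "label": label})


def collect_for_monitoring(broker, max_messages=100):
    """Monitoring side: pull a batch and split it into labels and scores."""
    batch = broker.pull(max_messages)
    labels = [message["label"] for message in batch]
    scores = [message["prediction"] for message in batch]
    return labels, scores
```

Because the client never waits on the monitoring module, a slow or restarting monitor does not add latency to inference requests.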

Scope for improvement

The high-level goal of this post was to show an example of a recommender system, built using Merlin components, running in the form of a Kubeflow pipeline. There are several pieces of this example that could be designed in an alternative way or further improved. For instance:

  • Cloud Pub/Sub is used for communicating request data from the inference client to the monitoring module. This gives you high scalability, reliability, and advantages of asynchronous behavior. However, this does add an additional dependency on GCP infrastructure. Alternatively, you could use other message queues, like Kafka.
  • Data drift could be monitored live, especially in cases where there is no user feedback for served recommendations to estimate model performance. You could plug in a solution similar to the data validation component in monitoring. Additionally, you should first filter outliers out from out-of-distribution samples.
  • The data validation component using TensorFlow Data Validation is a simple example showing where such a component could be plugged into the pipeline. There could be other appropriate actions on detecting drift, like notifications to users or taking corrective measures other than logging. There may be other libraries more suitable to your use case, like Great Expectations or Alibi Detect.

Conclusion

This example with Merlin components on a Kubeflow pipeline follows the reference architecture described earlier. Most ML systems would follow a similar architecture, with components for data acquisition and cleaning, preprocessing, training, model serving, monitoring, and so on. As such, these blocks could be replaced with custom containers and code in the pipeline. Any additional modules could be either added to the pipeline itself (like data validation and training) or deployed as a separate pod in the cluster (like inference and monitoring). This Merlin MLOps example serves as a reference on how you can create, compile, and run your pipelines.

The code and step-by-step instructions to run this Merlin MLOps example are available at the NVIDIA-Merlin/gcp-ml-ops GitHub repo. We’d love to hear about how this project relates to what you’re working on, especially if you have any questions or feedback! You can reach us through the repo or by leaving a comment here.

Categories
Misc

OSError: SavedModel file does not exist

Can anyone help me fix the error OSError: SavedModel file does not exist at: /mnt/Archive/Google_T5/11B/{saved_model.pbtxt|saved_model.pb} in this code:

    import tensorflow.compat.v1 as tf
    import tensorflow_text as tf_text

    tf.reset_default_graph()
    sess = tf.Session()
    meta_graph_def = tf.saved_model.loader.load(sess, ["serve"], "/mnt/Archive/Google_T5/11B")
    signature_def = meta_graph_def.signature_def["serving_default"]

submitted by /u/notooth1