Categories
Misc

Evolving from Network Simulation to Data Center Digital Twin

Digital twins are attracting increasing attention across industries. While the concept is relatively new to many, digital twins are not new to IT, which has for some time recognized the benefits. One such benefit is the value of simulating a network environment. Network operators have been chasing network simulators for years.

Cisco’s Packet Tracer was an early industry network simulator that was quite popular. This simple tool provided a first exposure to network simulation for countless classically trained network admins. Packet Tracer only offered the capability to simulate a handful of generic network devices with a limited list of supported features. Even then, it was easy to see the value network simulation offered to operators.

The rise of data center infrastructure simulation

The capabilities of network simulators have grown immensely over the years, aided by the move to the cloud. Many infrastructure appliances were re-envisioned as cloud-native offerings and run as VMs and containers in the public cloud. They were also ideally suited for data center infrastructure simulation. 

Armed with a plethora of new simulated device images, the value that can be extracted from simulations has increased. What started as network simulation has grown into a new category of holistic data center infrastructure simulation. This increasingly complex environment is also increasingly driven by automation. The adoption of automation is another key driver for the use of network simulation.

Business leaders are realizing that critical business applications sit directly on top of these brittle sets of interacting software systems, and the value of simulation is becoming prominent to business productivity.

The value of data center simulation

The benefits of data center simulation are evident in the data center lifecycle, from planning to building and maintaining (Figure 1).

Diagram of the deployment lifecycle including Day 0, Day 1, and Day 2 and the various use cases for a digital twin at each deployment phase.
Figure 1. The data center deployment lifecycle and various use cases for a digital twin at each phase 

Planning

On Day 0 of the data center deployment lifecycle, you can get ahead of supply chain challenges and model your environment while your hardware is on order. In the time which would normally be spent waiting for equipment to arrive, you can accomplish many preliminary tasks, including:

  • Define cabling architecture
  • Create initial configurations
  • Automate and deploy the entire virtual data center

At this stage, simulation can help you build confidence that your solution is going to work as intended and model the interaction surfaces between your different software systems. For example, you can model your DCIM, your automation platform and the applications themselves, as well as other tools. 

You can also get ahead of multi-vendor issues by verifying interoperability in your virtual Proof of Concept (vPoC). You can train staff on your new solution and build familiarity with your specific deployment even before the first devices have arrived on the loading dock. To learn more, see Close Knowledge Gaps and Elevate Training with Digital Twin NVIDIA Air.

Building 

On Day 1 of deployment, you directly benefit from the lessons learned from using simulation during planning. The resulting configuration, automation and topology information generated in the digital twin can be leveraged to accelerate the deployment of your physical data center. For larger deployments, you can use technology such as Prescriptive Topology Manager to validate the cable plans in physical deployments against the topology built ahead of time in your digital twin to spot cabling issues. 

In environments with many thousands of cables, just answering the question “is it plugged in” can be a monumental task. With the digital twin, cable validation for the whole data center can be done in seconds. Beyond layer one, the digital twin can be used as a reference for the physical deployment to verify the initial state of the control plane in layers two through four.
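To make the cable-validation idea concrete, here is a minimal sketch that compares the links planned in a digital twin against the links observed on the physical network (for example, gathered from LLDP neighbor data). It is not how Prescriptive Topology Manager is implemented. The device names, port names, and the `compare_cabling` helper are hypothetical and exist only for illustration.

```python
def normalize(link):
    """Order the two endpoints so that A-B and B-A compare as the same link."""
    a, b = link
    return tuple(sorted((a, b)))


def compare_cabling(expected_links, observed_links):
    """Return links that are planned but absent, and links that are present but unplanned."""
    expected = {normalize(link) for link in expected_links}
    observed = {normalize(link) for link in observed_links}
    return {
        "missing": sorted(expected - observed),
        "unexpected": sorted(observed - expected),
    }


# Hypothetical cable plan taken from the digital twin: ((device, port), (device, port)).
expected = [
    (("leaf01", "swp1"), ("spine01", "swp1")),
    (("leaf01", "swp2"), ("spine02", "swp1")),
    (("leaf02", "swp1"), ("spine01", "swp2")),
]

# Hypothetical neighbor data observed on the physical switches.
observed = [
    (("spine01", "swp1"), ("leaf01", "swp1")),  # same link, reported from the other end
    (("leaf01", "swp2"), ("spine02", "swp2")),  # cable landed on the wrong spine port
    (("leaf02", "swp1"), ("spine01", "swp2")),
]

report = compare_cabling(expected, observed)
for kind, links in report.items():
    for link in links:
        print(kind, link)
```

Because both sides are reduced to sets of normalized links, the comparison is a pair of set differences, which is why a check like this can stay fast even with many thousands of cables.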

Maintaining

With the data center deployed, the operational phase of the lifecycle begins. During this phase, you can use the digital twin to model the changes in the environment prior to deployment. You can attach the digital twin to your CI/CD pipeline to automatically validate any configuration or topology change prior to deployment and any resulting impact on connectivity of your applications. 
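As a rough illustration of what a pipeline check could look like, the sketch below models the topology as a list of links and tests whether required endpoint pairs can still reach each other after a proposed change removes some links. It only checks graph connectivity, which is a simplification of validating a real configuration change in a digital twin. The topology, host names, and the `validate_change` helper are all hypothetical.

```python
from collections import deque


def reachable(links, src, dst):
    """Breadth-first search over an undirected topology given as (node, node) links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def validate_change(links, removed_links, required_pairs):
    """Return the required (src, dst) pairs that lose connectivity after the change."""
    removed = {frozenset(link) for link in removed_links}
    remaining = [link for link in links if frozenset(link) not in removed]
    return [pair for pair in required_pairs if not reachable(remaining, *pair)]


# Hypothetical two-leaf, two-spine fabric with one server per leaf.
topology = [
    ("server01", "leaf01"), ("server02", "leaf02"),
    ("leaf01", "spine01"), ("leaf01", "spine02"),
    ("leaf02", "spine01"), ("leaf02", "spine02"),
]
required = [("server01", "server02")]

# Removing one uplink keeps the application path; removing both uplinks breaks it.
print(validate_change(topology, [("leaf01", "spine01")], required))
print(validate_change(topology, [("leaf01", "spine01"), ("leaf01", "spine02")], required))
```

A pipeline job could fail the build whenever the returned list is not empty, so a change that isolates an application never reaches production.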

Additional benefits of the digital twin during this phase center on operators, specifically the ability to troubleshoot the virtual environment in ways that would never be allowed in production. This kind of deep troubleshooting and chaos engineering can get ahead of numerous problems that may be hiding in your architecture. It can also rapidly accelerate the onboarding of new personnel, giving them a risk-free learning environment that exactly matches production.

Data center digital twins

With current simulation technology, it is now possible to simulate thousands of routers, switches, and data center infrastructure devices fully loaded with configurations. 

The most important aspects of the future of data center digital twins include:

  • Connecting the digital twin with the physical twin and synchronizing the two
  • Increasing the accuracy of simulation such that all relevant behaviors of the data center and network can be simulated completely 

Increased accuracy and synchronization are what separate network simulation from a true digital twin. And every IT administrator should be striving to achieve a true digital twin.

Check out NVIDIA Air to start building your own data center digital twin. Enhance your data center operations with the open network operating system NVIDIA Cumulus Linux.

Categories
Offsites

Have you seen more math videos in your feed recently? (SoME2 results)

Categories
Misc

Detecting Objects in Point Clouds Using ROS 2 and TAO-PointPillars

Accurate, fast object detection is an important task in robotic navigation and collision avoidance. Autonomous agents need a clear map of their surroundings to navigate to their destination while avoiding collisions. For example, in warehouses that use autonomous mobile robots (AMRs) to transport objects, avoiding hazardous machines that could potentially damage robots has become a challenging problem.

This post presents a ROS 2 node for detecting objects in point clouds using a pretrained model from NVIDIA TAO Toolkit based on PointPillars. The node takes point clouds as input from real or simulated lidar scans, performs TensorRT-optimized inference to detect objects in this input data, and outputs the resulting 3D bounding boxes as a Detection3DArray message for each point cloud. 

While multiple ROS nodes exist for object detection from images, the advantages of performing object detection from lidar input include the following:

  • Lidar can calculate accurate distances to many detected objects simultaneously. With object distance and direction information provided directly from lidar, it’s possible to get an accurate 3D map of the environment. To obtain the same information in camera/image-based systems, a separate distance estimation process is required which demands more compute power.
  • Lidar is not sensitive to changing lighting conditions (including shadows and bright light), unlike cameras.

An autonomous system can be made more robust by using a combination of lidar and cameras. This is because cameras can perform tasks that lidar cannot, such as detecting text on a sign. 

TAO-PointPillars is based on work presented in the paper, PointPillars: Fast Encoders for Object Detection from Point Clouds, which describes an encoder to learn features from point clouds organized in vertical columns (or pillars). TAO-PointPillars uses both the encoded features as well as the downstream detection network described in the paper.

For our work, a PointPillar model was trained on a point cloud dataset collected by a solid state lidar from Zvision. The PointPillar model detects objects of three classes: Vehicle, Pedestrian, and Cyclist. You can train your own detection model following the TAO Toolkit 3D Object Detection steps, and use it with this node.

For details on running the node, visit NVIDIA-AI-IOT/ros2_tao_pointpillars on GitHub. You can also check out NVIDIA Isaac ROS for more hardware-accelerated ROS 2 packages provided by NVIDIA for various perception tasks. 

Block diagram of the ROS 2 TAO-PointPillars node with names of ROS 2 topics subscribed to and published by the node.
Figure 1. ROS 2 TAO-PointPillars node

ROS 2 TAO-PointPillars node 

This section provides more details about using the ROS 2 TAO-PointPillars node with your robotic application, including the input/output formats and how to visualize results. 

Node Input: The node takes point clouds as input in the PointCloud2 message format. Among other information, point clouds must contain four features for each point (x, y, z, r) where (x, y, z, r) represent the X coordinate, Y coordinate, Z coordinate and reflectance (intensity), respectively.

Reflectance represents the fraction of a laser beam reflected back at some point in 3D space. Note that the range for reflectance values should be the same in the training data and inference data. Parameters including intensity range, class names, and NMS IOU threshold can be set from the launch file of the node.
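If your lidar reports reflectance on a different scale than the training data, one possible fix is a linear rescale before inference. The sketch below assumes both ranges start at zero and that a simple linear mapping is appropriate for your sensor. The `rescale_intensity` helper and the example ranges (0 to 255 and 0 to 1) are hypothetical and are not part of the node.

```python
def rescale_intensity(points, source_max, target_max):
    """Return (x, y, z, r) points with r mapped linearly from [0, source_max] to [0, target_max]."""
    if source_max <= 0:
        raise ValueError("source_max must be positive")
    return [(x, y, z, r * target_max / source_max) for (x, y, z, r) in points]


# Hypothetical points whose reflectance is reported in the range 0 to 255.
points = [
    (1.0, 2.0, 0.5, 0.0),
    (4.0, -1.0, 0.2, 255.0),
    (0.0, 0.0, 1.0, 127.5),
]

# Map reflectance to the 0 to 1 range assumed here for the training data.
scaled = rescale_intensity(points, source_max=255.0, target_max=1.0)
for point in scaled:
    print(point)
```

Only the fourth feature changes. The coordinates pass through untouched, so the geometry seen by the model is the same.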

You can find ROS 2 bags for testing the node by visiting ZVISION-lidar/zvision_ugv_data on GitHub.

Figure 2. An example of the Zvision camera point of view (left), corresponding point cloud from the lidar (center), and the result after inference using TAO-PointPillars (right). The people and truck are detected correctly.

Node Output: The node outputs 3D bounding box information, object class ID, and score for each object detected in a point cloud in the Detection3DArray message format. Each 3D bounding box is represented by (x, y, z, dx, dy, dz, yaw) where (x, y, z, dx, dy, dz, yaw) are, respectively, the X coordinate of object center, Y coordinate of object center, Z coordinate of object center, length (in X direction), width (in Y direction), height (in Z direction) and orientation in 3D Euclidean space. 

The coordinate system used by the model during training and that used by the input data during inference must be the same for meaningful results. Figure 3 shows the coordinate system used by the TAO-PointPillars model.

The coordinate system used by TAO-PointPillars. Origin is the center of lidar. X axis is to the front, Y axis is to the left and Z axis is upwards. Yaw is the rotation in the X-Y plane, in counter-clockwise direction. So X axis corresponds to yaw = 0 and Y axis corresponds to yaw = pi / 2.
Figure 3. The coordinate system used by the TAO-PointPillars model
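To see how the (x, y, z, dx, dy, dz, yaw) representation maps to geometry, the following sketch computes the eight corners of a box. It assumes that yaw rotates the box counterclockwise about the Z axis, so that at yaw = 0 the length dx lies along the X axis, which is how we read the coordinate system described above. The `box_corners` helper is hypothetical and is not part of the node.

```python
import math


def box_corners(x, y, z, dx, dy, dz, yaw):
    """Return the eight (x, y, z) corners of a box rotated by yaw about the Z axis."""
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                # Corner offset in the box's own frame, before rotation.
                lx, ly, lz = sx * dx, sy * dy, sz * dz
                corners.append((
                    x + lx * cos_yaw - ly * sin_yaw,  # counterclockwise rotation in the X-Y plane
                    y + lx * sin_yaw + ly * cos_yaw,
                    z + lz,
                ))
    return corners


# A box 4 m long, 2 m wide, and 1 m high, centered on the lidar origin.
aligned = box_corners(0.0, 0.0, 0.0, 4.0, 2.0, 1.0, yaw=0.0)
turned = box_corners(0.0, 0.0, 0.0, 4.0, 2.0, 1.0, yaw=math.pi / 2)

print(max(c[0] for c in aligned), max(c[1] for c in aligned))
print(round(max(c[0] for c in turned), 6), round(max(c[1] for c in turned), 6))
```

With yaw = pi / 2, the long side of the box points along the Y axis, which matches the convention that the Y axis corresponds to yaw = pi / 2.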

Since Detection3DArray messages cannot currently be visualized on RViz, you can find a simple tool to visualize results by visiting NVIDIA-AI-IOT/viz_3Dbbox_ros2_pointpillars on GitHub.

For the example shown in Figure 4 below, the frequency of input point clouds is ~10 FPS and of output Detection3DArray messages is ~10 FPS on Jetson AGX Orin.

A GIF compilation of three images: The Zvision camera point of view; the point cloud from the Zvision lidar; and the detection results using TAO-PointPillars.
Figure 4. Counterclockwise from top left: An image from the Zvision camera point of view; the point cloud from the Zvision lidar; and the detection results using TAO-PointPillars

Summary

Accurate object detection in real time is necessary for an autonomous agent to navigate its environment safely. This post showcases a ROS 2 node that can detect objects in point clouds using a pretrained TAO-PointPillars model. (Note that the TensorRT engine for the model currently only supports a batch size of one.) This model performs inference directly on lidar input, which offers advantages over image-based methods. To perform inference on lidar data, use a model trained on data from the same lidar. Otherwise, accuracy drops significantly unless a method like statistical normalization is implemented.

Categories
Misc

Google Colab’s ‘Pay As You Go’ Offers More Access to Powerful NVIDIA Compute for Machine Learning

Colab’s new Pay as You Go option helps you accomplish more with machine learning.
Access additional time on NVIDIA GPUs with the ability to upgrade to NVIDIA A100 Tensor Core GPUs when you need more power for your ML project.

Categories
Misc

The Wheel Deal: ‘Racer RTX’ Demo Revs to Photorealistic Life, Built on NVIDIA Omniverse

NVIDIA artists ran their engines at full throttle for the stunning Racer RTX demo, which debuted at last week’s GTC keynote, showcasing the power of NVIDIA Omniverse and the new GeForce RTX 4090 GPU. “Our goal was to create something that had never been done before,” said Gabriele Leone, creative director at NVIDIA, who led…

Categories
Misc

All This and Mor-a Are Yours With Exclusive ‘Genshin Impact’ GeForce NOW Membership Reward

It’s good to be a GeForce NOW member. Genshin Impact’s new Version 3.1 update launches this GFN Thursday, just in time for the game’s second anniversary. Even better: GeForce NOW members can get an exclusive starter pack reward, perfect for their first steps in HoYoverse’s open-world adventure, action role-playing game. And don’t forget the nine…

Categories
Misc

Explainer: What is Zero Trust?

Zero trust is a cybersecurity strategy for verifying every user, device, application, and transaction in the belief that no user or process should be trusted.

Categories
Misc

AI Model Matches Radiologists’ Accuracy Identifying Breast Cancer in MRIs

Researchers from NYU Langone Health aim to improve breast cancer diagnostics with a new AI model. Recently published in Science Translational Medicine, the study outlines a deep learning framework that predicts breast cancer from MRIs as accurately as board-certified radiologists. The research could help create a foundational framework for implementing AI-based cancer diagnostic models in clinical settings.

“Breast MRI exams are difficult and time-consuming to interpret, even for experienced radiologists. AI has tremendous potential in improving medical diagnosis, as it can learn from tens of thousands of exams. Using AI to assist radiologists can make the process more accurate, and provide a higher level of confidence in the results,” said study senior author Krzysztof J. Geras, an assistant professor in the Department of Radiology at the NYU Grossman School of Medicine.

As a sensitive tool in breast cancer diagnostics, MRIs can help identify malignant lesions sometimes missed in mammograms and clinical applications.

Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is often used as a screening tool in high-risk patients and its applications are expanding. The tool can help doctors investigate potentially suspicious lesions or evaluate the extent of disease in newly diagnosed patients. This type of medical imaging can also help inform a doctor’s treatment plan, including whether to perform a biopsy or the extent of surgery needed. Both scenarios influence short- and long-term patient outcomes.

According to the researchers, MRIs have untapped potential in predicting disease pathology and achieving a better understanding of tumor biology. Developing AI models from large, well-annotated datasets could be key in refining the sensitivity of these scans and reducing unnecessary biopsies. 

The researchers produced an AI model that improves the accuracy of breast cancer diagnosis using a dataset extracted from clinical exams performed at an NYU Langone Health breast imaging site. The dataset consisted of bilateral DCE-MRI studies from patients categorized as high-risk screening, preoperative planning, routine surveillance, or follow-up after a suspicious finding.

They trained an ensemble of deep neural networks with 3D convolutions, which detect spatiotemporal features, on 14,198 labeled MRI examinations, using mixed precision through the NVIDIA Apex open-source library. According to the team, the use of this library made it possible to increase the batch size during training.

The networks were trained using the cuDNN-accelerated PyTorch framework on the university’s HPC cluster equipped with 136 NVIDIA V100 GPUs and NVIDIA NVLink for scaling memory and performance.

Multi-GPU training was enhanced by the NVIDIA Collective Communications Library (NCCL), and training a single model took 12 days on average.

“The GPUs did all the heavy lifting for training with high-dimensional data as we used full MRI volumes,” said Geras. 

The model performance was validated on a total of 3,936 MRIs from NYU Langone Health. Using three additional datasets from Duke University, Jagiellonian University Hospital in Poland, and the Cancer Genome Atlas Breast Invasive Carcinoma data collection, the team validated that the model can work across different populations and data sources.  

A framework outlining the steps taken in the study from data collection to personalizing management.
Figure 1. Overview of the study that trains and evaluates an AI system based on deep neural networks to predict the probability of breast cancer in DCE-MRI studies

The researchers compared the model results against five board-certified breast radiology attendings with 2 to 12 years of experience interpreting breast MRI exams. The clinicians interpreted 100 randomly selected MRI studies from the NYU Langone Health data. 

The team found no statistically significant difference between the results of the radiologists and those of the AI system. Averaging the AI and radiologist predictions together increased overall accuracy by at least 5%, suggesting a hybrid approach may be most beneficial.
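As a simple illustration of what averaging predictions can mean, the sketch below combines a model probability with the mean of several reader probabilities using equal weights. The equal weighting, the example values, and the `hybrid_probability` helper are assumptions made for illustration, and the study may combine predictions differently.

```python
def hybrid_probability(model_prob, reader_probs):
    """Average a model's malignancy probability with the mean of the readers' probabilities."""
    if not reader_probs:
        raise ValueError("at least one reader probability is required")
    reader_mean = sum(reader_probs) / len(reader_probs)
    return (model_prob + reader_mean) / 2


# Hypothetical case: the model is fairly confident, the two readers less so.
combined = hybrid_probability(0.8, [0.4, 0.6])
print(round(combined, 2))
```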

The model was equally accurate among patients with various subtypes of cancer, including less common malignancies. Patient demographics, such as age and race, did not influence the AI system, despite the limited training data available for some groups.

The model output can also be combined with the personal preference of a clinician or patient deciding whether to pursue a biopsy after a suspicious finding. By default, all suspicious lesions classified with category BI-RADS 4 are recommended for biopsy, which leads to a substantial number of false positives. The AI model predictions can help avoid benign biopsies in up to 20% of all BI-RADS category 4 patients.

The authors note that the work has a few limitations, including open questions about how a hybrid approach could affect a radiologist’s decisions in a hospital setting and how the model makes its predictions.

“Although we only looked at retrospective data in our study, the results are strong enough that we are confident in the accuracy of the model. We look forward to further translating this work into clinical practice, deploying our AI systems in real life, and improving breast cancer diagnostics for both doctors and patients,” said study lead author Jan Witowski, a postdoctoral research fellow at NYU School of Medicine.

Read the study, Improving breast cancer diagnostics with artificial intelligence for MRI, in Science Translational Medicine.

Categories
Misc

Powering NVIDIA-Certified Enterprise Systems with Arm CPUs

Organizations are rapidly becoming more advanced in the use of AI, and many are looking to leverage the latest technologies to maximize workload performance and efficiency. One of the most prevalent trends today is the use of CPUs based on Arm architecture to build data center servers. 

To ensure that these new systems are enterprise-ready and optimally configured, NVIDIA has approved the first NVIDIA-Certified Systems with Arm CPUs and NVIDIA GPUs. This post presents the benefits of NVIDIA-Certified Arm systems, and what customers should expect to see in the near future.

Using Arm architecture for HPC

Arm-based systems are common for edge applications. They are already widely used by large-scale cloud service providers, and are starting to become more popular for data center applications. According to Gartner®, 12% of new servers for high-performance computing (HPC) will be Arm-based by 2025.¹

Systems based on Arm architecture have the ability to run many cores with high energy efficiency, along with high memory bandwidth and low latency. In fact, recent results for the MLPerf benchmarks show Arm systems delivering almost the same performance for inference as x86-based systems, with one test showing the Arm-based server outperforming a similar x86 system.

The certification by NVIDIA of Arm-based systems is the culmination of a process that started in 2019, when NVIDIA ported the CUDA-X libraries to Arm. This paved the way for NVIDIA partners to start building energy-efficient, AI-enabled systems. NVIDIA also partnered with GIGABYTE in 2021 to develop and offer the Arm HPC Developer Kit.

Now, NVIDIA Certification will help businesses choose the best enterprise-grade systems.

NVIDIA-Certified Arm systems

NVIDIA-Certified Systems offer NVIDIA GPUs and NVIDIA high-speed, secure network adapters from leading NVIDIA partners in configurations validated for optimum performance, manageability, and scale. Announced at the beginning of 2021, the program gives customers and partners confidence to choose enterprise-grade hardware solutions to power their accelerated computing workloads—from the desktop to the data center and edge.

More than 200 certified systems are now available—covering data center, desktop, and edge—from over 30 partners. NVIDIA-Certified Systems have excellent performance on a range of modern accelerated computing workloads, including AI and data science, 3D computing and visualization, and HPC. 

The certification also validates key enterprise capabilities, including management, security, and scalability. This ensures that certified systems can take advantage of powerful software.

GIGABYTE: The first Arm-ready certified system

The first NVIDIA-Certified Arm system is the GIGABYTE G242-P33, which features the Neoverse-based Ampere Altra processor and up to four NVIDIA A100 Tensor Core GPUs. GIGABYTE has been part of the NVIDIA-Certified Systems program since its inception, and now offers more than 15 NVIDIA-Certified Systems. 

“Qualifying Arm-based servers for NVIDIA accelerators continues to be one of GIGABYTE’s top priorities, and with NVIDIA-Certified Systems we will take the performance validation a step further to not only support the new NVIDIA H100 but also to include NVIDIA BlueField-2 DPU and InfiniBand products,” said Etay Lee, CEO of GIGABYTE. 

“Customers want an Arm-ready solution that comes with a wealth of NVIDIA resources and support to achieve faster insights,” Lee added. “That is what our Ampere Altra servers have delivered, starting with our server for the NVIDIA Arm HPC Developer Kit.”

As the Arm architecture becomes more widely adopted in data centers, it will be important to choose systems that are optimally configured. This is particularly the case for Arm systems equipped with GPUs and high-speed networking, since this architecture is new to many enterprises.

Customers might not have expertise to design such a system properly, but NVIDIA-Certified Systems provide them with an easy way to make the best choices. To find Arm-based certified systems, see the Qualified Systems Catalog. The catalog will grow as more systems are certified.

¹ Gartner, “Forecast Analysis: Arm-Based Servers, Worldwide,” G00755363, November 2021.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

Categories
Misc

Finding Out Where Your Application and Network Intersect

Modern data centers can run thousands of services and applications. When an issue occurs, as a network administrator, you are guilty by default. You have to prove your innocence on a daily basis, as it is easy to blame the network. It is an unfair world.

Correlating application performance issues to the network is hard to do. You can start by checking basic connectivity using simple pings or traceroutes, then check your SNMP-based monitoring tools and sniffers, or even read device counters to look for drops. In the meantime, users suffer from application slowness, poor performance, or even unavailability.

Unfortunately, all these classic network troubleshooting methods are time-consuming and don’t guarantee success, as it is sometimes nearly impossible to pinpoint problems using them.

NetQ to the rescue

To facilitate network troubleshooting, NVIDIA developed NetQ—a scalable, modern network operations toolset that provides network visibility in real time.

The NetQ team recently introduced a unique flow analysis tool to provide further visibility enhancements. Flow analysis allows network administrators to instantly correlate service traffic flows to the paths taken in the fabric, dramatically reducing the mean time to innocence (MTTI) or even confirming that there is no network issue.

Flow analysis enables you to discover and visualize all paths that a specific application’s traffic flow takes between endpoints in the fabric. It monitors the fabric-wide latency and buffer utilization statistics. With EVPN and multi-tenancy becoming the standard solution in most modern data centers, the flow analysis tool was designed to sample TCP or UDP data on overlay and underlay networks within different VRFs.

Flow analysis becomes even more powerful when used with What Just Happened (WJH) ASIC telemetry. While flows are being analyzed, flow-related WJH events from all switches in traffic paths are presented to help you discover if there were drops that caused the service issue. These two features working together maximize the probability of pinpointing the actual problem affecting an application.   

Screen shot of the dashboard showing latency results and a flow graph.
Figure 1. NetQ flow analysis dashboard

By the numbers

Flow analysis is supported on NVIDIA Spectrum-2 and later switches running Cumulus Linux 5.0 or later. It can also provide partial-path discovery for brownfield deployments with unsupported switches or switches running older versions of Cumulus Linux or SONiC.

Flow analysis samples traffic based on the packet’s 4-tuple or 5-tuple, including VXLAN inner and outer headers. Its sampling lifetime is limited to 10, 15, 20, or 30 minutes. You can decide whether to run it on creation or schedule it for a later time.

The sample rate granularity is also configurable to low (1 per 10000), medium (1 per 1000), high (1 per 100), or all packets (1 per 1). The higher the sampling rate, the more accurate your analyzed data. A higher sampling rate results in higher CPU utilization, so I recommend setting lower sampling rates for heavy traffic flows.
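To get a feel for the trade-off, the sketch below estimates how many packets would be sampled for a flow at each rate over one sampling lifetime. The rate values come from the list above, but the dictionary keys, the `expected_samples` helper, and the example traffic volume are hypothetical. They are not NetQ settings or output.

```python
# One sampled packet per N packets, using the rates listed above.
SAMPLE_EVERY_N = {"low": 10_000, "medium": 1_000, "high": 100, "all": 1}


def expected_samples(packets_per_second, duration_minutes, rate):
    """Estimate how many packets are sampled at a given rate over the sampling lifetime."""
    total_packets = packets_per_second * duration_minutes * 60
    return total_packets // SAMPLE_EVERY_N[rate]


# Hypothetical flow carrying 50,000 packets per second, sampled for 10 minutes.
for rate in SAMPLE_EVERY_N:
    print(rate, expected_samples(50_000, 10, rate))
```

For a heavy flow like this one, the low rate still yields thousands of samples, while sampling all packets means tens of millions of samples for the switch CPU to handle.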

Try it yourself in NVIDIA Air

NVIDIA Air is a tool for creating data center digital twins. With Air, you can build your own Cumulus Linux virtual data center, test it, validate it with NetQ, explore features, and learn some best practices. It is entirely free to use!

Try out flow analysis by spinning up the prebuilt NVIDIA Air Infrastructure Simulation Platform demo in the Air Marketplace. Follow the guided tour and see the significant benefits that flow analysis with NetQ can bring to your organization.

For more information, see the following resources: