Categories
Misc

New on NGC: SDKs for Large Language Models, Digital Twins, Digital Biology, and More

NVIDIA announces new SDKs available in the NGC catalog, a hub of GPU-optimized deep learning, machine learning, and HPC applications. With highly performant software containers, pretrained models, industry-specific SDKs, and Jupyter notebooks available, AI developers and data scientists can simplify and reduce complexities in their end-to-end workflows.

This post provides an overview of new and updated services in the NGC catalog, along with the latest advanced SDKs to help you streamline workflows and build solutions faster.

Simplifying access to large language models

Recent advances in large language models (LLMs) have fueled state-of-the-art performance for NLP applications, such as virtual scribes in healthcare, interactive virtual assistants, and many more. 

NVIDIA NeMo Megatron

NVIDIA NeMo Megatron, an end-to-end framework for training and deploying LLMs with up to trillions of parameters, is now available in open beta from the NGC catalog. It consists of an end-to-end workflow for automated distributed data processing; training large-scale customized GPT-3, T5, and multilingual T5 (mT5) models; and deploying models for inference at scale.

NeMo Megatron can be deployed on several cloud platforms, including Microsoft Azure, Amazon Web Services, and Oracle Cloud Infrastructure. It can also be accessed through NVIDIA DGX SuperPODs and NVIDIA DGX Foundry.

Request NeMo Megatron in open beta.

NVIDIA NeMo LLM

The NVIDIA NeMo LLM service provides the fastest path to customize foundation LLMs and deploy them at scale, using the NVIDIA-managed cloud API or through private and public clouds.

NVIDIA and community-built foundation models can be customized using prompt learning capabilities, which are compute-efficient techniques that embed context in user queries to enable greater accuracy in specific use cases. These techniques require just a few hundred samples to achieve high accuracy in building applications. These applications can range from text summarization and paraphrasing to story generation.

This service also provides access to the Megatron 530B model, one of the world’s largest LLMs with 530 billion parameters. Additional model checkpoints include 3B T5 and NVIDIA-trained 5B and 20B GPT-3.

Apply now for NeMo LLM early access.

NVIDIA BioNeMo

The NVIDIA BioNeMo service is a unified cloud environment for end-to-end, AI-based drug discovery workflows, without the need for IT infrastructure.

Today, the BioNeMo service includes two protein models, with models for DNA, RNA, generative chemistry, and other biology and chemistry models coming soon.

ESM-1 is a protein LLM, which was trained on 52 million protein sequences, and can be used to help drug discovery researchers understand protein properties, such as cellular location or solubility, and secondary structures, such as alpha helix or beta sheet.

The second protein model in the BioNeMo service is OpenFold, a PyTorch-based NVIDIA-optimized reproduction of AlphaFold2 that quickly predicts the 3D structure of a protein from its primary amino acid sequence.

With the BioNeMo service, chemists, biologists, and AI drug discovery researchers can generate novel therapeutics and understand the properties and function of proteins and DNA. Ultimately, they can combine many AI models in a connected, large-scale, in silico AI workflow that requires supercomputing scale over multiple GPUs.

BioNeMo will enable end-to-end modular drug discovery to accelerate research and better understand proteins, DNA, and chemicals.

Apply now for BioNeMo early access.

AI frameworks for 3D and digital twin workflows

A digital twin is a virtual representation (a true-to-reality simulation of physics and materials) of a real-world physical asset or system, which is continuously updated. Digital twins aren’t just for inanimate objects and people. They can replicate a fulfillment center process to test out human-robot interactions before activating certain robot functions in live environments. The applications are as wide as the imagination.

NVIDIA Omniverse Replicator

NVIDIA Omniverse Replicator is a highly extensible framework built on the NVIDIA Omniverse platform that enables physically accurate 3D synthetic data generation to accelerate the training and accuracy of perception networks.

Technical artists, software developers, and ML engineers can now easily build custom, physically accurate, synthetic data generation pipelines in the cloud or on-premises with the Omniverse Replicator container available from the NGC catalog.

Download the Omniverse Replicator container for self-service cloud deployment.

NVIDIA Modulus

NVIDIA Modulus is a neural network AI framework that enables you to create customizable training pipelines for digital twins, climate models, and physics-based modeling and simulation.

Modulus is integrated with NVIDIA Omniverse so that you can visualize the outputs of Modulus-trained models. This interface enables interactive exploration of design variables and parameters for inferring new system behavior and visualizing it in near real time.

The latest release (v22.09) includes key enhancements to increase composition flexibility for neural operator architectures, features to improve training convergence and performance, and, most importantly, significant improvements to the user experience and documentation.

Download the latest version of Modulus.

Deep learning software

The most popular deep learning frameworks for training and inference are updated monthly. Pull the latest version (v22.09):

New pretrained models

We are constantly adding state-of-the-art models for a variety of speech and vision tasks. The following pretrained models are new on NGC:

  • SLU Conformer-Transformer-Large SLURP: Performs joint intent classification and slot filling, directly from audio input.
  • Riva ASR Korean LM: An automatic speech recognition (ASR) engine that can optionally condition the transcript output on n-gram language models.
  • LangID Ambernet: Used for spoken language identification (LangID or LID) and serves as the first step for ASR.
  • STT En Squeezeformer CTC Small Librispeech: A model for English ASR that is trained with NeMo on the LibriSpeech dataset.
  • TTS De FastPitch HiFi-GAN: This collection contains two models: FastPitch, which was trained on over 23 hours of German speech from one speaker, and HiFi-GAN, which was trained on mel spectrograms produced by the FastPitch model.

Explore more pretrained models for common AI tasks on the NGC Models page.

An AIoT Solution for Visual Blockage Detection at Culverts

One of the key contributors in originating flash floods is the blockage of cross-drainage hydraulic structures, such as culverts, by unwanted, flood-borne debris being transported.

The accumulation and interaction of debris with culverts often result in reduced hydraulic capacity, diversion of upstream flows, and structural failure. For example, the 2007 floods in Newcastle, Australia; the 1998 floods in Wollongong, Australia; and the 2021 floods in Pentre, United Kingdom are just a few instances where blockages were reported as a primary reason for cross-drainage hydraulic structure failure.

In this post, we describe our technique for building a diverse visual dataset for computer vision model training, including examples of synthetic images. We break down each component of our solution and provide insights on future research directions.

Problem

Non-linear debris accumulation, the unavailability of real-time data, and complex hydrodynamics suggest that a conventional numerical modeling-based approach is not valid for this problem. Instead, blockage policies were developed from post-flood visual information and rest on several assumptions, which many argue do not truly represent blockage.

This suggests the need for better understanding and exploring the blockage issue from a technology perspective to aid flood management officials and policymakers.

StopBlock: A technology initiative to monitor the visual blockage of culverts

To help address the blockage problem, StopBlock was initiated as a part of SMART Stormwater Management. Overall, this project involved collaboration between city councils in the Illawarra (Wollongong, Shellharbour, and Kiama) and Shoalhaven regions, Lendlease, and the University of Wollongong’s SMART Infrastructure Facility.

StopBlock aims to assess and monitor the visual blockage at culverts in real time using the latest technologies:

  • Artificial intelligence
  • Computer vision
  • Edge computing
  • Internet of Things (IoT)
  • Intelligent video analytics

In addition, we built and deployed an artificial intelligence of things (AIoT) solution using NVIDIA edge computing, the latest computer vision detection and classification models, a CCTV camera, and a 4G module. The solution detected the visual blockage status (blocked, partially blocked, or clear) at three culvert sites within the Illawarra region.

Building visual datasets for computer vision model training

Training computer vision CNN models requires numerous images related to the intended task. The problem of culvert blockage detection has not been addressed from this perspective before. No database of image data and datasets exists for this purpose.

We developed a new training database consisting of diverse image data related to culvert blockage. These images showed varying culvert types, debris types, camera angles, scaling, and lighting conditions.

Only limited data from real culvert blockages was available through the city council records, so we combined real, lab-simulated, and synthetic visual data.

Images of culvert openings and blockage

We collected real images of culverts (blocked and clear) from multiple sources:

  • City council historical records
  • Online repositories
  • Local culvert sites

The collected images represent great diversity in terms of culvert types, debris types, illumination conditions, camera viewpoints, scale, resolution, and even backgrounds. The images of culvert openings and blockages (ICOB) dataset consisted of 929 images in total.

Photos of a few selected culvert samples from the ICOB data set with bounding box annotations.
Figure 1. Samples from the ICOB dataset

Visual hydraulics-lab blockage dataset

We collected simulated images from scaled laboratory experiments to supplement the existing visual dataset, because not enough real images were available.

A thorough hydraulics laboratory investigation was performed where a series of experiments used scaled physical models of culverts. Blockage scenarios used scaled debris (urban and vegetative) under various flooding conditions.

The images represented diversity in terms of culvert types (single circular, double circular, single box, or double box), blockage types (urban, vegetative, or mixed), simulated lighting conditions, camera viewpoints (two cameras), and flooding conditions (inlet discharge levels). However, the dataset had limitations: reflections, clear water, an identical background, and identical scaling across images.

In total, we collected 1,630 images from these experiments to establish the VHD dataset.

Photos of culvert samples from the VHD dataset with bounding box annotations.
Figure 2. Samples from the VHD dataset

Synthetic images of culverts

We generated synthetic images of culverts (SIC) using a three-dimensional computer application based on the Unity gaming engine with the goal of enhancing the datasets for training.

The application is specifically designed to simulate culvert blockage scenarios and can generate virtually countless instances of blocked culverts with any possible blockage situation that you can think of. You can also alter culvert types, water levels, debris types, camera viewpoints, time of the day, and scaling.

The app design enables you to select scene features from dropdown menus and drag debris objects from a library to place anywhere in the scene with any possible orientation. You can write code using parameters to recreate multiple scenarios and batch capture the images with corresponding labels, to aid the training process.
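The batch-capture idea described above can be pictured as a parameter sweep. The following is a minimal Python sketch only: the application itself is built on Unity, and the parameter names, values, and file naming below are assumptions drawn from the scene features mentioned in this post, not the application's actual interface.

```python
from itertools import product

# Hypothetical scene parameters; the values are illustrative,
# not the application's real dropdown options.
CULVERT_TYPES = ["single circular", "double circular", "single box", "double box"]
DEBRIS_TYPES = ["urban", "vegetative", "mixed"]
WATER_LEVELS = ["low", "medium", "high"]
TIMES_OF_DAY = ["morning", "noon", "evening"]


def build_scenarios():
    """Enumerate every combination of scene parameters as a labeled scenario."""
    scenarios = []
    combos = product(CULVERT_TYPES, DEBRIS_TYPES, WATER_LEVELS, TIMES_OF_DAY)
    for idx, (culvert, debris, water, time_of_day) in enumerate(combos):
        scenarios.append(
            {
                "image_name": f"sic_{idx:04d}.png",  # hypothetical naming scheme
                "culvert_type": culvert,
                "debris_type": debris,
                "water_level": water,
                "time_of_day": time_of_day,
            }
        )
    return scenarios


scenarios = build_scenarios()
print(len(scenarios), "scenarios, first:", scenarios[0])
```

Each dictionary doubles as the label record for the image it names, which is what makes a scripted sweep convenient for training data.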

Some highlighted limitations included unrealistic effects and animations and a single natural background. Figure 3 shows samples from the SIC dataset.

Photos of a few selected culvert samples from the SIC dataset with bounding box annotations.
Figure 3. Samples from the SIC dataset

AIoT system development

We developed an AIoT solution using edge computing hardware, computer vision models, and sensors for the real-time visual blockage monitoring at culverts:

  • A CCTV camera to capture the culvert.
  • NVIDIA TX2–powered edge compute to process and infer blockage images using trained computer vision models.
  • 4G connectivity to transmit blockage-related data to a web-based dashboard.
  • Computer vision models to detect and classify the visual blockage at culverts.

More specifically, in terms of software, a two-stage detection-classification pipeline is adopted (Figure 4).

Detection stage

In the first stage, a computer vision object detection model (YOLOv4) is used to detect the culvert openings. The detected openings are cropped from the original image and are processed for the classification stage. If no culvert opening is detected, an alert is issued to suggest that the culvert might be submerged.

Classification stage

At the second stage, a CNN classification model (such as ResNet-50) is used to classify the cropped culvert openings into one of three blockage classes (blocked, partially blocked, or clear). The blockage-related information is then transmitted to a web dashboard for flood management officials to facilitate the decision-making process.

Flow diagram shows the approach of sequentially detecting culvert visible openings and classifying them as clear or blocked.
Figure 4. A two-stage detection-classification pipeline for visual blockage detection at culverts
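The control flow of the two stages can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the deployed code: `stub_detect` and `stub_classify` are hypothetical stand-ins for the trained YOLOv4 and ResNet-50 models, the brightness thresholds are meaningless placeholders, and the image is a toy nested list rather than a camera frame.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
Image = List[List[int]]          # toy grayscale image: rows of pixel values

SUBMERGED_ALERT = "No culvert opening detected; the culvert might be submerged."


@dataclass
class OpeningResult:
    box: Box
    status: str  # "blocked", "partially blocked", or "clear"


def crop(image: Image, box: Box) -> Image:
    """Cut the detected opening out of the original image."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[Image], str],
) -> Tuple[List[OpeningResult], Optional[str]]:
    """Stage 1 detects openings; stage 2 classifies each cropped opening."""
    boxes = detect(image)
    if not boxes:
        # No opening found: raise the alert instead of classifying.
        return [], SUBMERGED_ALERT
    results = [OpeningResult(box=b, status=classify(crop(image, b))) for b in boxes]
    return results, None


def stub_detect(image: Image) -> List[Box]:
    # Hypothetical stand-in for the detector: one fixed box on any non-empty image.
    return [(0, 0, 2, 2)] if image else []


def stub_classify(patch: Image) -> str:
    # Hypothetical stand-in for the classifier: thresholds mean brightness.
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    if mean > 170:
        return "clear"
    if mean > 85:
        return "partially blocked"
    return "blocked"


frame = [[200, 200, 0, 0], [200, 200, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
results, alert = run_pipeline(frame, stub_detect, stub_classify)
print(results, alert)
```

Passing the models in as callables keeps the control flow independent of any particular framework, so the same logic applies whichever detector and classifier are plugged in.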

We trained the YOLOv4 and ResNet-50 models used for detection and classification, respectively, using the NVIDIA TAO platform powered by Python, TensorFlow, and Keras. We used a Linux machine equipped with the NVIDIA A100 GPU for training the models using images from the ICOB, VHD, and SIC datasets.

Here’s the four-stage approach adopted for development:

  • Stage I: We prepared a dataset from real and simulated images.
  • Stage II: We selected detection and classification models from the NVIDIA TAO model zoo and trained them using the TAO platform.
  • Stage III: We exported trained models to be deployed on the NVIDIA TX2 edge computer.
  • Stage IV: In the field, we deployed a complete hardware system and collected real data for fine-tuning the computer vision algorithms.

In terms of software performance, the culvert opening detection model achieved a validation mAP of 0.90, while the blockage classification model achieved a validation accuracy of 0.88.

We developed an end-to-end video analytics pipeline on the NVIDIA DeepStream 6 SDK, using the trained computer vision models to run inference on the NVIDIA TX2-powered edge computer. Using these detection and classification models, the DeepStream pipeline achieved 24.8 FPS on NVIDIA TX2 hardware.

We built the smart device for culvert blockage monitoring using a CCTV camera, NVIDIA TX2 edge computer, and 4G dongle (Figure 5). We optimized the developed hardware for power consumption and computational time for real-time utility. Powered by a solar panel, the hardware consumes only 9.1W average power. The AIoT solution is also configured to transmit the blockage metadata every hour to the web dashboard.

The solution is configured to consider the privacy issues and avoid storing any images on board or in the cloud. Instead, it only processes the images and transmits the blockage metadata. Figure 5 shows the installation of the AIoT hardware at one of the remote sites to monitor the culvert visual blockage.

Photo of the AIoT hardware setup based on NVIDIA Jetson TX2 and a culvert photo showing hardware deployment on poles.
Figure 5. AIoT hardware setup (left) and field deployment (right) for real-time culvert visual blockage monitoring
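The metadata-only design can be illustrated with a short Python sketch. The field names, site identifier, and JSON encoding here are assumptions made for illustration; the post does not describe the actual message format sent to the dashboard.

```python
import json
from datetime import datetime, timezone

ALLOWED_STATUSES = {"blocked", "partially blocked", "clear"}


def build_payload(site_id, statuses, captured_at):
    """Build a metadata-only message: blockage statuses, but no image data."""
    for status in statuses:
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown blockage status: {status!r}")
    message = {
        "site_id": site_id,                        # hypothetical field
        "captured_at": captured_at.isoformat(),    # hypothetical field
        "opening_statuses": list(statuses),        # one entry per detected opening
    }
    return json.dumps(message, sort_keys=True)


payload = build_payload(
    "site-1",  # hypothetical site identifier
    ["clear", "blocked"],
    datetime(2022, 11, 1, 12, 0, tzinfo=timezone.utc),
)
print(payload)
```

Because the message carries only labels and a timestamp, it stays small enough for an hourly 4G upload and contains nothing that could identify a person.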

Future research directions

The potential of computer vision can be further explored to establish a better understanding of visual blockage by extracting blockage-related information:

  • Percentage visual blockage estimation
  • Flood-borne debris type recognition
  • Partially automated visual blockage classification

Percentage visual blockage estimation

In the context of flood management decision making, knowing the blockage status of a given culvert is not always enough to make a maintenance-related decision. Going one step further and estimating the percentage visual blockage at a given culvert assists flood management officials in prioritizing the culverts with high visual blockage.

One potential solution is a segmentation-classification pipeline that segments the visible openings from the image and classifies the segmented masks into one of four percentage visual blockage classes. Figure 6 shows the conceptual block diagram for the percentage visual blockage estimation.

Diagram shows the process of extracting the visible culvert opening masks using Mask R-CNN and classifying them into percentage visual blockage classes using CNN classification model.
Figure 6. Conceptual diagram for the percentage visual blockage estimation at culverts use case

Flood-borne debris type recognition

The type of flood-borne debris interacting and accumulating at the culvert can result in distinct flooding impacts. Usually, vegetative debris is considered less concerning because of its porous nature in comparison to compact, urban debris.

Automatic detection of debris type is another crucial aspect to be explored.

Partially automated visual blockage classification

A CNN classification model may be used to facilitate the manual culvert inspections as a simplistic solution while keeping the flood management official in the loop. Given the complexity of the problem and preliminary analysis, it is not possible to only use a CNN classification model to automate the process. However, a partially automated framework can be developed to facilitate the process.

Figure 7 shows the concept of such a framework based on the classification probability of the trained model. If the classification probability for a given image is less than a given threshold, it can be flagged to flood management officials for cross-validation.

Diagram shows the partially assisted deep learning classification framework for the visual blockage detection at culverts. Images classified by the deep learning model with less than 80% confidence are manually assisted by flood management experts.
Figure 7. Partially automated visual blockage classification
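The thresholding rule behind such a framework is simple enough to sketch in Python. The 0.80 threshold follows the Figure 7 description; the function name and the dictionary of class probabilities are assumptions made for illustration.

```python
from typing import Dict, Tuple

CONFIDENCE_THRESHOLD = 0.80  # taken from the Figure 7 description


def triage(
    probabilities: Dict[str, float],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, float, bool]:
    """Return the top class, its probability, and whether an expert should review it."""
    label = max(probabilities, key=probabilities.get)
    confidence = probabilities[label]
    needs_review = confidence < threshold
    return label, confidence, needs_review


confident = triage({"blocked": 0.92, "partially blocked": 0.05, "clear": 0.03})
uncertain = triage({"blocked": 0.45, "partially blocked": 0.40, "clear": 0.15})
print(confident)
print(uncertain)
```

Only the second image would be flagged to flood management officials, which keeps the expert in the loop for exactly the cases where the model is unsure.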

Summary

We provided an edge-computing solution for the visual blockage detection at the culverts to assist the timely maintenance and to avoid the blockage-related flooding events.

A detection-classification computer vision pipeline was developed and deployed on NVIDIA edge-computing hardware to retrieve the blockage status of a culvert as “clear,” “blocked,” or “partially blocked.” To facilitate the training of computer vision models for this unique problem domain, we used simulated and artificially generated images related to culvert visual blockage.

There is considerable scope for extending the solution to provide improved and additional visual blockage information. Estimating the percentage visual blockage, detecting flood-borne debris types, and developing a partially automated visual blockage classification framework are a few potential enhancements to the existing solution.

Upcoming Event: Level Up with NVIDIA: RTX in Unity

Learn how to leverage the latest NVIDIA RTX technology in Unity Engine and connect with experts during a live Q&A at this webinar on November 16.

Stormy Weather? Scientist Sharpens Forecasts With AI

Editor’s note: This is the first in a series of blogs on researchers advancing science in the expanding universe of high performance computing. A perpetual shower of random raindrops falls inside a three-foot metal ring Dale Durran erected outside his front door (shown above). It’s a symbol of his passion for finding order in the…

The post Stormy Weather? Scientist Sharpens Forecasts With AI appeared first on NVIDIA Blog.

Choosing NVIDIA Spectrum for Microsoft Azure SONiC

Everyone agrees that open solutions are the best solutions, but there are few truly open operating systems for Ethernet switches. At NVIDIA, we embraced open source for our Ethernet switches. Besides supporting SONiC, we have contributed many innovations to open-source community projects.

This post was originally published on the Mellanox blog in June 2018 but has been updated.

Microsoft runs one of the largest clouds in the world with Azure. In building and deploying Azure, they have gained a lot of insight into managing a global, high-performance, highly available, and secure network.

The network operating system (NOS) Microsoft uses for Azure, SONiC (Software for Open Networking in the Cloud), is built on open source. Their experience with hundreds of data centers and tens of thousands of switches has educated them about what is required:

  • Use best-of-breed switching hardware.
  • Ensure that deploying new features won’t affect end users.
  • Updates must be released securely and reliably across the globe within hours.
  • Use cloud-scale deep telemetry and automation for failure mitigation.
  • Enable software-defined networking to quickly provision and manage hardware elements in the network through a unified structure to eliminate duplication and reduce failures.

SONiC, a breakthrough for network switch operations and management, addresses these requirements. Microsoft open-sourced this innovation to the community, making it available on their SONiC GitHub repository.

SONiC is a uniquely extensible platform with a large and growing ecosystem of hardware and software partners that offers multiple switching platforms and various software components.

The SONiC system architecture comprises multiple modules that interact with each other through a centralized and scalable infrastructure. This infrastructure relies on a Redis database engine that provides data persistence, replication, and multi-process communication among all SONiC subsystems.

The Redis-engine infrastructure relies on a messaging paradigm of publisher/subscriber so that applications can subscribe only to the data views that they require, avoiding implementation details irrelevant to their functionality.

Diagram shows the configuration and management tools plus network applications working on the SONiC base.
Figure 1. SONiC architecture

For more information about the SONiC architecture, see Architecture in the SONiC wiki.
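The publisher/subscriber paradigm mentioned above can be illustrated with a small, self-contained Python sketch. This is an in-memory toy, not Redis and not SONiC code: the `Broker` class, the channel names, and the message fields are all hypothetical and only show how a subscriber receives the data views it asked for and nothing else.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Broker:
    """Toy in-memory message broker illustrating publish/subscribe."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[Any], None]) -> None:
        """Register a callback for one channel only."""
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: Any) -> int:
        """Deliver a message to the channel's subscribers; return how many received it."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
port_events: List[Any] = []

# A hypothetical port-management application subscribes to port updates only.
broker.subscribe("port_updates", port_events.append)

delivered_port = broker.publish("port_updates", {"port": "Ethernet0", "oper_status": "up"})
delivered_route = broker.publish("route_updates", {"prefix": "10.0.0.0/24"})
print(delivered_port, delivered_route, port_events)
```

The route update reaches no one because nothing subscribed to that channel, which mirrors how an application can ignore data that is irrelevant to its function.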

NVIDIA Spectrum switches support a variety of Layer 2 and Layer 3 networking connectivity and management features. Table 1 shows the features that SONiC currently supports.

L3          L2          Management
BGP         LAG         SNMP
ECMP        LLDP        Syslog
DHCP Relay  ECN         NTP
IPv6/4      PFC         CoPP
-           WRED        TACACS+
-           CoS         Sysdump
-           Mirroring   -
-           ACL         -
Table 1. Currently supported features

Why should you use NVIDIA Spectrum Switch with SONiC?

When choosing a switch to run SONiC on, you should look at two main factors:

  • Is the switch vendor capable of supporting your deployment at the ASIC, Switch Abstraction Interface (SAI), and software levels?
  • What are the capabilities of the ASIC running underneath?

NVIDIA Spectrum ASIC-based switches

The NVIDIA Open Ethernet Switch portfolio is entirely based on the Spectrum ASIC, providing the lowest latency for 25G/100G in the market, zero packet loss, and a fully shared buffer. It is the ideal combination for cloud networking demands.

SONiC works with the Spectrum ASICs through their unique driver solutions. SONiC uses SAI, an open-source driver solution co-invented by NVIDIA. This open capability of Spectrum also means that any Linux distribution can run on a Spectrum switch.

NVIDIA is the only switch silicon vendor that has contributed their ASIC driver directly to the Linux kernel, enabling support for a mix of SONiC and any standard Linux distributions, like Red Hat or Ubuntu, to run directly on the switch.

Image shows multiple company logos under sections labeled Application & Management Tools, SONiC, and SAI.
Figure 2. The SONiC development community

NVIDIA is the only company participating in all levels of the SONiC development community. We are one of the first companies to develop and adopt SAI. SONiC fully supports all Spectrum family switches and can be deployed on any switch in our Ethernet portfolio. We are also a major and active contributor to the SONiC OS feature set.

Pictures of SN2700, SN2410, and SN2100 switches.
Figure 3. NVIDIA switches

All NVIDIA networking platforms support port splitting through the SONiC OS and are currently the only platforms that support this feature. Spectrum switches also deliver exceptional network performance compared to a commodity silicon-based switch using real-life mixed frame size, “noisy neighbor,” and microburst absorption scenarios.

For more information about the fundamental differences between NVIDIA Spectrum and Broadcom Tomahawk-based switches, and our unmatched ASIC performance, see Tolly Performance Evaluation: NVIDIA Spectrum-3 Ethernet Switch.

NVIDIA Spectrum switch systems are an ideal spine and top-of-rack solution, allowing flexibility, with port speeds ranging from 10 Gb/s to 100 Gb/s per port, and port density that enables full rack connectivity to every server at any speed. These ONIE-based switch platforms support multiple operating systems, including SONiC, and leverage the advantages of Open Network disaggregation and the NVIDIA Spectrum ASIC capabilities.

Spectrum adaptive routing technology supports various network topologies. For typical topologies such as CLOS (or leaf/spine), the distance of the multiple paths to a given destination is the same. Therefore, the switch transmits the packets through the least congested port.

In other topologies where distances vary between paths, the switch prefers to send the traffic over the shortest path. If congestion occurs on the shortest path, then the least-congested alternative paths are selected. You can build a high-performing CLOS data center using the NVIDIA switches as your building blocks.
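The path-selection behavior described in the two paragraphs above can be sketched in Python. This is a simplified illustration rather than the switch's actual algorithm: the `Path` type, the normalized congestion metric, and the `congestion_limit` threshold are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Path:
    port: str
    hops: int          # distance to the destination
    congestion: float  # hypothetical metric: 0.0 (idle) to 1.0 (saturated)


def select_path(paths: List[Path], congestion_limit: float = 0.8) -> Path:
    """Prefer the shortest path; fall back to the least-congested path if it is congested."""
    if not paths:
        raise ValueError("no candidate paths")
    shortest = min(p.hops for p in paths)
    shortest_paths = [p for p in paths if p.hops == shortest]
    # Among equal-distance paths (the Clos case), take the least congested port.
    best_shortest = min(shortest_paths, key=lambda p: p.congestion)
    if best_shortest.congestion < congestion_limit:
        return best_shortest
    # Shortest path is congested: take the least-congested path overall,
    # breaking ties in favor of fewer hops.
    return min(paths, key=lambda p: (p.congestion, p.hops))


clos = [Path("p1", 2, 0.6), Path("p2", 2, 0.2)]
uneven = [Path("a", 1, 0.3), Path("b", 3, 0.1)]
congested = [Path("a", 1, 0.9), Path("b", 3, 0.1), Path("c", 2, 0.4)]

print(select_path(clos).port, select_path(uneven).port, select_path(congested).port)
```

In the equal-distance case the choice reduces to picking the least congested port, while in the uneven case the longer path is used only once the shortest one crosses the assumed congestion limit.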

Similarly, Border Gateway Protocol (BGP) is a routing protocol responsible for looking at all the available paths that data could travel and picking the best route. BGP enables communication to happen quickly and efficiently.

Diagram shows 32 switches linked with pods by eBGP. Layer 3 ECMP, all links active/active, with very small fault domains.
Figure 4. Typical leaf-spine pod design with BGP as the routing protocol

Spectrum switches enable PODs. A POD is a network, storage, and compute unit that works together to deliver networking services. A POD is a repeatable design pattern that provides scalable and easier-to-manage data centers.

Diagram shows switches linking to multiple clusters and pods.
Figure 5. Scaling to multiple PODs

Finally, the Spectrum family enables a set of advanced network functions that future-proof the switch with the flexibility to handle evolving networking technologies. This includes new protocols that may be developed in the future, enabling custom applications, advanced telemetry, and new tunneling/overlay capabilities. Spectrum combines a programmable, flexible, and massively parallel packet processing pipeline with a fully shared and stateful forwarding database. Spectrum also features What Just Happened (WJH), the world’s most useful switch telemetry technology.

For more information, see the following resources:

Latest Discoveries at the Healthcare & Life Sciences Developer Summit

Humanity has seen major scientific breakthroughs directly related to discoveries that do not share the glamor of the breakthrough they enabled.

Sir Alexander Fleming’s penicillin gave rise to effective treatments for infections like pneumonia, but penicillin’s importance outshines a technology known as the Petri dish, invented by a German physician. It was in a Petri dish that penicillin was found when Fleming returned from his vacation.

Naturally, the tools and components that enable scientific advancements and technological progress are not as celebrated as the new technology, but they are just as important to the discovery.

Today, in a world full of open-source projects, pretrained machine learning models, and affordable computing available at scale, developers and scientists have more resources to combine and create. 

Like the Petri dish that enabled penicillin, developers and scientists can use existing components to generate new discoveries of great social impact in the healthcare industry.

NVIDIA Healthcare and Life Sciences Developer Summit

NVIDIA is hosting a free Healthcare and Life Sciences Developer Summit on November 10, 2022, with key webcasts for developers, startups, and industry leaders. The sessions show how NVIDIA technologies are supporting the future of medicine.

The virtual summit offers a full day of technical talks to reach developers and technical leaders in the EMEA region. Led by NVIDIA healthcare team members and startups like Relation Therapeutics, ImFusion, Rhino Health, and Quantib, the day features talks about high-performance computing, large language models, genomics, and medical imaging.

Supporting healthcare startups

NVIDIA has been nurturing more than 12K startups globally through the Inception program, its virtual accelerator, with nearly 2,000 startups in the healthcare industry.

At the latest GTC, success stories from Inception members were shared in the Accelerating Healthcare & Life Science Innovation with Makers and Breakers session. Startups such as Activ Surgical, Instadeep, Haply Robotics, DNAnexus, and Quantib talked about their experiences and recent achievements in medical imaging, medical instruments, and biopharma.

Renee Yao, Global Healthcare AI Startups Lead at NVIDIA, has seen several startups achieve success by leveraging NVIDIA technologies, enabling those in the healthcare and life sciences industry to build faster and cheaper.

A lot of innovation is happening in the healthcare space, particularly in areas that enhance precision health powered by machine learning. Yao advises startups to consider what scientific and software communities have already built before starting developments from scratch.

David Ruau, NVIDIA Head of Strategic Alliances in Drug Discovery, talks about the upcoming breakthroughs in the biopharma domain, not only through large and traditional companies but also through startups.

“The pace of innovation that AI has applied to drug discovery is still accelerating,” says Ruau. He believes that technologies such as transformers, geometric deep learning, diffusion models, and many other approaches applied to all the steps of the drug discovery process are contributing to giant leaps in innovation.

Ruau explains, “startups must be nimble and agile, as speed is key in this domain.” He points out the importance of funding and lowering the entry requirements for innovating with machine learning. Software developers and scientists can use NVIDIA open-source frameworks in the cloud, on-premises, or in a hybrid approach. This enables faster results by any healthcare technology startup, regardless of their funding stage.

Image of large language models being used for drug discovery.
Figure 1. Transformer-based large language models are creating new possibilities for real-time exploration of the chemical universe

Enabling with technology

NVIDIA technologies are used by organizations all over the world to accelerate their research and fuel new discoveries.

NVIDIA recently announced BioNeMo, a transformer-based framework and cloud service. It can process SMILES and protein sequences to predict structures and accelerate the discovery of druggable targets. BioNeMo is built on top of NVIDIA NeMo Megatron, a 3B-parameter model, and is already optimized for GPUs.

Another major announcement came from Project MONAI, a PyTorch-based, open-source framework for deep learning in healthcare imaging. The latest version, 1.0, includes new and enhanced features, which include preprocessing for multidimensional medical imaging data, automated segmentation, GPU data parallelism, and more.

Useful for creating state-of-the-art, end-to-end training workflows for healthcare imaging, MONAI provides researchers with an optimized and standardized way to create and evaluate deep learning models.

Following these two announcements, select startups are presenting their research at the Healthcare and Life Sciences Developer Summit, open to all.

DNA-to-gene-expression model

Relation Therapeutics, a drug discovery startup from London, is using transformer-based machine learning and ActiveGraph technology to better understand the biology of diseases. They aim to discover and develop new therapeutics, help humanity understand why patients become sick, and ultimately cure disease.

Their technology can understand combinatorial relationships between genes, proteins, and drugs. Devising these models involves calculations that require efficient use of computational resources, as well as an exceptional interdisciplinary team of machine learning researchers, data scientists, and engineers.

Relation Therapeutics is currently training a transformer-based DNA-to-gene-expression model on 80 NVIDIA A100 GPUs. The GPUs are hosted on Cambridge-1, the UK’s most powerful supercomputer. It has 400 petaflops of AI performance that leverages NCCL and cuDNN for GPU-optimized workflows.

By using existing technologies, such as NVIDIA GPUs and an optimized software stack, Relation Therapeutics can create novel approaches that push the boundaries of what is known about biology. They are becoming one of the best modern-day examples of how to leverage existing technologies to create new ones and positively impact human health.

The future of medicine

Our society is already witnessing historical breakthroughs powered by computational methods, and startups play a crucial role in achieving such advancements.

A new era of scientific discovery and computational biology has begun, and NVIDIA technologies are the reliable ground on which innovation can be built. They can speed up scientific development or serve as a catalyst for stunning new technologies, like the key tools that enabled penicillin to exist.

NVIDIA supports thousands of individuals and organizations across all industries through its optimized computing stack, enabling major technological transformation. Register now for the Healthcare and Life Sciences Developer Summit on November 10, 2022 to see how key startups are defining the future of medicine with the power of AI.

Categories
Misc

Explainer: What Is a Digital Twin?

A digital twin is a virtual representation synchronized with physical things, people, or processes.

Categories
Misc

Reducing Power Plant Greenhouse Gasses Using AI and Digital Twins

Reducing the amount of carbon released to the atmosphere is a political priority. The current U.S. administration plans to achieve net-zero carbon emissions from the power grid by 2035, and industry-wide by 2050. 

To achieve this goal, a variety of techniques are being developed that capitalize on the efficiency of AI to fight climate change. For power plants, developing techniques for reducing carbon emissions and for carbon capture and storage requires a detailed understanding of the associated fluid mechanics and chemical processes throughout the facility. This requires scientifically accurate simulations of fluid mechanics, heat transfer, chemical reactions, and their degree of interaction.

Graphic showing three techniques for reducing greenhouse gas emissions, that are direct air capture, flue gas capture, and underground carbon storage. Digital twins can be used for optimal control and operational tuning of these systems.
Figure 1. To achieve net-zero carbon emissions, digital twins can be used to scale up several carbon capture, storage, and removal systems. Credit: National Energy Technology Laboratory media team

One important focus for industrial use cases is the development of more efficient fuel conversion devices. The goal is to create devices that are more flexible so that the equipment can integrate with renewable resources in a more reliable way. 

It is crucial to have better design optimization, uncertainty quantification, and accurate digital twins so that design and control of energy conversion devices can be handled adequately without causing billions of dollars in damage. AI is a natural choice to be used for developing such digital twins that can provide near real-time predictions without compromising the accuracy.

This post explains how the physics-informed machine learning (ML) framework, NVIDIA Modulus, is being used to bypass the conventional methods to enable large-scale scientific modeling, and to develop power plant digital twins that can help move towards net-zero carbon emissions.

Simulating industrial power plants with physics-ML

Flow field predictions for new operating conditions, such as changes in input conditions or geometrical configuration of the boiler components, require new computational fluid dynamics simulations. This can become very expensive and time consuming if simulations cover a large parameter space, such as those used for uncertainty quantification studies. Also, in most cases, the entirety of a flow field is not of interest. A neural network, once trained, can predict the flow field at the required points in the affected area in a fraction of a second.

Together with the National Energy Technology Laboratory (NETL), NVIDIA Modulus researchers are developing a digital twin of a power plant boiler capable of modeling turbulent reacting flows. The digital twin will use machine learning to replicate the flow conditions inside a boiler with a high level of fidelity, and be capable of providing near-instantaneous flow predictions for the operating conditions of interest. 

Understanding the internal velocity, temperature, and species fields is crucial for taking steps towards mitigating the emission of greenhouse gasses and pollutants. Physics-informed ML, otherwise known as physics-ML, can be used for model predictive control to help plant operators optimize boiler operating conditions for efficiency and performance.

While not part of this study, it should be noted that digital twins can also be deployed for cybersecurity purposes to act as digital ghosts to distract intruders from the intended targets. These digital ghosts provide copies of the operating conditions with synthetic variations in the state of the system to the control room. If an intruder accesses control room data, they will not be able to distinguish between the real operational data and these copies. This framework can be extended to model other power plant components with moderate effort.

Enhancing proxy models with real-time feedback

The proxy model can also be coupled with the live feedback from sensors attached to the boiler to constantly improve itself with the assimilation of field data. 

A typical power plant boiler includes dozens of concentration sensors, hundreds of temperature sensors, and thousands of sensors that measure the flow data. The placement of these sensors should be optimized so that the harsh boiler operating conditions, where temperatures can exceed 1,000℃,  do not melt or damage these sensors. These sensors can provide a stream of data to the proxy model to constantly update the model through data assimilation and online learning to improve the model’s accuracy and reliability. 

Additionally, a lot of uncertainty surrounds the parameters of physical models, such as reaction kinetics and viscosity. Using sensor data can reduce these uncertainties by assimilating field data into the proxy model. Failure to account for these uncertainties can cause significant monetary damages and power outages for days or even months.

Graphics showing elements of a power plant boiler and the simplified boiler created in this study.
Figure 2. Due to power plant boilers’ complicated designs (left), this study uses a simplified version (right) to test for flow field, temperature, and species mass concentration. Credit: Mbeychok

Figure 2 illustrates the simplified boiler used in this study to solve for the flow field, temperature, and species mass concentration where methane and oxygen react to form carbon dioxide and water. The water flowing through the boiler tubes is heated and converted to vapor that is directed through the turbines for power generation. The reaction products, CO2 and H2O, are released into the air. If captured, they can be injected underground. 

The oxidizer inlet velocity directly controls the flow conditions within the boiler, and subsequently its efficiency and performance. Changes in the oxidizer inlet velocity influence the combustion processes within the boiler, which, in turn, affect the temperature and species distributions throughout the system.

It is important to control the temperature inside the boiler because it directly influences the temperature and state of the working fluid, which in this study is water. This control is critical for meeting the thermal constraints of components outside the boiler itself, including the water tubes and turbines.

Additionally, the oxidizer inlet velocity controls the amount of air that enters the reactor per unit time, which directly affects the residence time of the species and the overall mixing behavior. If the residence time is too short, the reactions may not take place properly within the boiler. If it is too long, additional reactions leading to excessive emissions of pollutants such as CO, NOx, and CO2 may take place. Therefore, the inlet velocity is a key variable for optimizing the combustion processes and power generation.

Equation Name | Equation
Continuity | $\frac{\partial \rho}{\partial t} + \frac{\partial (\rho u_i)}{\partial x_i} = 0$
Species Mass Fraction | $\rho \frac{\partial Y_k}{\partial t} + \rho u_i \frac{\partial Y_k}{\partial x_i} = \dot{\omega}_k + \frac{\partial}{\partial x_i}\left(\rho D_k \frac{\partial Y_k}{\partial x_i}\right)$
Momentum | $\rho \frac{\partial u_i}{\partial t} + \rho \frac{\partial (u_i u_j)}{\partial x_j} = -\frac{\partial p}{\partial x_i} + \frac{\partial \tau_{ij}}{\partial x_j}$
Temperature | $\frac{\partial T}{\partial t} + \frac{\partial (u_i T)}{\partial x_i} = \frac{\dot{\omega}_T}{\rho c_p} + \frac{\partial}{\partial x_i}\left(\alpha \frac{\partial T}{\partial x_i}\right)$
Kinetics-Controlled Single Step Irreversible Reaction | $\mathrm{CH_4} + 2(\mathrm{O_2} + 3.76\,\mathrm{N_2}) \rightarrow \mathrm{CO_2} + 2\,\mathrm{H_2O} + 7.52\,\mathrm{N_2}$
Species Source/Sink Terms | $\dot{\omega}_{CH_4} = -MW_{CH_4}\, k_f \left(\frac{\rho Y_{CH_4}}{MW_{CH_4}}\right)\left(\frac{\rho Y_{O_2}}{MW_{O_2}}\right)$, etc.
Temperature Source Term | $\dot{\omega}_{T} = -\sum_{k=1}^{N} h_{k}\, \dot{\omega}_{k}$
Table 1. Commonly used equations to determine flow field, temperature, species mass concentration, and other key metrics for generic boiler environments
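
To make the source terms in Table 1 concrete, the following minimal Python sketch evaluates them for the single-step methane reaction. It is an illustration under stated assumptions, not the code used in the study: the source terms for O2, CO2, and H2O (the "etc." in the table) are assumed to follow from the stoichiometric coefficients of the reaction, the enthalpies are approximated by standard formation enthalpies only, and the state values and rate constant are hypothetical.

```python
# Molecular weights in kg/kmol (standard values).
MW = {"CH4": 16.043, "O2": 31.998, "CO2": 44.009, "H2O": 18.015}

# Signed stoichiometric coefficients for CH4 + 2 O2 -> CO2 + 2 H2O.
# N2 is inert in this single-step mechanism, so its source term is zero.
# Assumption: the "etc." terms in Table 1 scale with these coefficients.
NU = {"CH4": -1.0, "O2": -2.0, "CO2": 1.0, "H2O": 2.0}

# Approximate standard formation enthalpies in J/kmol (gas phase), converted
# below to J/kg. Assumption: sensible enthalpy is neglected in this sketch.
HF_MOLAR = {"CH4": -74.9e6, "O2": 0.0, "CO2": -393.5e6, "H2O": -241.8e6}
H = {k: HF_MOLAR[k] / MW[k] for k in MW}


def species_source_terms(rho, y_ch4, y_o2, k_f):
    """Return the mass source term of each species in kg/(m^3 s)."""
    # Molar concentrations rho * Y_k / MW_k in kmol/m^3.
    c_ch4 = rho * y_ch4 / MW["CH4"]
    c_o2 = rho * y_o2 / MW["O2"]
    # Reaction rate in kmol/(m^3 s), first order in each reactant as in Table 1.
    rate = k_f * c_ch4 * c_o2
    return {k: NU[k] * MW[k] * rate for k in NU}


def temperature_source_term(omega, h):
    """omega_T = -sum_k h_k * omega_k, in W/m^3."""
    return -sum(h[k] * omega[k] for k in omega)


# Hypothetical state: density, mass fractions, and rate constant.
omega = species_source_terms(rho=1.0, y_ch4=0.05, y_o2=0.20, k_f=1.0e3)
omega_T = temperature_source_term(omega, H)
```

Two quick sanity checks fall out of this form: the species source terms sum to zero, so the reaction conserves mass, and the temperature source term is positive because the reaction is exothermic.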

How to build a boiler digital twin using physics-informed ML

NVIDIA Modulus offers a collection of models that can be trained purely based on physics or data, or a combination of data and physics. By parameterizing these models, and leveraging the optimized inference pipeline, NVIDIA Modulus is capable of predicting the system behavior under varying operational and environmental conditions as a post-processing step. 

As previously outlined, NVIDIA Modulus was used to develop a parameterized model trained on the governing laws of physics for a generic boiler. Once trained, this model provides near-instantaneous predictions of the temperature, species mass concentration, flow velocity, and pressure inside the boiler for any given inlet velocity. No training data is used, and the loss function is formulated solely on how well the neural network solution satisfies the governing equations and the boundary conditions.
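
The idea of a loss built only from the governing equation and the boundary conditions can be shown on a toy problem. The sketch below is not NVIDIA Modulus code. It assumes a hypothetical one-dimensional equation, u' + u = 0 with u(0) = 1, and uses a two-parameter trial function with a hand-written derivative as a stand-in for the neural network and automatic differentiation.

```python
import math


def model(x, a, b):
    """Trial solution u(x) = a * exp(b * x); stands in for the neural network."""
    return a * math.exp(b * x)


def model_dx(x, a, b):
    """Analytic du/dx (a real PINN would obtain this by automatic differentiation)."""
    return a * b * math.exp(b * x)


def physics_loss(a, b, points, u0=1.0):
    """Mean squared residual of u' + u = 0 plus the boundary penalty (u(0) - u0)^2."""
    residuals = [model_dx(x, a, b) + model(x, a, b) for x in points]
    pde_loss = sum(r * r for r in residuals) / len(residuals)
    bc_loss = (model(0.0, a, b) - u0) ** 2
    return pde_loss + bc_loss


# Collocation points in [0, 1]; no solution data is supplied anywhere.
points = [i / 10 for i in range(11)]
loss_exact = physics_loss(1.0, -1.0, points)  # parameters of the exact solution exp(-x)
loss_wrong = physics_loss(1.0, -0.5, points)  # parameters that violate the equation
```

The loss vanishes only for parameters that satisfy both the equation and the boundary condition, which is what lets an optimizer find the solution without any labeled data.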

For this problem, the temperature at the inlets has been fixed at 650 K, and the walls are at 350 K. The parameter space for this simplified case is spanned by the varying oxidizer inlet velocity that ranges between 1 and 5 m/s. 

A zero-equation formulation is used to model the turbulent Reynolds stresses. The Sinusoidal Representation Network (SiReN) in NVIDIA Modulus is used as the network architecture. A key component of this network architecture is the initialization scheme, in which the weight matrices of the network are drawn from a uniform distribution. The input of each sine activation has a normal distribution, and the output of each sine activation has an arcsine distribution. This preserves the distribution of activations, allowing deep architectures to be constructed and trained effectively. 

The first layer of the network is scaled by a factor to span multiple periods of the sine function. This was empirically shown to give good performance and is in line with the benefits of the input encoding in the Fourier networks. Several NVIDIA Modulus features, such as L2 to L1 loss decay and spatial loss weighting using a Signed Distance Function (SDF), are used to improve accuracy. Time-to-convergence is minimized by using NVIDIA Modulus performance upgrades such as Just-In-Time (JIT) compilation and CUDA graphs.
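
As a rough illustration of the initialization scheme, here is a minimal pure-Python sketch. The uniform bounds and the frequency factor omega_0 = 30 are assumptions taken from the original SIREN paper, not values confirmed for the NVIDIA Modulus implementation, and biases are omitted for brevity.

```python
import math
import random


def siren_init(fan_in, fan_out, is_first, omega_0=30.0, rng=random):
    """Draw a fan_out x fan_in weight matrix from a SIREN-style uniform distribution."""
    if is_first:
        # First layer: U(-1/fan_in, 1/fan_in); omega_0 then spreads the
        # pre-activations over multiple periods of the sine function.
        bound = 1.0 / fan_in
    else:
        # Hidden layers: U(-sqrt(6/fan_in)/omega_0, +sqrt(6/fan_in)/omega_0).
        bound = math.sqrt(6.0 / fan_in) / omega_0
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]


def sine_layer(x, weights, omega_0=30.0):
    """Apply sin(omega_0 * W x) to the input vector x."""
    return [math.sin(omega_0 * sum(w * xi for w, xi in zip(row, x))) for row in weights]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
w1 = siren_init(2, 64, is_first=True, rng=rng)
w2 = siren_init(64, 64, is_first=False, rng=rng)
hidden = sine_layer(sine_layer([0.3, -0.7], w1), w2)
```

Dividing the hidden-layer bound by omega_0 cancels the frequency factor inside the sine, so the pre-activation statistics stay the same from layer to layer regardless of depth.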

A comparison between NVIDIA Modulus and commercial solver results for multi-species (CH4, O2, N2) non-reacting flow with oxidizer inlet velocity set to 2 m/s.
Figure 3. Species and temperature distributions computed with NVIDIA Modulus compare well against commercial solver results

A moving-time window method developed by the NVIDIA Modulus team is used for transient flow modeling. Solving transient simulations with only the conventional continuous time method can be difficult for long time durations. The moving-time window method iteratively solves for small consecutive time windows. The continuous time method is used for solving inside a particular window, and the solution at the end of each time window is used as the initial condition for the next window.

Graphic showing that the time domain is discretized into several smaller subdomains, known as time windows. The continuous time method is used for solving inside a particular window, and the solution at the end of each time window is used as the initial condition for the next window. This is done iteratively until all time windows are solved.
Figure 4. The moving-time window method developed by NVIDIA Modulus researchers used to model transient flow
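
The window-to-window handoff can be sketched on a toy problem. The example below is not the NVIDIA Modulus implementation. It assumes a hypothetical equation, du/dt = -u, and uses a classical Runge-Kutta integrator as a stand-in for the continuous time neural solve inside each window. Only the outer loop, which passes each end state to the next window as its initial condition, reflects the method described above.

```python
import math


def solve_window(u_start, t_start, t_end, steps=100):
    """Stand-in for the solve inside one window: RK4 integration of du/dt = -u."""
    h = (t_end - t_start) / steps
    u = u_start
    for _ in range(steps):
        k1 = -u
        k2 = -(u + 0.5 * h * k1)
        k3 = -(u + 0.5 * h * k2)
        k4 = -(u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return u


def moving_time_window(u0, t_final, n_windows):
    """March through consecutive windows, handing each end state to the next window."""
    width = t_final / n_windows
    u = u0
    history = [(0.0, u0)]
    for i in range(n_windows):
        # The solution at the end of window i is the initial condition of window i + 1.
        u = solve_window(u, i * width, (i + 1) * width)
        history.append(((i + 1) * width, u))
    return history


history = moving_time_window(u0=1.0, t_final=5.0, n_windows=10)
```

Each window only has to represent a short time span, which is what makes long transient simulations more tractable than a single solve over the whole duration.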

Selective Equation Terms Suppression

For modeling combustion, a novel approach developed by NETL researchers, called Selective Equation Terms Suppression (SETS), is used to drastically improve the training convergence. 

For several partial differential equations (PDEs), the terms in the physical equations have different scales in time and magnitude (sometimes also known as stiff PDEs). For such PDEs, the loss equation can appear to be minimized despite poor treatment of the smaller terms. 

Using the SETS approach to tackle this, you can create multiple instances of the same PDE and freeze certain terms. During the optimization process, this forces the optimizer to use the value from the former iteration for the frozen terms. Thus, the optimizer minimizes each term in the PDE and efficiently reduces the equation residual. 

This prevents any one term in the PDE from dominating the loss gradients. Creating multiple instances with different frozen terms in each instance allows the overall representation of the physics to remain the same, while allowing the neural networks to better learn the dynamic balance between all the terms in the equations. 

However, creating multiple instances of the same equation (with different frozen terms) also creates multiple loss terms, each of which can be weighted differently. Several other formulations were also developed to efficiently handle the stiff PDEs: 

  1. Ramping up the source terms gradually during training to allow the neural networks to adjust better to the problem. 
  2. Having better control over the coupling between the species equations and the temperature equation by adjusting the order in which they are trained and their relative number of training instances.  
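
The term-freezing idea behind SETS, and the per-instance weighting it introduces, can be illustrated with a scalar toy problem. This sketch is one reading of the description above and not the NETL implementation. It assumes a hypothetical two-term residual, a(theta) + b(theta) with a = theta and b = -cos(theta), and hand-written derivatives in place of automatic differentiation.

```python
import math


def term_a(theta):
    return theta


def term_b(theta):
    return -math.cos(theta)


def sets_gradients(theta):
    """Gradients of the two frozen-term instances of the loss (a + b)^2.

    The frozen term is evaluated at the current iterate and treated as a
    constant, so only the unfrozen term contributes its derivative.
    """
    residual = term_a(theta) + term_b(theta)
    grad_b_frozen = 2.0 * residual * 1.0              # d(term_a)/d(theta) = 1
    grad_a_frozen = 2.0 * residual * math.sin(theta)  # d(term_b)/d(theta) = sin(theta)
    return grad_b_frozen, grad_a_frozen


def train(theta=0.0, weights=(1.0, 1.0), lr=0.1, steps=200):
    """Gradient descent on the weighted sum of the two instances."""
    for _ in range(steps):
        g1, g2 = sets_gradients(theta)
        theta -= lr * (weights[0] * g1 + weights[1] * g2)
    return theta


theta_star = train()  # converges to the root of theta - cos(theta) = 0
```

With equal weights, the two instance gradients add up to the gradient of the full squared residual by the product rule. The split matters once the instances are weighted differently, because each term's contribution to the update can then be rebalanced on its own.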

Residual Normalization 

Another novel approach used for improving the training convergence is Residual Normalization, or ResNorm. The predominant approach to loss balancing in neural network solvers is to multiply each of the individual loss terms by a weighting parameter to balance out the contribution of each term to the overall loss. However, tuning these parameters manually is not straightforward, and it also requires treating these parameters as constants. ResNorm minimizes an additional loss term that encourages the individual losses to take similar relative magnitudes. The loss weights are dynamically tuned throughout the training based on the relative training rates of the different constraints. 

The distributions of the two reacting species, a product species, and temperature are shown in Figure 5. The reactants come into contact to form the thin reaction zone where products are formed. Energy is then released that increases the local temperature, which is advected and diffused throughout the domain. 

A GIF showing reactants being consumed near the reaction zone (where the reactants and products come into contact and get advected outwards), and the products are generated in the reaction zone. Also, temperature is increased in the reaction zone.
Figure 5. Time evolution of the two reactants (O2+CH4), one product (H2O), and temperature fields obtained using NVIDIA Modulus PINN solver

3D design integration with NVIDIA Omniverse

NVIDIA Omniverse is an easily extensible platform for 3D design collaboration and scalable multi-GPU, real-time, true-to-reality simulation. The NVIDIA Omniverse extension for NVIDIA Modulus enables real-time, virtual-world simulation and full-design fidelity visualization. The built-in pipeline can be used for common visualizations such as streamlines and isosurfaces for the outputs of the NVIDIA Modulus model. 

Another benefit from this integration is being able to visualize and analyze the high-fidelity simulation output in near real-time as the design parameters are varied. In the final part of the power plant boiler project, the NVIDIA Omniverse extension for NVIDIA Modulus will be used to develop a boiler digital twin that uses the final trained model to provide near-instantaneous prediction and visualization of the flow, temperature, pressure, and species mass concentration inside the boiler under varying operating conditions.

NVIDIA Omniverse digital twins consist of several parts including NVIDIA Modulus for near-instantaneous prediction of the system behavior under varying input conditions. The digital twins continuously receive data from the physical counterpart to update the state of the system.
Figure 6. Components of an NVIDIA Omniverse digital twin for physical systems

To learn more about the NVIDIA Modulus integration with NVIDIA Omniverse, see Visualizing Interactive Simulations with Omniverse Extension for NVIDIA Modulus.

For even more related learning, check out the self-paced online course, Introduction to Physics-informed Machine Learning with Modulus and the free NVIDIA On-Demand session, Journey Toward Zero-Carbon Emissions Leveraging AI for Scientific Digital Twins.

Disclaimer: This project was funded by the United States Department of Energy, National Energy Technology Laboratory, in part, through a site support contract. Neither the United States Government nor any agency thereof, nor any of their employees, nor the support contractor, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.  Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Categories
Misc

Explainer: What Is Green Computing?

Green computing, also called sustainable computing, aims to maximize energy efficiency and minimize environmental impact in the ways computer chips, systems, and software are designed and used.

Categories
Misc

Orchestrating Accelerated Virtual Machines with Kubernetes Using NVIDIA GPU Operator

Many organizations today run applications in containers to take advantage of the powerful orchestration and management provided by cloud-native platforms based on Kubernetes. However, virtual machines remain the predominant data center infrastructure platform for enterprises, and not all applications can be easily modified to run in containers. For example, applications requiring older operating systems, custom kernel modules, or specialized hardware require more effort to containerize.  

KubeVirt and OpenShift Virtualization are add-ons to Kubernetes that provide virtual machine (VM) management. These solutions eliminate the need to manage separate clusters for VM and container workloads. KubeVirt is a community-supported open source project, and it also serves as the upstream project for the OpenShift Virtualization feature from Red Hat.

NVIDIA GPUs have been accelerating applications that are virtualized for many years, and NVIDIA has also created technology to support GPU acceleration for containers managed by Kubernetes. The latest release of the NVIDIA GPU Operator adds support for KubeVirt and OpenShift Virtualization. Now, GPU-accelerated applications running as virtual machines can be orchestrated by Kubernetes too, just like ordinary enterprise applications, enabling unified management. 

GPUs in KubeVirt and OpenShift Virtualization

NVIDIA GPU Operator v22.9 enables GPU-accelerated containers and GPU-accelerated virtual machines, using either NVIDIA Virtual GPU (vGPU) or PCI passthrough, to run alongside each other in the same cluster. This version introduces new software components that support virtual machines. 

Additionally, the operator is responsible for providing automation to manage the deployment, configuration, and lifecycle of this software, easing the operational overhead on cluster administrators. More detailed information about these components is provided below. 

The vfio-pci driver (Virtual Function I/O) provides a secure user space driver that is needed when using a physical GPU for PCI passthrough. PCI passthrough presents the entire GPU as a PCI device to a virtual machine. When using PCI passthrough, the GPU cannot be shared, but provides the highest performance. 

The NVIDIA vGPU Manager is the driver installed on the hypervisor that enables NVIDIA Virtual GPU technology. NVIDIA vGPU enables multiple virtual machines to have simultaneous, time-based shared access to a single physical GPU.  

The NVIDIA vGPU Device Manager is responsible for interacting with the vGPU Manager and creating vGPU devices on the worker node. 

The NVIDIA KubeVirt device plug-in discovers and advertises both physical and NVIDIA vGPU devices to kubelet so that they can be requested and assigned to VMs. Kubelet is an agent running on every node in the cluster, responsible for communication between the node and the Kubernetes control plane.

Planning for deployment

Prior to deployment, be aware of some limitations. Currently, MIG-backed vGPU instances are not supported. Additionally, a given GPU worker node can run GPU workloads of only a single type (containers, VMs with PCI passthrough, or VMs with NVIDIA vGPU), not a combination. 

To enable this new functionality, set sandboxWorkloads.enabled to true in ClusterPolicy. When enabled, the GPU Operator will manage and deploy the new software components needed for supporting virtual machines. This option is disabled by default, meaning that the GPU Operator will only provision worker nodes for container workloads.

Administrators can control where workloads are deployed by using Kubernetes node labels. GPU Operator v22.9 introduces a new node label, nvidia.com/gpu.workload.config, which dictates which software components the GPU Operator deploys and consequently controls what type of GPU workloads a node supports. This node label can take the values container, vm-passthrough, and vm-vgpu, which correspond to the different workloads now supported. 

This concept allows administrators to have pools of machine types, each with different capabilities, and managed by a common control plane. If the nvidia.com/gpu.workload.config node label is not present on a GPU worker node, the GPU Operator will use the default workload type, which is configurable in ClusterPolicy through the sandboxWorkloads.defaultWorkload field.
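
The selection rule described above can be summarized in a few lines of Python. This is a sketch of the logic only, not GPU Operator code: the helper names are hypothetical, and placing sandboxWorkloads under the spec key of ClusterPolicy is an assumption about the resource layout. The label key, its three values, and the fallback to the default workload come from the text.

```python
import json

LABEL_KEY = "nvidia.com/gpu.workload.config"
WORKLOAD_TYPES = ("container", "vm-passthrough", "vm-vgpu")


def cluster_policy_patch(default_workload="container"):
    """Build a merge-style patch that enables sandbox workloads.

    Assumption: the sandboxWorkloads fields sit under `spec` in ClusterPolicy.
    """
    if default_workload not in WORKLOAD_TYPES:
        raise ValueError(f"unknown workload type: {default_workload}")
    return {
        "spec": {
            "sandboxWorkloads": {
                "enabled": True,
                "defaultWorkload": default_workload,
            }
        }
    }


def effective_workload(node_labels, default_workload="container"):
    """Return the workload type a node supports: its label if set, else the default."""
    value = node_labels.get(LABEL_KEY, default_workload)
    if value not in WORKLOAD_TYPES:
        raise ValueError(f"unknown workload type: {value}")
    return value


patch_json = json.dumps(cluster_policy_patch("container"), indent=2)
labeled = effective_workload({LABEL_KEY: "vm-vgpu"})  # label wins over the default
unlabeled = effective_workload({})                    # falls back to the default
```

Because each node resolves to exactly one workload type, this also reflects the limitation that a GPU worker node cannot mix containers, passthrough VMs, and vGPU VMs.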

Conclusion

GPU Operator v22.9 brings with it additional capabilities required to run GPU-powered workloads on Kubernetes with KubeVirt and OpenShift Virtualization. VMs in Kubernetes can attach GPU devices using PCI passthrough or NVIDIA vGPU. This flexibility speeds the adoption of cloud-native platforms by removing the need to refactor GPU-accelerated applications to support containerization. Administrators can continue to run these applications in VMs alongside other container native applications, with Kubernetes performing the orchestration. 

Getting started

To get started with GPU-accelerated virtual machines, see the official documentation on Running KubeVirt VMs with the GPU Operator. Submit feedback and bug reports through the gpu-operator/issues GitHub repository. Contributions to the kubernetes/gpu-operator GitLab repository are also encouraged. 

Additional resources