Categories
Misc

Unlocking the Language of Genomes and Climates: Anima Anandkumar on Using Generative AI to Tackle Global Challenges

Generative AI-based models can not only learn and understand natural languages — they can learn the very language of nature itself, presenting new possibilities for scientific research. Anima Anandkumar, Bren Professor at Caltech and senior director of AI research at NVIDIA, was recently invited to speak at the President’s Council of Advisors on Science and Read article >

Categories
Misc

Scaling Deep Learning Deployments with NVIDIA Triton Management Service

Organizations are integrating machine learning (ML) throughout their systems and products at an unprecedented rate. They are looking for solutions to help deal with the complexities of deploying models at production scale. 

NVIDIA Triton Management Service (TMS), exclusively available with NVIDIA AI Enterprise, is a new product that helps do just that. Specifically, it helps manage and orchestrate a fleet of NVIDIA Triton Inference Servers in a Kubernetes cluster. TMS enables users to scale their NVIDIA Triton deployments to handle large and varied workloads efficiently. It also improves the developer experience of coordinating the resources and tools required. 

This post explores some of the most common challenges developers and MLOps teams face when deploying models at scale, and how NVIDIA Triton Management Service addresses them. 

Challenges in scaling AI model deployment

Model deployments of any scale come with their own sets of challenges. Developers need to consider how to balance a variety of frameworks, model types, and hardware while maximizing performance and interfacing with the other components of the environment. 

NVIDIA Triton is a powerful solution built to handle these issues and extract the best throughput and performance from the machine it’s deployed on. But as organizations incorporate AI into more of their core workflows, the number and size of inference workloads can grow beyond what a single server can handle. The model deployments have to scale. A new scale of deployment brings with it a new set of challenges—challenges related to the cost and complexity of managing distributed inference workloads. 

Cost of deployment

As you deploy more models and find more use cases for them, it can quickly become necessary to scale out deployments to make use of a cluster of resources. A simple approach is to keep scaling your cluster linearly as you add more models, keeping all of your models live and ready for inference at all times. 

However, this is not an approach with infinite scale potential. Focusing on expanding the capacity of your serving cluster can result in unnecessary expenses when you have the option to improve utilization of currently available hardware. You will also have to deal with the logistical challenges of adding more resources on premises, or bumping up against quota limits in the cloud. 

Other approaches to scaling might appear less expensive, but can lead to steep performance trade-offs. For example, you could wait to load the models into memory until the inference requests come in, leading to long waits and an extended time-to-first-inference. Or you could overcommit your compute resources, leading to performance penalties from context switching during execution and errors from running out of memory on the device.

With careful preplanning and colocation of workloads, you can avoid some of the worst of these issues. Still, that only exacerbates the second major issue of large-scale deployments.

Operational complexity

At a small scale and early in the development of a process that requires model orchestration, it can be viable to manually configure and deploy your models. But as your ML deployments scale, it becomes increasingly challenging to coordinate all of the necessary resources. You need to manage when to launch or scale servers, where to load particular models, how to route requests to the right place, and how to handle the model lifecycle in your environment. 

Determining which models can be colocated adds another layer of complexity to these deployments. Large models might exceed the memory capacity of your GPU or CPU if loaded concurrently into the same device. Some frameworks (such as PyTorch and TensorFlow) hold on to any memory allocated to them even after the models are unloaded, leading to inefficient utilization when models from those frameworks are run alongside models from other frameworks. 

In general, different models will have different requirements regarding resource allocation and server configuration, making it difficult to standardize on a single type of deployment. 

Cost-efficient deployment and scaling of AI models

Triton Management Service addresses these challenges with three main strategies: simplifying Triton Inference Server deployment, maximizing resource usage, and monitoring and scaling Triton Inference Servers.

Simplifying deployment

TMS automates the deployment and management of Triton server instances on Kubernetes using a simplified gRPC API and command-line tool. With these interfaces, you don’t need to write out extensive code or config files for creating deployments, services, and Kubernetes resources. Instead, you can use the API or CLI to easily launch Triton servers and automatically load models onto these servers as needed. 

TMS also employs a method of grouping models to optimize GPU and CPU memory utilization. This prevents the issues that arise when models from different frameworks, such as PyTorch and TensorFlow, run on the same server and fail to release unused GPU or CPU memory to each other.

Maximizing resources

TMS loads models on-demand and unloads them using a lease system when they are not in use, making sure that models are not kept active in the cluster unnecessarily. To bring up a model, you can submit an API request with a specified timeline or a checking mechanism. The system will keep the model available if it’s being used; otherwise, it will be taken down. 

TMS also automatically colocates models on the same device when sufficient capacity is available. To enable this, you need to prespecify the expected GPU memory use of your models during deployment. While there is no automated way to measure this yet, you can rely on Triton Model Analyzer and other benchmarking tools to determine memory requirements beforehand. Together, these features enable you to run more workloads on your existing clusters, saving costs and reducing the need to acquire more computational resources. 

Monitoring and autoscaling

TMS keeps track of the health and capacity of various Triton servers for high availability reasons. Autoscaling is integrated into the system, enabling TMS to deploy Kubernetes Horizontal Pod Autoscalers automatically based on the model deployment configuration. You can specify metrics for autoscaling, indicating the conditions under which scaling should occur. Load balancing is also applied when autoscaling is implemented across multiple Triton instances.

How Triton Management Service works

A diagram depicting the step-by-step orchestration flow in NVIDIA Triton Management Service. First, a client application sends a lease request to the TMS Server. Then, the TMS Server creates a lease, and loads it onto one of several pods. Third, the pod loads the model from an external model repository. Finally, a client application sends inference queries to the pods with the loaded models.
Figure 1. Overall orchestration flow for NVIDIA Triton Management Service

To install TMS, deploy a Helm chart with configurable values into a Kubernetes cluster. This Helm chart deploys the TMS Server control plane into the cluster, along with a config map that holds many of the configuration settings for TMS. You can operate TMS through gRPC API calls to the TMS Server, or by using the provided tmsctl command-line tool. 

The key concept in TMS is the lease. At its core, a lease is a grouping of models and some associated metadata that tells TMS how to treat those models, and what constraints exist for their deployment. Users can create, renew, and release leases. Creating a lease requires specifying a set of models from predefined repositories by a unique identifier, along with metadata including:

  • Compute resources required by the lease
  • Image/version of Triton to use for this lease
  • Minimum duration of the lease
  • Window size for detecting activity on the models in the lease
  • Metrics and thresholds for scaling the lease
  • Constraints on which models or leases can be colocated with the new lease
  • A unique name for the lease that can be used to address it
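
As a rough illustration only, the sketch below organizes such a lease request as a Python dictionary. The field names and values are hypothetical and do not reflect the actual TMS gRPC schema; they simply mirror the metadata listed above.

```python
# Hypothetical illustration only: these field names mirror the lease metadata
# described above and are NOT the real TMS gRPC schema. Consult the TMS
# documentation or tmsctl help for the actual request format.
lease_request = {
    "name": "resnet50-lease",                        # unique, addressable lease name
    "models": ["model-repo://vision/resnet50:1"],    # models by unique identifier
    "triton_image": "tritonserver:<version>",        # image/version of Triton to use
    "resources": {"gpu_memory_mib": 4096},           # compute required by the lease
    "min_duration": "30m",                           # minimum duration of the lease
    "activity_window": "10m",                        # window for detecting model activity
    "autoscaling": {"metric": "queue_time_us", "threshold": 50000},
    "colocation_constraints": ["no-tensorflow"],     # what may share a server with this lease
}
```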

When the TMS Server receives the lease request, it performs the actions listed below to create the lease:

  • Check the model repositories to see if the models are present and accessible. 
  • If models are present and accessible, check for existing Triton Inference Servers present in the cluster that meet the constraints of the new lease.
    • If none exist, create a new Kubernetes pod containing the Triton Inference Server container and a Triton Sidecar Container.
    • Otherwise, choose one of the existing Triton pods to add the lease to. 
  • In either case, the Triton Sidecar in the Triton Pod will pull the models in your lease from the repository and load them into its paired Triton server. 

TMS will also create several other Kubernetes resources to help with management and routing for the lease:

  • A deployment that will revive Triton pods if they crash.
  • A Kubernetes service based on the lease name that can be used to address the models in the lease.
  • A horizontal pod autoscaler to automatically create replicas of the Triton pods based on the metrics and thresholds defined in the lease.

Once the lease has been created, you can use the Triton Inference Server API or an existing Triton client to send inference requests to the server for execution. The Triton client does not need any modifications to work with Triton Inference Servers deployed by Triton Management Service. 
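
For example, a minimal Python client using the standard tritonclient HTTP API might look like the following. The service URL, model name, and tensor names are placeholders; substitute the Kubernetes service created from your lease name and your model's actual input and output names.

```python
import numpy as np
import tritonclient.http as httpclient

# Placeholder endpoint: TMS creates a Kubernetes service named after the lease.
client = httpclient.InferenceServerClient(url="my-lease.triton.svc.cluster.local:8000")

# Dummy input batch; "resnet50", "INPUT0", and "OUTPUT0" stand in for your
# model's actual name and tensor names.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT0", list(image.shape), "FP32")
inp.set_data_from_numpy(image)
out = httpclient.InferRequestedOutput("OUTPUT0")

result = client.infer(model_name="resnet50", inputs=[inp], outputs=[out])
print(result.as_numpy("OUTPUT0").shape)
```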

Get started with NVIDIA Triton Management Service

To get started with NVIDIA Triton Management Service and learn more about its features and functionality, check out the AI Model Orchestration with Triton Management Service lab on LaunchPad. This lab provides free access to a GPU-enabled Kubernetes cluster and a step-by-step guide on installing Triton Management Service and using it to deploy a variety of AI workloads. 

If you have existing compatible on-premises systems or cloud instances, request a 90-day NVIDIA AI Enterprise Evaluation License to try Triton Management Service. If you are an existing NVIDIA AI Enterprise user, simply log in to the NGC Enterprise Catalog.

Categories
Misc

Power Your Business with NVIDIA AI Enterprise 4.0 for Production-Ready Generative AI

Crossing the chasm and reaching its iPhone moment, generative AI must scale to fulfill exponentially increasing demands. Reliability and uptime are critical for building generative AI at the enterprise level, especially when AI is core to conducting business operations. NVIDIA is investing its expertise into building a solution for those enterprises ready to take the leap.

Introducing NVIDIA AI Enterprise 4.0

The latest version of NVIDIA AI Enterprise accelerates development through multiple facets with production-ready support, manageability, security, and reliability for enterprises innovating with generative AI.

Quickly train, customize, and deploy LLMs at scale with NVIDIA NeMo 

Generative AI models have billions of parameters and require an efficient data training pipeline. The complexity of training models, customization for domain-specific tasks, and deployment of models at scale require expertise and compute resources. 

NVIDIA AI Enterprise 4.0 now includes NVIDIA NeMo, an end-to-end, cloud-native framework for data curation at scale, accelerated training and customization of large language models (LLMs), and optimized inference on user-preferred platforms. From cloud to desktop workstations, NVIDIA NeMo provides easy-to-use recipes and optimized performance with accelerated infrastructure, greatly reducing time to solution and increasing ROI.  

Build generative AI applications faster with AI workflows

NVIDIA AI Enterprise 4.0 introduces two new AI workflows for building generative AI applications: AI chatbot with retrieval augmented generation and spear phishing detection. 

The generative AI knowledge base chatbot workflow, leveraging retrieval augmented generation (RAG), accelerates the development and deployment of generative AI chatbots tuned on your data. These chatbots accurately answer domain-specific questions, retrieving information from a company’s knowledge base and generating real-time responses in natural language. The workflow uses pretrained LLMs, NeMo, and NVIDIA Triton Inference Server, along with third-party tools including LangChain and a vector database, for training and deploying the knowledge base question-answering system.
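
As a toy illustration of the retrieval augmented generation pattern (not the NVIDIA AI workflow code), the sketch below embeds a small knowledge base, retrieves the passage closest to a question, and passes it to a stand-in LLM call. The embed and generate functions are dummies you would replace with a real embedding model, vector database, and NeMo- or Triton-served LLM.

```python
import numpy as np

# Stand-ins for a real embedding model and LLM endpoint; both are assumptions
# for illustration only.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # deterministic dummy embedding
    return rng.standard_normal(384)

def generate(prompt: str) -> str:
    return f"[LLM answer conditioned on a prompt of {len(prompt)} chars]"

knowledge_base = [
    "NVIDIA AI Enterprise 4.0 includes NVIDIA NeMo for LLM training.",
    "Triton Management Service orchestrates Triton servers on Kubernetes.",
]
index = np.stack([embed(d) for d in knowledge_base])  # toy in-memory vector "database"

def answer(question: str, k: int = 1) -> str:
    q = embed(question)
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    context = "\n".join(knowledge_base[i] for i in np.argsort(-scores)[:k])
    return generate(f"Answer using this context:\n{context}\n\nQuestion: {question}")

print(answer("What does TMS do?"))
```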

The spear phishing detection AI workflow uses NVIDIA Morpheus and generative AI with NVIDIA NeMo to train a model that can detect up to 90% of spear phishing e-mails before they hit your inbox. 

Defending against spear-phishing e-mails is a challenge. Spear phishing e-mails are indistinguishable from benign e-mails, with the only difference between the scam and legitimate e-mail being the intent of the sender. This is why traditional mechanisms for detecting spear phishing fall short. 

Develop AI anywhere  

Enterprise adoption of AI can require additional skilled AI developers and data scientists. Organizations will need a flexible high-performance infrastructure consisting of optimized hardware and software to maximize productivity and accelerate AI development. Together with NVIDIA RTX 6000 Ada Generation GPUs for workstations, NVIDIA AI Enterprise 4.0 provides AI developers a single platform for developing AI applications and deploying them in production. 

Beyond the desktop, NVIDIA offers a complete infrastructure portfolio for AI workloads including NVIDIA H100, L40S, L4 GPUs, and accelerated networking with NVIDIA BlueField data processing units. With HPE Machine Learning Data Management, HPE Machine Learning Development Environment, Ubuntu KVM and Nutanix AHV virtualization support, organizations can use on-prem infrastructure to power AI workloads.  

Manage AI workloads and infrastructure

NVIDIA Triton Management Service, an exclusive addition to NVIDIA AI Enterprise 4.0, automates the deployment of multiple Triton Inference Servers in Kubernetes with GPU resource-efficient model orchestration. It simplifies deployment by loading models from multiple sources and allocating compute resources. Triton Management Service is available for lab experience on NVIDIA LaunchPad.

NVIDIA AI Enterprise 4.0 also includes cluster management software, NVIDIA Base Command Manager Essentials, for streamlining cluster provisioning, workload management, infrastructure monitoring, and usage reporting. It facilitates the deployment of AI workload management with dynamic scaling and policy-based resource allocation, providing cluster integrity.

New AI software, tools, and pretrained foundation models

NVIDIA AI Enterprise 4.0 brings more frameworks and tools to advance AI development. NVIDIA Modulus is a framework for building, training, and fine-tuning physics-based machine learning models with a simple Python interface. 

Using Modulus, users can bolster engineering simulations with AI and build models for enterprise-scale digital twin applications across multiple physics domains, from computational fluid dynamics (CFD) and structural mechanics to electromagnetics. The Deep Graph Library container is designed to implement and train graph neural networks that can help scientists research the graph structure of molecules, or help financial services detect fraud. 

Lastly, three exclusive pretrained foundation models, part of NVIDIA TAO, speed time to production for industry applications such as vision AI, defect detection, and retail loss prevention. 

NVIDIA AI Enterprise 4.0 is the most comprehensive upgrade to the platform to date. With enterprise-grade security, stability, manageability, and support, enterprises can expect reliable AI uptime and uninterrupted AI excellence.

Get started with NVIDIA AI Enterprise

Three ways to get accelerated with NVIDIA AI Enterprise:

  • Sign up for NVIDIA LaunchPad for short-term access to sets of hands-on labs.
  • Sign up for a free 90-day evaluation for existing on-prem or cloud infrastructure.
  • Purchase through the NVIDIA Partner Network or major cloud service providers, including AWS, Microsoft Azure, and Google Cloud.

Categories
Offsites

World scale inverse reinforcement learning in Google Maps

Routing in Google Maps remains one of our most helpful and frequently used features. Determining the best route from A to B requires making complex trade-offs between factors including the estimated time of arrival (ETA), tolls, directness, surface conditions (e.g., paved, unpaved roads), and user preferences, which vary across transportation mode and local geography. Often, the most natural visibility we have into travelers’ preferences is by analyzing real-world travel patterns.

Learning preferences from observed sequential decision making behavior is a classic application of inverse reinforcement learning (IRL). Given a Markov decision process (MDP) — a formalization of the road network — and a set of demonstration trajectories (the traveled routes), the goal of IRL is to recover the users’ latent reward function. Although past research has created increasingly general IRL solutions, these have not been successfully scaled to world-sized MDPs. Scaling IRL algorithms is challenging because they typically require solving an RL subroutine at every update step. At first glance, even attempting to fit a world-scale MDP into memory to compute a single gradient step appears infeasible due to the large number of road segments and limited high bandwidth memory. When applying IRL to routing, one needs to consider all reasonable routes between each demonstration’s origin and destination. This implies that any attempt to break the world-scale MDP into smaller components cannot consider components smaller than a metropolitan area.

To this end, in “Massively Scalable Inverse Reinforcement Learning in Google Maps“, we share the result of a multi-year collaboration among Google Research, Maps, and Google DeepMind to surpass this IRL scalability limitation. We revisit classic algorithms in this space, and introduce advances in graph compression and parallelization, along with a new IRL algorithm called Receding Horizon Inverse Planning (RHIP) that provides fine-grained control over performance trade-offs. The final RHIP policy achieves a 16–24% relative improvement in global route match rate, i.e., the percentage of de-identified traveled routes that exactly match the suggested route in Google Maps. To the best of our knowledge, this represents the largest instance of IRL in a real world setting to date.

Google Maps improvements in route match rate relative to the existing baseline, when using the RHIP inverse reinforcement learning policy.

The benefits of IRL

A subtle but crucial detail about the routing problem is that it is goal conditioned, meaning that every destination state induces a slightly different MDP (specifically, the destination is a terminal, zero-reward state). IRL approaches are well suited for these types of problems because the learned reward function transfers across MDPs, and only the destination state is modified. This is in contrast to approaches that directly learn a policy, which typically require an extra factor of S parameters, where S is the number of MDP states.

Once the reward function is learned via IRL, we take advantage of a powerful inference-time trick. First, we evaluate the entire graph’s rewards once in an offline batch setting. This computation is performed entirely on servers without access to individual trips, and operates only over batches of road segments in the graph. Then, we save the results to an in-memory database and use a fast online graph search algorithm to find the highest reward path for routing requests between any origin and destination. This circumvents the need to perform online inference of a deeply parameterized model or policy, and vastly improves serving costs and latency.

Reward model deployment using batch inference and fast online planners.
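
The following toy sketch illustrates the idea (it is not the production system): per-segment rewards are assumed to have been computed offline by the learned model, and the online step reduces to a shortest-path search over negated rewards. The graph and reward values are invented for the example.

```python
import heapq

# Toy road graph: edge -> precomputed reward (higher is better, rewards <= 0).
# In the real system these come from the learned IRL model, evaluated in batch.
rewards = {
    ("A", "B"): -1.0, ("B", "D"): -1.0,   # longer detour segments
    ("A", "C"): -0.2, ("C", "D"): -0.3,   # preferred local segments
}
graph = {}
for (u, v), r in rewards.items():
    graph.setdefault(u, []).append((v, -r))  # negate: max reward == min cost

def best_route(origin, destination):
    """Dijkstra over non-negative costs (-reward) to find the highest-reward path."""
    pq, settled = [(0.0, origin, [origin])], {}
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == destination:
            return path, -cost
        if node in settled and settled[node] <= cost:
            continue
        settled[node] = cost
        for nxt, c in graph.get(node, []):
            heapq.heappush(pq, (cost + c, nxt, path + [nxt]))
    return None, float("-inf")

print(best_route("A", "D"))  # -> (['A', 'C', 'D'], -0.5)
```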

Receding Horizon Inverse Planning

To scale IRL to the world MDP, we compress the graph and shard the global MDP using a sparse Mixture of Experts (MoE) based on geographic regions. We then apply classic IRL algorithms to solve the local MDPs, estimate the loss, and send gradients back to the MoE. The worldwide reward graph is computed by decompressing the final MoE reward model. To provide more control over performance characteristics, we introduce a new generalized IRL algorithm called Receding Horizon Inverse Planning (RHIP).

IRL reward model training using MoE parallelization, graph compression, and RHIP.

RHIP is inspired by people’s tendency to perform extensive local planning (“What am I doing for the next hour?”) and approximate long-term planning (“What will my life look like in 5 years?”). To take advantage of this insight, RHIP uses robust yet expensive stochastic policies in the local region surrounding the demonstration path, and switches to cheaper deterministic planners beyond some horizon. Adjusting the horizon H allows controlling computational costs, and often allows the discovery of the performance sweet spot. Interestingly, RHIP generalizes many classic IRL algorithms and provides the novel insight that they can be viewed along a stochastic vs. deterministic spectrum (specifically, for H=∞ it reduces to MaxEnt, for H=1 it reduces to BIRL, and for H=0 it reduces to MMP).

Given a demonstration from s_o to s_d, (1) RHIP follows a robust yet expensive stochastic policy in the local region surrounding the demonstration (blue region). (2) Beyond some horizon H, RHIP switches to following a cheaper deterministic planner (red lines). Adjusting the horizon enables fine-grained control over performance and computational costs.

Routing wins

The RHIP policy provides a 15.9% and 24.1% lift in global route match rate for driving and two-wheelers (e.g., scooters, motorcycles, mopeds) relative to the well-tuned Maps baseline, respectively. We’re especially excited about the benefits to more sustainable transportation modes, where factors beyond journey time play a substantial role. By tuning RHIP’s horizon H, we’re able to achieve a policy that is both more accurate than all other IRL policies and 70% faster than MaxEnt.

Our 360M parameter reward model provides intuitive wins for Google Maps users in live A/B experiments. Examining road segments with a large absolute difference between the learned rewards and the baseline rewards can help improve certain Google Maps routes. For example:

Nottingham, UK. The preferred route (blue) was previously marked as private property due to the presence of a large gate, which indicated to our systems that the road may be closed at times and would not be ideal for drivers. As a result, Google Maps routed drivers through a longer, alternate detour instead (red). However, because real-world driving patterns showed that users regularly take the preferred route without an issue (as the gate is almost never closed), IRL now learns to route drivers along the preferred route by placing a large positive reward on this road segment.

Conclusion

Increasing performance via increased scale – both in terms of dataset size and model complexity – has proven to be a persistent trend in machine learning. Similar gains for inverse reinforcement learning problems have historically remained elusive, largely due to the challenges with handling practically sized MDPs. By introducing scalability advancements to classic IRL algorithms, we’re now able to train reward models on problems with hundreds of millions of states, demonstration trajectories, and model parameters, respectively. To the best of our knowledge, this is the largest instance of IRL in a real-world setting to date. See the paper to learn more about this work.

Acknowledgements

This work is a collaboration across multiple teams at Google. Contributors to the project include Matthew Abueg, Oliver Lange, Matt Deeds, Jason Trader, Denali Molitor, Markus Wulfmeier, Shawn O’Banion, Ryan Epp, Renaud Hartert, Rui Song, Thomas Sharp, Rémi Robert, Zoltan Szego, Beth Luan, Brit Larabee and Agnieszka Madurska.

We’d also like to extend our thanks to Arno Eigenwillig, Jacob Moorman, Jonathan Spencer, Remi Munos, Michael Bloesch and Arun Ahuja for valuable discussions and suggestions.

Categories
Misc

NVIDIA Lends Support to Washington’s Efforts to Ensure AI Safety

In an event at the White House today, NVIDIA announced support for voluntary commitments that the Biden Administration developed to ensure advanced AI systems are safe, secure and trustworthy. The news came the same day NVIDIA’s chief scientist, Bill Dally, testified before a U.S. Senate subcommittee seeking input on potential legislation covering generative AI. Separately, Read article >

Categories
Misc

Mobility Gets Amped: IAA Show Floor Energized by Surge in EV Reveals, Generative AI

Generative AI’s transformative effect on the auto industry took center stage last week at the International Motor Show Germany, known as IAA, in Munich. NVIDIA’s Danny Shapiro, VP of automotive marketing, explained in his IAA keynote how this driving force is accelerating innovation and streamlining processes — from advancing design, engineering and digital-twin deployment for Read article >

Categories
Misc

Generative AI and Accelerated Computing for Spear Phishing Detection

Spear phishing is the largest and most costly form of cyber threat, with an estimated 300,000 reported victims in 2021 representing $44 million in reported losses in the United States alone. Business e-mail compromises led to $2.4 billion in costs in 2021, according to the FBI Internet Crime Report. In the period from June 2016 to December 2021, costs related to phishing and spear phishing totaled $43 billion for businesses, according to the IBM Security Cost of a Data Breach report.

Spear phishing e-mails are indistinguishable from a benign e-mail that a victim would receive. This is also why traditional classification of spear phishing e-mails is so difficult. The content difference between a scam and a legitimate e-mail can be minuscule. Often, the only difference between the two is the intent of the sender: is the invoice legitimate, or is it a scam? 

This post details a two-fold approach to improve spear phishing detection by boosting the signals of intent using NVIDIA Morpheus to run data processing and inferencing.

Generating e-mails with new phishing intent

The first step involves using generative AI to create large, varied corpora of e-mails with various intents associated with spear phishing and scams. As new threats emerge, the NVIDIA Morpheus team uses the NVIDIA NeMo framework to generate a new corpus of e-mails with such threats. Following the generation of new e-mails with the new type of phishing intent, the team trains a new language model to recognize the intent. In traditional phishing detection mechanisms, such models would require a significant number of human-labeled e-mails.

Diagram showing an overview of the spear phishing detection methodology. AI-generated e-mails with specific intents are used to train intent models that label incoming user e-mails. These labels are joined with past sender behavior (if any) and e-mail metadata to classify the e-mail as spear phishing or not.
Figure 1. Overview of the spear phishing detection methodology

Detecting sender intent

The first step targets the intent behind the e-mail. The next step targets the intent of the sender. To defend against spear phishing attacks that use spoofing, known senders, or longer cons that do not express their true intent immediately, we construct additional signals by building up behavioral sketches from senders or groups of senders. 

Building on the intent work described above, known senders’ past observed intents are recorded. For example, the first time a known sender asks for money can be a signal to alert the user. 

Syntax usage is also observed and recorded. The syntax of new e-mails is compared to the syntax history of the sender. A deviation from the observed syntax could indicate a possible spoofing attack. 

Finally, the temporal patterns of a sender’s e-mails are collected and cross-referenced when a new e-mail arrives to check for out-of-pattern behavior. Is the sender sending an e-mail for the first time at midnight on a Saturday? If so, that becomes a signal in the final prediction. These signals in aggregate are used to classify e-mails. They are also presented to the end user as an explanation for why an e-mail may be malicious.
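
A simplified sketch of how such signals might be aggregated is shown below. The input structures, thresholds, and equal weighting are invented for illustration and are not the Morpheus pipeline.

```python
from datetime import datetime

def spear_phishing_signals(email, sender_history):
    """Toy aggregation of the behavioral signals described above.
    `email` and `sender_history` are hypothetical dicts; thresholds are made up."""
    signals = {}

    # 1. Intent the sender has never expressed before (e.g., a first money request).
    new_intents = set(email["intents"]) - set(sender_history["seen_intents"])
    signals["new_intent"] = bool(new_intents)

    # 2. Syntax deviation from the sender's historical style (0..1 distance).
    signals["syntax_deviation"] = email["syntax_distance"] > 0.8

    # 3. Out-of-pattern send time (hour never observed for this sender).
    hour = datetime.fromisoformat(email["sent_at"]).hour
    signals["unusual_time"] = hour not in sender_history["active_hours"]

    score = sum(signals.values()) / len(signals)
    return signals, score  # the signals double as the explanation shown to the user

email = {"intents": ["money_request"], "syntax_distance": 0.9,
         "sent_at": "2023-09-16T00:05:00"}
history = {"seen_intents": ["scheduling"], "active_hours": set(range(9, 18))}
print(spear_phishing_signals(email, history))
```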

Adapting to new attacks and improving protection

Existing machine learning (ML) methods rely nearly entirely on human-labeled data and cannot adapt to emerging threats quickly. The biggest benefit to detecting spear phishing e-mails using the approach presented here is how quickly the model can be adapted to new attacks. When a new attack emerges, generative AI is leveraged to create a training corpus for the attack. Intent models are trained to detect its presence in received e-mails. 

Models built with NeMo generate thousands of high-quality, on-topic e-mails in just a few hours. The new intents are added to the existing spear phishing detector. The entire end-to-end workflow of creating new phishing attack e-mails and updating the existing models happens in less than 24 hours. Once the models are in place, e-mail processing and inferencing become a Morpheus pipeline that provides near real-time protection against spear phishing threats.

Results

To illustrate the flexibility of this approach, a model was trained using only money, banking, and personal identifying information (PII) intents. Next, cryptocurrency-flavored phishing e-mails were generated using models built with NeMo. These e-mails were incorporated into the original training and validation subsets. 

The validation set, now containing the new crypto attacks, was then passed into the original model. Then a second model was trained incorporating the crypto attack intents. Figure 2 shows how the models compare in their detection. 

After training for the attack, the F1 score increased from 0.54 to 0.89 (Figure 3). This illustrates how quickly new attacks can be trained for and adapted to using NVIDIA Morpheus and NeMo.

Figure 2. Differences in detection between an untrained model and the model trained for a cryptocurrency-based spear phishing attack
Figure 3. F1-score difference between an untrained model and the model trained for a cryptocurrency-based spear phishing attack

Get started with NVIDIA Morpheus

Watch the video, Improve Spear Phishing Detection with Generative AI for more details. Learn more about how to use NVIDIA Morpheus to detect spear phishing e-mails faster and with greater accuracy using the NVIDIA AI workflow example. You can also apply to try NVIDIA Morpheus in LaunchPad and request a 90-day free trial to test drive NVIDIA Morpheus, part of the NVIDIA AI Enterprise software family.

Categories
Misc

Event: RecSys at Work: Best Practices and Insights

On Sept. 27, join us to learn recommender systems best practices for building, training, and deploying at any scale.

Categories
Misc

A Quantum Boost: cuQuantum With PennyLane Lets Simulations Ride Supercomputers

Ten miles in from Long Island’s Atlantic coast, Shinjae Yoo is revving his engine. The computational scientist and machine learning group lead at the U.S. Department of Energy’s Brookhaven National Laboratory is one of many researchers gearing up to run quantum computing simulations on a supercomputer for the first time, thanks to new software. Yoo’s Read article >

Categories
Misc

Selecting the Right Camera for the NVIDIA Jetson and Other Embedded Systems

The camera module is the most integral part of an AI-based embedded system. With so many camera module choices on the market, the selection process may seem overwhelming. This post breaks down the process to help make the right selection for an embedded application, including the NVIDIA Jetson.

Camera selection considerations

Camera module selection involves consideration of three key aspects: sensor, interface (connector), and optics. 

Sensor 

The two main types of electronic image sensors are the charge-coupled device (CCD) and the active-pixel sensor (CMOS). For a CCD sensor, pixel values can only be read on a per-row basis. Each row of pixels is shifted, one by one, into a readout register. For a CMOS sensor, each pixel can be read individually and in parallel. 

CMOS is less expensive and consumes less energy without sacrificing image quality, in most cases. It can also achieve higher frame rates due to the parallel readout of pixel values. However, there are some specific scenarios in which CCD sensors still prevail—for example, when long exposure is necessary and very low-noise images are required, such as in astronomy. 

Electronic shutter 

There are two options for the electronic shutter: global or rolling. A global shutter exposes each pixel to incoming light at the same time. A rolling shutter exposes the pixel rows in a certain order (top to bottom, for example) and can cause distortion (Figure 1).

Two images of a helicopter showing distortion of moving blades caused by rolling shutter.
Figure 1. Distortion of rotor blades caused by rolling shutter

The global shutter is not impacted by motion blur and distortion due to object movement. It is much easier to sync multiple cameras with a global shutter because there is a single point in time when exposure starts. However, sensors with a global shutter are much more expensive than those with a rolling shutter. 

Color or monochrome 

In most cases, a monochrome image sensor is sufficient for typical machine vision tasks like fault detection, presence monitoring, and recording measurements.

With a monochrome sensor, each pixel is usually described by eight bits. With a color sensor, each pixel has eight bits for the red channel, eight bits for the green channel, and eight bits for the blue channel. The color sensor requires processing three times the amount of data, resulting in a higher processing time and, consequently, a slower frame rate.  

Dynamic range 

Dynamic range is the ratio between the maximum and minimum signal that the sensor can acquire. At the upper limit, pixels appear white at higher intensities (saturation), while at the lower limit and below, pixels appear black. A dynamic range of at least 80 dB is needed for indoor applications, and up to 140 dB for outdoor applications. 
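
For reference, dynamic range in decibels is 20 log10 of the ratio between the largest and smallest usable signal; the quick check below uses a 10,000:1 ratio purely as an example.

```python
import math

# Dynamic range (dB) = 20 * log10(max usable signal / noise floor).
print(20 * math.log10(10_000))  # a 10,000:1 ratio corresponds to 80 dB
```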

Resolution 

Resolution is a sensor’s ability to reproduce object details. It can be influenced by factors such as the type of lighting used, the sensor pixel size, and the capabilities of the optics. The smaller the object detail, the higher the required resolution. 

Pixel resolution translates to how many millimeters of the scene each pixel represents in the image. The higher the resolution, the sharper your image will be. The camera or sensor resolution should cover the smallest feature of interest with at least two pixels. 

CMOS sensors with high resolutions tend to have low frame rates. While a sensor may achieve the resolution you need, it will not capture the quality images you need without achieving enough frames per second. It is important to evaluate the speed of the sensor. 

A general rule of thumb to determine the resolution needed for the use case is shown below and in Figure 2. The multiplier (2) represents the typical desire to have a minimum of two pixels on an object in order to successfully detect it.

Resolution = 2 × Field of View (FOV) / Size of feature of interest

Diagram showing the representation of a person and the working distance from an object as an example of minimum object feature size of interest in the field of view.
Figure 2. Sensor resolution required is determined by lens field of view and feature of interest size

For example, suppose you have an image of an injury around the eye of a boxer. 

  • Resolution = 2 × (2000 / 4) = 1000
  • FOV = 2000 mm
  • Size of feature of interest (the eye) = 4 mm

Based on this calculation, a resolution of 1000 x 1000 pixels (a one-megapixel camera) should be sufficient to detect the eye using a CV or AI algorithm. 
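
A minimal helper capturing this rule of thumb (not from the original post) might look like the following.

```python
import math

def required_resolution(fov_mm: float, feature_mm: float, pixels_per_feature: int = 2) -> int:
    """Minimum pixels per axis so the smallest feature spans at least two pixels."""
    return math.ceil(pixels_per_feature * fov_mm / feature_mm)

# Boxer-eye example from the text: 2,000 mm field of view, 4 mm feature of interest.
print(required_resolution(2000, 4))  # 1000 -> roughly a 1000 x 1000 (1 MP) sensor
```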

Note that a sensor is made up of multiple rows of pixels. These pixels are also called photosites. The number of photons collected by a pixel is directly proportional to the size of the pixel. Selecting a larger pixel may seem tempting but may not be the optimal choice in all the cases. 

Small pixel | Sensitive to noise (-) | Higher spatial resolution for same sensor size (+)
Large pixel | Less sensitive to noise (+) | Less spatial resolution for same sensor size (-)
Table 1. Pros and cons of small and large pixel size

Back-illuminated sensors maximize the amount of light being captured and converted by each photodiode. In front-illuminated sensors, metal wiring above the photodiodes blocks off some photons, hence reducing the amount of light captured.

On the left, a diagram of a front-illuminated structure with substrate, photodiodes, metal wiring, and microlenses. On the right, a diagram of a back-illuminated structure with metal wiring, photodiodes, and microlenses.
Figure 3. Cross-section of a front-illuminated structure (left) and a back-illuminated structure (right)

Frame rate and shutter speed 

The frame rate refers to the number of frames (or images captured) per second (FPS). The frame rate should be determined based on the number of inspections required per second. This correlates with the shutter speed (or exposure time), which is the time that the camera sensor is exposed to capture the image. 

Theoretically, the maximum frame rate is equal to the inverse of the exposure time. But achievable FPS is lower because of latency introduced by frame readout, sensor resolution, and the data transfer rate of the interface including cabling. 
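
As a quick illustration of that upper bound (the 10 ms exposure is an arbitrary example):

```python
exposure_s = 0.010        # 10 ms exposure time
print(1 / exposure_s)     # 100.0 -> theoretical ceiling of 100 FPS
# Achievable FPS is lower once readout, resolution, and interface bandwidth are included.
```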

FPS can be increased by reducing the required exposure time, for example by adding more lighting or by binning pixels. 

CMOS sensors can achieve higher FPS, as the process of reading out each pixel can be done more quickly than with the charge transfer in a CCD sensor’s shift register. 

Interface

There are multiple ways to connect the camera module to an embedded system. Typically, for evaluation purposes, cameras with USB and Ethernet interfaces are used because custom driver development is not needed. 

Other important parameters for interface selection are transmission length, data rate, and operating conditions. Table 2 lists the most popular interfaces. Each option has its pros and cons. 

Features | USB 3.2 | Ethernet (1 GbE) | MIPI CSI-2 | GMSL2 | FPD-Link III
Bandwidth | 10 Gbps | 1 Gbps | D-PHY: 2.5 Gbps/lane; C-PHY: 5.71 Gbps/lane | 6 Gbps | 4.2 Gbps
Cable length supported | | Up to 100m | | |
Plug-and-play | Supported | Supported | Not supported | Not supported | Not supported
Development costs | Low | Low | Medium to high | Medium to high | Medium to high
Operating environment | Indoor | Indoor | Indoor | Indoor and outdoor | Indoor and outdoor
Table 2. Comparison of various camera interfaces

Optics 

The basic purpose of an optical lens is to collect the light scattered by an object and recreate an image of the object on a light-sensitive image sensor (CCD or CMOS). The following factors should be considered when selecting a lens: focal length, sensor format, field of view, aperture, chief ray angle, resolving power, and distortion. 

Lenses are manufactured with a limited number of standard focal lengths. Common lens focal lengths include 6mm, 8mm, 12.5mm, 25mm, and 50mm. 

Once you choose a lens with a focal length closest to the focal length required by your imaging system, you need to adjust the working distance to get the object under inspection in focus. Lenses with short focal lengths (less than 12mm) produce images with a significant amount of distortion. 

If your application is sensitive to image distortion, try to increase the working distance and use a lens with a higher focal length. If you cannot change the working distance, you are somewhat limited in choosing an optimized lens. 

 | Wide-angle lens | Normal lens | Telephoto lens
Focal length | | 50mm | >=70mm
Use case | Nearby scenes | Same as human eye | Far-away scenes
Table 3. Main types of camera lenses

Attaching a lens to a camera requires some type of mounting system. Both mechanical stability (a loose lens will deliver an out-of-focus image) and the distance to the sensor must be defined. 

To ensure compatibility between different lenses and cameras, the following standard lens mounts are defined. 

 | Most popular | For industrial applications
Lens mount | M12/S-mount | C-mount
Flange focal length | Non-standard | 17.526mm
Thread pitch (mm) | 0.5 | 0.75
Sensor size accommodated (inches) | Up to ⅔ | Up to 1
Table 4. Common lens mounts used in embedded space

NVIDIA camera module partners 

NVIDIA maintains a rich ecosystem of partnerships with highly competent camera module makers all over the world. See Jetson Partner Supported Cameras for details. These partners can help you design imaging systems for your application, from concept to production, for the NVIDIA Jetson.

Graphic showing NVIDIA Jetson with camera modules for various use cases and industries.
Figure 4. NVIDIA Jetson in combination with camera modules can be used across industries for various needs

Summary

This post has explained the most important camera characteristics to consider when selecting a camera for an embedded application. Although the selection process may seem daunting, the first step is to understand your key constraints based on design, performance, environment, and cost. 

Once you understand the constraints, then focus on the characteristics most relevant to your use case. For example, if the camera will be deployed away from the compute or in a rugged environment, consider using the GMSL interface. If the camera will be used in low-light conditions, consider a camera module with larger pixel and sensor sizes. If the camera will be used in a motion application, consider using a camera with a global shutter. 

To learn more, watch Optimize Your Edge Application: Unveiling the Right Combination of Jetson Processors and Cameras. For detailed specs on AI performance, GPU, CPU, and more for both Xavier- and Orin-based Jetson modules, visit Jetson Modules.