Categories
Offsites

Presenting the iGibson Challenge on Interactive and Social Navigation

Computer vision has significantly advanced over the past decade thanks to large-scale benchmarks, such as ImageNet for image classification or COCO for object detection, which provide vast datasets and criteria for evaluating models. However, these traditional benchmarks evaluate passive tasks in which the emphasis is on perception alone, whereas more recent computer vision research has tackled active tasks, which require both perception and action (often called “embodied AI”).

The First Embodied AI Workshop, co-organized by Google at CVPR 2020, hosted several benchmark challenges for active tasks, including the Stanford- and Google-organized Sim2Real Challenge with iGibson, which provided a real-world setup to test navigation policies trained in photo-realistic simulation environments. An open-source setup in the challenge enabled the community to train policies in simulation, which could then be run in repeatable real-world navigation experiments, enabling the evaluation of the “sim-to-real gap” (the difference between simulation and the real world). Many research teams submitted solutions during the pandemic, which were run safely by challenge organizers on real robots, with winners presenting their results virtually at the workshop.

This year, Stanford and Google are proud to announce a new version of the iGibson Challenge on Interactive and Social Navigation, one of the 10 active visual challenges affiliated with the Second Embodied AI Workshop at CVPR 2021. This year’s Embodied AI Workshop is co-organized by Google and nine other research organizations, and explores issues such as simulation, sim-to-real transfer, visual navigation, semantic mapping and change detection, object rearrangement and restoration, auditory navigation, and following instructions for navigation and interaction tasks. In addition, this year’s interactive and social iGibson challenge explores interactive navigation and social navigation — how robots can learn to interact with people and objects in their environments — by combining the iGibson simulator, the Google Scanned Objects Dataset, and simulated pedestrians within realistic human environments.

New Challenges in Navigation
Active perception tasks are challenging, as they require both perception and actions in response. For example, point navigation involves navigating through mapped space, such as driving robots over kilometers in human-friendly buildings, while recognizing and avoiding obstacles. Similarly, object navigation involves looking for objects in buildings, requiring domain-invariant representations and object search behaviors. Additionally, visual language instruction navigation involves navigating through buildings based on visual images and commands in natural language. These problems become even harder in a real-world environment, where robots must handle a variety of physical and social interactions that are much more dynamic and challenging to solve. In this year’s iGibson Challenge, we focus on two of those settings:

  • Interactive Navigation: In a cluttered environment, an agent navigating to a goal must physically interact with objects to succeed. For example, an agent should recognize that a shoe can be pushed aside, but that an end table should not be moved and a sofa cannot be moved.
  • Social Navigation: In a crowded environment in which people are also moving about, an agent navigating to a goal must move politely around the people present with as little disruption as possible.

New Features of the iGibson 2021 Dataset
To facilitate research into techniques that address these problems, the iGibson Challenge 2021 dataset provides simulated interactive scenes for training. The dataset includes eight fully interactive scenes derived from real-world apartments, and another seven scenes held back for testing and evaluation.

iGibson provides eight fully interactive scenes derived from real-world apartments.

To enable interactive navigation, these scenes are populated with small objects drawn from the Google Scanned Objects Dataset, a dataset of common household objects scanned in 3D for use in robot simulation and computer vision research, licensed under a Creative Commons license to give researchers the freedom to use them in their research.

The Google Scanned Objects Dataset contains 3D models of many common objects.

The challenge is implemented in Stanford’s open-source iGibson simulation platform, a fast, interactive, photorealistic robotic simulator with physics based on Bullet. For this year’s challenge, iGibson has been expanded with fully interactive environments and pedestrian behaviors based on the ORCA crowd simulation algorithm.

iGibson environments include ORCA crowd simulations and movable objects.

Participating in the Challenge
The iGibson Challenge has launched and its leaderboard is open in the Dev phase, in which participants are encouraged to submit robotic control to the development leaderboard, where they will be tested on the Interactive and Social Navigation challenges on our holdout dataset. The Test phase opens for teams to submit final solutions on May 16th and closes on May 31st, with the winner demo scheduled for June 20th, 2021. For more details on participating, please check out the iGibson Challenge Page.

Acknowledgements
We’d like to thank our colleagues at the Stanford Vision and Learning Lab (SVL) for working with us to advance the state of interactive and social robot navigation, including Chengshu Li, Claudia Pérez D’Arpino, Fei Xia, Jaewoo Jang, Roberto Martin-Martin and Silvio Savarese. At Google, we would like to thank Aleksandra Faust, Anelia Angelova, Carolina Parada, Edward Lee, Jie Tan, Krista Reyman and the rest of our collaborators on mobile robotics. We would also like to thank our co-organizers on the Embodied AI Workshop, including AI2, Facebook, Georgia Tech, Intel, MIT, SFU, Stanford, UC Berkeley, and University of Washington.

Categories
Misc

Accelerated Portfolios: NVIDIA Inception VC Alliance Connects Top Investors with Leading AI Startups

To better connect venture capitalists with NVIDIA and promising AI startups, we’ve introduced the NVIDIA Inception VC Alliance. This initiative, which VCs can apply to now, aims to fast-track the growth for thousands of AI startups around the globe by serving as a critical nexus between the two communities. AI adoption is growing across industries Read article >

The post Accelerated Portfolios: NVIDIA Inception VC Alliance Connects Top Investors with Leading AI Startups appeared first on The Official NVIDIA Blog.

Categories
Misc

Durham University and DiRAC’s New NVIDIA InfiniBand-Powered Supercomputer to Accelerate Our Understanding of the Universe

NVIDIA today announced that Durham University’s new COSMA-8 supercomputer — to be used by world-leading cosmologists in the UK to research the origins of the universe — will be accelerated by NVIDIA® HDR InfiniBand networking.

Categories
Misc

Cloud-Native Supercomputing Is Here: So, What’s a Cloud-Native Supercomputer?

Cloud-native supercomputing is the next big thing in supercomputing, and it’s here today, ready to tackle the toughest HPC and AI workloads. The University of Cambridge is building a cloud-native supercomputer in the UK. Two teams of researchers in the U.S. are separately developing key software elements for cloud-native supercomputing. The Los Alamos National Laboratory, Read article >

The post Cloud-Native Supercomputing Is Here: So, What’s a Cloud-Native Supercomputer? appeared first on The Official NVIDIA Blog.

Categories
Misc

NVIDIA Accelerates World’s First TOP500 Academic Cloud-Native Supercomputer to Advance Research at Cambridge University

Scientific discovery powered by supercomputing has the potential to transform the world with research that benefits science, industry and society. A new open, cloud-native supercomputer at Cambridge University offers unrivaled performance that will enable researchers to pursue exploration like never before. The Cambridge Service for Data Driven Discovery, or CSD3 for short, is a UK Read article >

The post NVIDIA Accelerates World’s First TOP500 Academic Cloud-Native Supercomputer to Advance Research at Cambridge University appeared first on The Official NVIDIA Blog.

Categories
Misc

NVIDIA DLSS Natively Supported in Unity 2021.2

Unity made real-time ray tracing available to all of their developers in 2019 with the release of 2019LTS. Before the end of 2021, NVIDIA DLSS will be natively supported for HDRP in Unity 2021.2.

AI Super Resolution Tech Available Later This Year

NVIDIA DLSS (Deep Learning Super Sampling) uses advanced AI rendering to produce image quality that’s comparable to native resolution, and sometimes even better, while conventionally rendering only a fraction of the pixels. With real-time ray tracing and NVIDIA DLSS, Unity developers will be able to create beautiful real-time ray traced worlds running at high frame rates and resolutions on NVIDIA RTX GPUs. DLSS also provides a substantial performance boost for traditional rasterized graphics.

While ray tracing produces far more realistic images than rasterization, it also requires a lot more computation, which then leads to lower framerates. NVIDIA’s solution is to ray trace fewer pixels and use AI on our dedicated Tensor Core units to intelligently scale up to a higher resolution, and while doing so, significantly boost framerates. We built a supercomputer to train the DLSS deep neural net with extremely high quality 16K offline rendered images of many kinds of content. Once trained, the model can be integrated into the core DLSS library, the game itself or even downloaded by NVIDIA’s Game Ready driver. 

At runtime, DLSS takes three inputs: 1) a low-resolution, aliased image, 2) motion vectors for the current frame, and 3) the high-resolution previous frame. From those inputs, DLSS composes a beautifully sharp high-resolution image, to which post-processing and UI/HUD are then applied. You get the performance headroom you need to maximize ray tracing settings and increase output resolution.

At GTC 2021, Light Brick Studio demonstrated how stunning Unity games can look when real-time ray tracing and DLSS are combined. Watch their full talk for free here

Keep an eye out for more news about DLSS in Unity 2021.2 by subscribing to NVIDIA game development news and following the Unity Technologies Blog.

Categories
Misc

Leveling Up Graphics and Performance with RTX, DLSS and Reflex at NVIDIA GTC

We are excited to share over a dozen new and updated developer tools released today at GTC for game developers, including NVIDIA Reflex, RTXDI, and our new RTX Technology Showcase.

SDK Updates For Game Developers and Digital Artists

GTC is a great opportunity to get hands-on with NVIDIA’s latest graphics technologies. Developers can apply now for access to RTX Direct Illumination (RTXDI), the latest advancement in real-time ray tracing. Nsight Perf, the next in a line of developer optimization tools, has just been made available to all members of the NVIDIA Developer Program. In addition, several exciting updates to aid game development and professional visualization were announced for existing SDKs.

REAL TIME RAY TRACING MADE EASIER

RTX Direct Illumination (RTXDI)

Imagine adding millions of dynamic lights to your game environments without worrying about performance or resource constraints. RTXDI makes this possible while rendering in real time. 
Geometry of any shape can now emit light and cast appropriate shadows: Tiny LEDs. Times Square billboards. Even exploding fireballs. RTXDI easily incorporates lighting from user-generated models. And all of these lights can move freely and dynamically.

In this scene, you can see neon signs, brake lights, apartment windows, store displays, and wet roads all acting as independent light sources. All of that can now be captured in real-time with RTXDI.

RTXDI removes the limits on the number of lights artists can put in a scene. Artists no longer have to cheat, or make painful decisions about which lights matter and which ones don’t. They can light scenes completely unconstrained by anything but their creative vision. Developers can apply for access to RTXDI here.

RTX Global Illumination (RTXGI)


Leveraging the power of ray tracing, the RTX Global Illumination (RTXGI) SDK provides scalable solutions to compute multi-bounce indirect lighting without bake times, light leaks, or expensive per-frame costs. Version 1.1.30 allows developers to enable, disable, and rotate individual DDGI volumes. The RTXGI plugin comes pre-installed on the latest version of NvRTX, which can be found here. Developers can apply for general access to RTXGI here.

NVIDIA Real Time Denoiser (NRD)

NRD is a spatio-temporal, API-agnostic denoising library that’s designed to work with low ray-per-pixel signals. In version 2.0, a high-frequency denoiser (called ReLAX) has been added to support RTXDI signals. Split-screen view support is included for denoised image comparisons, dynamic flow control is accessible, and checkerboard support for the ReLAX and shadow denoisers has been included. Developers can apply for access here.

NVIDIA RTX Unreal Engine Branch (NvRTX)

NvRTX is a custom UE4 branch for NVIDIA technologies on GitHub. Having custom UE4 branches on GitHub shortens the development cycle and helps make games look more stunning. NvRTX 4.26.1 includes RTX Direct Illumination with the ReLAX denoiser in preview and RTX Global Illumination. This branch is the only source that brings all NVIDIA RTX technology together in one place. NvRTX also includes an application that lets developers experience and experiment with the latest RTX technology, and it will continue to be updated. Try it for yourself here.

IMPROVING FRAME RATES AND RESPONSIVENESS INSTANTLY

Deep Learning Super Sampling (DLSS)

NVIDIA DLSS is a new and improved deep learning neural network that boosts frame rates and generates beautiful, sharp images for your games. It gives you the performance headroom to maximize ray tracing settings and increase output resolution. DLSS is powered by dedicated AI processors on RTX GPUs called Tensor Cores. It is now available as a plugin for Unreal Engine 4.26; the latest version can be found at NVIDIA Developer or Unreal Marketplace. 

Unity has announced that DLSS will be natively supported in Unity Engine version 2021.2 later this year. Learn more here.

Reflex

The Reflex SDK allows developers to implement a low-latency mode that aligns game engine work to complete just in time for rendering, eliminating the GPU render queue and reducing CPU back pressure. Reflex 1.4 introduces a new boost feature that further reduces latency when a game becomes CPU render thread bound. In addition, the flash indicator was added to the Unity plugin, making it easier to begin measuring latency.

Nsight Perf SDK

Nsight Perf is a graphics profiling toolbox for DirectX, Vulkan, and OpenGL, enabling you to collect GPU performance metrics directly from your application. Profile while you’re in-application, upgrade your CI/CD, and be one with the GPU. 

Nsight Perf 2021.1 is available now here

Nsight Graphics

Nsight Graphics is a standalone developer tool that enables you to debug, profile, and export frames built with DirectX 12, Vulkan, OpenGL, and OpenVR. Version 2021.2 introduces Trace Analysis, a powerful new GPU Trace feature that gives developers detailed information on where in a frame to focus in order to improve application performance. GPU Trace can now also show sample values in addition to percentages. We’ve also improved window docking to provide more ways to configure windows (especially in multi-monitor setups). For captures, you can now specify which swap chain you want to use, making Nsight Graphics easier to use on applications that have multiple windows or swap chains (such as level editors).

All of the powerful debugging and profiling features in Nsight Graphics are available for realtime ray tracing, which includes support for DXR and Vulkan Ray Tracing. Watch this short video to see how you can leverage Nsight Graphics to improve your developer productivity and ensure that your game is fast and visually breathtaking.

Nsight Systems

Nsight Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help you identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs, from large servers to our smallest SoC. Version 2021.2 includes CUDA UVM CPU and GPU page faults, Reflex SDK trace, and GPU Metrics Sampling, providing a system-wide overview of efficiency for your GPU workloads. This expands Nsight Systems’ ability to profile system-wide activity and helps track GPU workloads and their CPU origins. It provides a deeper understanding of GPU utilization over multiple processes and contexts, covering the interop of graphics and compute workloads including CUDA, OptiX, DirectX, and Vulkan ray tracing and rasterization APIs.
Download the latest version here.

CREATING AND SIMULATING PHOTO-REALISTIC GRAPHICS 

OptiX

OptiX is an application framework for achieving optimal ray tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray tracing algorithms. OptiX 7.3 enables object loading from disk, freeing up the GPU and making developers less reliant on the CPU. This update also brings improvements to denoising capabilities for objects in motion while improving the real-time performance of Curves. Download OptiX today here.

Images courtesy: Zhelong Xu, Adobe Substance, LeeGriggs, Autodesk Arnold, Mondlicht Studios, Chaos V-Ray, Madis Epler, Otoy Octane, Oly Stingel, Redshift, Siemens Digital Industries Software

NanoVDB

NanoVDB adds real-time rendering GPU support for OpenVDB. OpenVDB is the Academy Award-winning industry standard data-structure and toolset used for manipulating volumetric effects. The latest version of NanoVDB offers a significant footprint reduction to the GPU memory, freeing up resources for other tasks.

 

Texture Tools Exporter

Version 2021.1.1 of the NVIDIA Texture Tools Exporter brings AI-powered NGX Image Super-Resolution, initial support for the KTX and KTX2 file formats including Zstandard supercompression, resizing and high-DPI windowing, and more. You can get access to the latest version here

Omniverse Audio2Face

NVIDIA Omniverse Audio2Face is now available in open beta. With the Audio2Face app, Omniverse users can generate AI-driven facial animation from audio sources. The beta release of Audio2Face includes the highly anticipated ‘character transfer’ feature, enabling users to retarget animation onto a custom 3D facial mesh.

Reallusion Character Creator Connector

With Character Creator 3 and Omniverse, individuals or design teams can create and deploy digital characters as task performers, virtual hosts, or citizens for simulations and visualizations.

The Connector adds the power of a full character generation system with motions and unlimited creative variations to Omniverse:

  • Character Creator digital humans can be chosen from a library or custom creation can begin with highly morphable, fully-rigged bases allowing creators of all skill levels a way to easily design characters from scratch.
  • Character Creator Headshot, SkinGen, and Smart Hair all allow for detailed character definition from head to toe.
  • Omniverse users can transfer characters and motions from Character Creator with the Omniverse Exporter, an easy-to-learn way to deploy digital humans in Omniverse Create and Omniverse Machinima.

This new Connector adds a complete digital human creation pipeline to any Omniverse-based application.

You can also find a lot more information about Reallusion applications and how they work with Omniverse on their website here: Creating Animated Digital Humans for Omniverse | Reallusion Character Creator & Nvidia Omniverse

Omniverse Connectors

This GTC, we unveiled a series of new Omniverse Connectors – plugins to third-party applications – including Autodesk 3ds Max, GRAPHISOFT Archicad, Autodesk Maya, Adobe Photoshop, Autodesk Revit, McNeel & Associates Rhino including Grasshopper, Trimble SketchUp, Substance Designer, Substance Painter, and Epic Games Unreal Engine 4. 

Alongside this, we have a boatload of new connectors in the works, and some of the ones that will be coming soon are Blender, Reallusion Character Creator 3, SideFX Houdini, Marvelous Designer, Autodesk Motionbuilder, Paraview, OnShape, DS SOLIDWORKS, Substance Source, and many, many more.

Explore Omniverse Open Beta today – download now. 

Register for free for our game development track at GTC this week.

You can also find a list of all of our game development SDKs here.

Categories
Misc

Try NVIDIA Game Development SDKs in the Interactive RTX Technology Showcase

Finding ways to improve performance and visual fidelity in your games and applications is challenging. To help during the game development process, NVIDIA has packaged and released a suite of SDKs through our branch of Unreal Engine for all developers, from independent to AAA, to harness the power of RTX. 

Executable and Project Files Available to Download For Game Developers and Digital Artists


Today, NVIDIA released RTX Technology Showcase, an interactive demo built from NVIDIA’s RTX Unreal Engine 4.26 Branch (NvRTX). RTX Technology Showcase project files are also available for further guidance and discovery of the benefits that ray tracing and AI bring to your projects. This application allows you to toggle ray-traced reflections, ray-traced translucency, DLSS, RTX Direct Illumination, and RTX Global Illumination to visualize the difference in real time. The ray tracing SDKs are available through NvRTX, while DLSS is available as a UE 4.26 plugin.

  • RTX Direct Illumination lets artists add millions of dynamic lights to game environments without worrying about performance or resource constraints, in real time.
  • RTX Global Illumination provides scalable solutions to compute multi-bounce indirect lighting without bake times, light leaks, or expensive per-frame costs.
  • NVIDIA Real-Time Denoiser is a spatio-temporal API agnostic denoising library that’s designed to work with low ray per pixel signals.
  • NVIDIA DLSS (Deep Learning Super-Sampling) taps into the power of a deep learning neural network to boost frame rates and generate beautiful, sharp images for your games.

Learn more and download the application and project files at developer.nvidia.com/rtx-technology-showcase. 

Check out our game development track at GTC 21 here.

See a full list of our game development announcements at GTC in this blog post. 

Categories
Misc

How to load data efficiently, so memory can be utilized

Currently I’m trying to load some images for training purposes; here is what I’m currently doing:

sats = [np.array(Image.open(cdir + "/x/" + name).convert('RGB'), dtype="float32") for name in names]
masks = [np.array(Image.open(cdir + "/y/" + name), dtype="float32") for name in names]

But this takes almost all the memory in Colab when running on the full dataset. So my question is: can I use a better API that loads the data partially, so I don’t run out of memory?

Thanks.

submitted by /u/maifee
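
One possible direction, offered as a minimal sketch and not as a definitive answer: load the files lazily in batches with a generator, so that only one batch of arrays is in memory at a time. The names `iter_batches`, `load_pair`, and `batch_size` are hypothetical and do not come from the question; `load_pair` stands for whatever function opens one image and its mask.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(names: Iterable[str],
                 load_pair: Callable[[str], T],
                 batch_size: int = 16) -> Iterator[List[T]]:
    """Yield lists of loaded samples, holding at most batch_size in memory."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for name in names:
        # Each file is opened only when the consumer asks for its batch.
        batch.append(load_pair(name))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final, possibly smaller, batch.
        yield batch


# Hypothetical loader mirroring the code in the question
# (requires numpy and Pillow, so it is left as a comment here):
#
# def load_pair(name):
#     sat = np.array(Image.open(cdir + "/x/" + name).convert("RGB"), dtype="float32")
#     mask = np.array(Image.open(cdir + "/y/" + name), dtype="float32")
#     return sat, mask
#
# for batch in iter_batches(names, load_pair, batch_size=16):
#     ...  # train on this batch, then let it be garbage collected
```

If the training code uses TensorFlow, a generator like this can typically be wrapped with `tf.data.Dataset.from_generator`; whether that fits depends on the rest of the training loop, which the question does not show.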

Categories
Misc

Accelerating Inference with NVIDIA Triton Inference Server and NVIDIA DALI


When you are working on optimizing inference scenarios for the best performance, you may underestimate the effect of data preprocessing. These are the operations required before forwarding an input sample through the model. This post highlights the impact of the data preprocessing on inference performance and how you can easily speed it up on the GPU, using NVIDIA DALI and NVIDIA Triton Inference Server.

Does preprocessing matter?

Regardless of the particular model that you want to run inference on, some degree of data preprocessing is required. In computer vision applications, the input operations usually include decoding, resizing, and normalizing to a standardized format accepted by the neural network. Speech recognition models, on the other hand, may require calculating certain features, like a spectrogram, and some raw audio sample processing, such as pre-emphasis and dithering.

Often, preprocessing routines that you use for inference are similar to the ones used as an input pipeline for the model training. Implementing both using the same tools can save you some boilerplate and code repetition. Ensuring that the preprocessing operations used for inference are defined identically to those used when the model was trained is key to achieving high accuracy.

The more complicated the preprocessing pipeline is for a given model, the larger the fraction of the total inference time it takes. This means that accelerating only the network itself does not yield a proportional improvement in overall inference latency. This is especially true if you use the CPU to prepare data before feeding it to the model.
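
The effect of a fixed preprocessing share on end-to-end latency can be made concrete with an Amdahl-style calculation. The following is a minimal sketch; the function name and the fractions and speedups used are illustrative assumptions, not measurements from this post.

```python
def overall_speedup(preprocess_fraction: float, network_speedup: float) -> float:
    """End-to-end speedup when only the network part of inference gets faster.

    preprocess_fraction: share of the original latency spent in preprocessing (0..1).
    network_speedup: factor by which the network part alone is accelerated.
    """
    if not 0.0 <= preprocess_fraction <= 1.0:
        raise ValueError("preprocess_fraction must be between 0 and 1")
    if network_speedup <= 0.0:
        raise ValueError("network_speedup must be positive")
    network_fraction = 1.0 - preprocess_fraction
    # Preprocessing time is unchanged; only the network share shrinks.
    return 1.0 / (preprocess_fraction + network_fraction / network_speedup)


# Illustrative numbers only: a 4x faster network with different preprocessing shares.
for fraction in (0.1, 0.5):
    print(f"preprocessing share {fraction:.0%}: "
          f"overall speedup {overall_speedup(fraction, 4.0):.2f}x")
```

Under these assumed numbers, a network that runs 4x faster yields only about a 1.6x end-to-end gain when preprocessing accounts for half of the latency, which is why the preprocessing stage is worth accelerating as well.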

What is NVIDIA DALI?

DALI is a data loading and preprocessing library to build highly optimized custom data processing pipelines used in deep learning applications. The set of operations that can be found in DALI includes, but is not limited to, data loading, decoding multiple formats of image, video, and audio, as well as a wide range of processing operators.

Diagram shows consecutive stages: Input Data, Decode, GPU-Accelerated Augmentations, Preprocessed Data, and Training/Inference. The Training/Inference step is pictured with logos of the following deep learning frameworks: MXNet, PaddlePaddle, PyTorch, and TensorFlow.
Figure 1. NVIDIA DALI workflow.

What makes DALI performant is that it offloads most of the preprocessing computation to the GPU. It processes a whole batch of data at once rather than running operators sample by sample, exploiting the parallel nature of GPU computation. Because of that, DALI is successfully used to accelerate the training of many deep learning models in production.

Another advantage of DALI is its portability. After it’s defined, a pipeline can be used with most of the popular deep learning frameworks, namely TensorFlow, PyTorch, MXNet, and PaddlePaddle. However, DALI’s usefulness is not limited to training. After you have trained your model with DALI as a preprocessing library, you can use the corresponding data processing pipeline for inference.

Triton model ensembles

Triton Inference Server greatly simplifies the deployment of AI models at scale in production. It is open-source software that supports multiple backends that can run your neural network inference. However, running inference may require more complex pipelines that also include preprocessing and postprocessing stages. Not all of these steps are always present, but whenever your use case includes more than just calculating the output of a neural network, Triton Server comes with a convenient solution that simplifies building such pipelines.

Among many useful scheduling mechanisms that the Triton Server platform provides is the ensemble scheduler, which is responsible for pipelining models participating in the inference process while ensuring efficiency and optimizing throughput.

Figure 2. Triton model ensemble scheme.

Using ensembles in Triton Server is easy and requires only preparing a single additional configuration file with a description of the pipeline. This file serves as a definition of a special ensemble model that is a facade encapsulating the whole inference process. With such a setup, you can send requests directly to the ensemble model, which hides the complexity of the pipeline. This can also reduce the communication overhead as the preprocessed data already resides on the GPU that is used to run the inference.

Introducing DALI backend

Here’s a real-life example: an image classification system. A picture is captured on the edge device and is sent to a frontend service. The frontend delegates inference to Triton Server that runs an image classification network, for example Inception v3. Typically, such networks require a decoded, normalized, and resized image as an input. Running those operations inside the client service (Figure 3) is time-consuming. On top of that, the decoded images increase network traffic, as they are bigger than the encoded images (Table 1). Still, this solution might be tempting, as it is easy to implement with popular libraries like OpenCV.

Resolution   Decoded image   Preprocessed image for Inception v3   Encoded image
720p         3.1 MB          1 MB                                  500 kB
1080p        6.2 MB          1 MB                                  700 kB
Table 1. Size comparison of the images for a given resolution. The encoded image size depends highly on its content and compression type.
Figure 3. Triton Server inference with client preprocessing.
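
The decoded and preprocessed sizes in Table 1 can be approximated with simple arithmetic. This is a sketch under stated assumptions: 1080p is taken to mean 1920×1080 pixels, a decoded image stores three 8-bit RGB channels, the preprocessed Inception v3 input is a 299×299×3 float32 tensor, and 1 MB is counted as 10^6 bytes. The function name is hypothetical.

```python
def raw_size_bytes(width: int, height: int,
                   channels: int = 3, bytes_per_value: int = 1) -> int:
    """Size of an uncompressed image or tensor in bytes."""
    return width * height * channels * bytes_per_value


# Assumed: a decoded 1080p frame is 1920x1080 RGB with one byte per channel.
decoded_1080p = raw_size_bytes(1920, 1080)
# Assumed: the Inception v3 input is 299x299x3 with 4-byte float32 values.
inception_input = raw_size_bytes(299, 299, bytes_per_value=4)

print(f"decoded 1080p:   {decoded_1080p / 1e6:.1f} MB")    # 6.2 MB
print(f"inception input: {inception_input / 1e6:.1f} MB")  # 1.1 MB
```

Both results are several times larger than the encoded sizes listed in the table, which is the extra network traffic that client-side preprocessing introduces.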

On the other hand, a significantly more performant scenario is the one that implements the preprocessing pipeline as a Triton Server backend (Figure 4). In this case, you can take advantage of the GPUs that are already used by the server. The role of the frontend service is now reduced to handling requests from edge devices and sending encoded images directly to Triton Server. This simplifies the architecture of the cloud system as all computationally intensive tasks are moved to the Triton Server, which can be easily scaled later.

The preprocessing part could be implemented as a custom backend, but that is quite complicated and low-level. It can also be written in one of the frameworks supported by Triton Server. However, if you have already trained your network using the DALI input pipeline, you would probably like to reuse this code for inference.

Figure 4. Triton Server inference with server-side preprocessing using a custom or framework backend.

This is where the DALI backend comes in. Although DALI was initially designed to remove the preprocessing bottleneck during training, some of its features are also useful for inference. In the image classification example, you put together a model ensemble in which the first step decodes, resizes, and normalizes the images using DALI GPU operators and sends the preprocessed data straight to the inference step (Figure 5).

Figure 5. Triton Server inference with server preprocessing using the DALI backend.

Image classification using Inception v3

How do you obtain a DALI model and put it into the model repository? Look at an example from the DALI backend repository.

Inception v3 is an example of an image classification neural network. All three of the preprocessing operations needed by this model (JPEG decoding, resizing, and normalizing) are good candidates for GPU parallelization. In DALI, they are GPU-powered.

The DALI model is going to be a part of the model ensemble. The following example shows the model repository directory structure, containing a DALI preprocessing model, TensorFlow Inception v3 model, and the model ensemble:

model_repository
 ├── dali
 │   ├── 1
 │   │   └── model.dali
 │   └── config.pbtxt
 ├── ensemble_dali_inception
 │   ├── 1
 │   └── config.pbtxt
 └── inception_graphdef
     ├── 1
     │   └── model.graphdef
     └── config.pbtxt 

The configuration file config.pbtxt for the preprocessing model looks like that of any other Triton Server model:

name: "dali"
backend: "dali"
max_batch_size: 256
input [
  {
    name: "DALI_INPUT_0"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "DALI_OUTPUT_0"
    data_type: TYPE_FP32
    dims: [ 299, 299, 3 ]
  }
]

The remaining question is where the model.dali file comes from. This file contains a serialized DALI pipeline, which you obtain by calling the serialize method on a DALI pipeline instance. The following code example is the DALI preprocessing pipeline for the Inception example:

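import nvidia.dali as dali
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@dali.pipeline_def(batch_size=256, num_threads=4, device_id=0)
def inception_pipeline():
    # Placeholder through which Triton Server feeds encoded images; the name must match config.pbtxt.
    images = fn.external_source(device="cpu", name="DALI_INPUT_0")
    # "mixed" takes encoded bytes from the CPU and decodes them on the GPU.
    images = fn.decoders.image(images, device="mixed", output_type=types.RGB)
    images = fn.resize(images, resize_x=299, resize_y=299)
    # Per-channel normalization; mean and std are scaled to the 0-255 pixel range.
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout="HWC",  # matches dims [ 299, 299, 3 ]; the operator defaults to CHW
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
    return images

# Instantiate the pipeline and write its serialized form into the model repository.
pipe = inception_pipeline()
pipe.serialize(filename="model_repository/dali/1/model.dali")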

The code is rather straightforward. Note the presence of the fn.external_source operator: it is a placeholder that Triton Server later uses to feed data into the pipeline. The name passed to fn.external_source must match the input name declared in the DALI config.pbtxt file (DALI_INPUT_0 in this example). For more information about building DALI pipelines, see NVIDIA DALI Documentation.
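To see why the mean and std values in the pipeline are multiplied by 255, it helps to work through the normalization by hand. The following is a minimal sketch in plain Python, not DALI code: it assumes the operator computes (value - mean) / std per channel on 8-bit pixel values, and normalize_pixel is a hypothetical helper introduced only for illustration.

```python
# Per-channel constants copied from the pipeline, scaled to the 0-255 pixel range.
MEAN = [0.485 * 255, 0.456 * 255, 0.406 * 255]
STD = [0.229 * 255, 0.224 * 255, 0.225 * 255]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to each channel of one RGB pixel.

    Hypothetical helper for illustration; in the pipeline the same arithmetic
    is applied by crop_mirror_normalize to every pixel on the GPU.
    """
    return [(value - mean) / std for value, mean, std in zip(rgb, MEAN, STD)]


# A black pixel, a white pixel, and a pixel equal to the per-channel mean.
print(normalize_pixel([0, 0, 0]))
print(normalize_pixel([255, 255, 255]))
print(normalize_pixel(MEAN))
```

Because the decoded image holds values between 0 and 255, unscaled constants such as 0.485 would barely shift the data; scaling them by 255 keeps the normalized values centered around zero.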

Performance results

So how does the performance look? Figure 6 compares two Inception setup scenarios:

  • Client preprocessing: Samples are decoded, resized, and normalized in parallel using OpenCV.
  • Server preprocessing: The Python client script sends encoded images to the server, where the whole DALI preprocessing happens.

Figure 6. Throughput (inferences per second, vertical axis) vs. latency (milliseconds, horizontal axis) for both scenarios with batch sizes of 1, 4, 8, and 32. The further up and to the left a point lies, the better the result. The performance results were collected on a DGX A100 machine.
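For the server preprocessing scenario, the client only has to forward the still-encoded file. The following is a rough sketch of such a client, not the script used for the measurements. It assumes the tritonclient and numpy packages are installed and that a server is listening on localhost:8000; the function names are hypothetical, and for simplicity it queries the dali model directly rather than the ensemble, using the tensor names from the config.pbtxt shown earlier.

```python
from pathlib import Path


def encoded_image_shape(payload: bytes) -> list:
    """Shape of one encoded image sent as a batch of size 1: [batch, num_bytes]."""
    return [1, len(payload)]


def infer_encoded_image(path, url="localhost:8000", model_name="dali"):
    """Send one encoded image file to Triton Server and return the preprocessed tensor.

    The URL and model name are assumptions for this sketch.
    """
    # Imported lazily so the helper above can be used without these packages.
    import numpy as np
    import tritonclient.http as httpclient

    payload = Path(path).read_bytes()  # the image stays JPEG-encoded
    batch = np.frombuffer(payload, dtype=np.uint8).reshape(encoded_image_shape(payload))

    client = httpclient.InferenceServerClient(url=url)
    infer_input = httpclient.InferInput("DALI_INPUT_0", list(batch.shape), "UINT8")
    infer_input.set_data_from_numpy(batch)
    requested = httpclient.InferRequestedOutput("DALI_OUTPUT_0")

    result = client.infer(model_name, inputs=[infer_input], outputs=[requested])
    return result.as_numpy("DALI_OUTPUT_0")
```

Note that the client does no decoding or resizing at all: the request carries a flat array of bytes, which is what the dims: [ -1 ] input declaration describes.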

Figure 6 shows that server-side preprocessing with DALI gives significantly better results than client preprocessing, in terms of both overall latency and throughput. This is possible because DALI takes advantage of the full GPU compute capabilities, such as the hardware JPEG decoder. Another important factor is the communication overhead. An example JPEG image used in the inference, with a resolution of 1280×720, is about 306 kB, whereas the same image after preprocessing yields a tensor of about 1048 kB. This means that sending preprocessed data can cause roughly 3.4x the network traffic.
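The tensor size quoted above can be checked with a few lines of arithmetic. This sketch assumes the preprocessed tensor has the 299×299×3 FP32 shape from the config.pbtxt and that the kB figures use 1024-byte units, which is the reading under which the numbers line up; the 306 kB JPEG size is taken from the text as is.

```python
def tensor_size_bytes(height, width, channels, bytes_per_value=4):
    """Size of a dense image tensor; 4 bytes per value corresponds to FP32."""
    return height * width * channels * bytes_per_value


tensor_bytes = tensor_size_bytes(299, 299, 3)
tensor_kib = tensor_bytes / 1024  # assumption: kB in the text means 1024 bytes
jpeg_kib = 306                    # size of the example JPEG quoted in the text
ratio = tensor_kib / jpeg_kib

print(f"tensor: {tensor_bytes} bytes ({tensor_kib:.0f} kB)")
print(f"traffic ratio, preprocessed vs. encoded: {ratio:.1f}x")
```

The ratio depends on the image: a larger or less compressible JPEG narrows the gap, while the tensor size stays fixed by the network input shape.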

Naturally, the results differ according to your specific use case and your infrastructure. However, using DALI for preprocessing data in your inference scenario is worth a try.

How to get the DALI backend?

Starting from tritonserver:20.11-py3, the DALI backend is included in the Triton Server Docker container. Just download the latest version and you’re good to go. Moreover, DALI, Triton Server, and the DALI backend for Triton Server are all open-source projects, so you can build the most up-to-date versions from source.

We are waiting for your feedback!

If you have any problems or questions, do not hesitate to submit an issue on the triton-inference-server/dali_backend GitHub repository. You can also consult the NVIDIA DALI Documentation and the main NVIDIA/DALI repository.