Categories
Misc

NVIDIA AI Perception Coming to ROS Developers

NVIDIA announces new initiatives to deliver a suite of perception technologies to the ROS developer community.

All things that move will become autonomous. And all things autonomous will require advanced real-time perception. 

NVIDIA announced its latest initiatives to deliver a suite of perception technologies to the ROS developer community. These initiatives will reduce development time and improve performance for developers seeking to incorporate cutting-edge computer vision and AI/ML functionality into their ROS-based robotics applications.

Open Robotics to Extend ROS for NVIDIA AI

NVIDIA and Open Robotics have entered into an agreement to accelerate ROS 2 performance on NVIDIA’s Jetson edge AI platform and GPU-based systems and to enable seamless simulation interoperability between Open Robotics’s Ignition Gazebo and NVIDIA Isaac Sim on Omniverse. 

The Jetson platform is widely adopted by roboticists across a spectrum of applications. It is designed to enable high-performance, low-latency processing so that robots can be responsive, safe, and collaborative. Open Robotics will enhance ROS 2 to enable efficient management of data flow and shared memory across the GPU and other processors present on Jetson. This will significantly improve the performance of applications that must process high-bandwidth data from sensors such as cameras and lidars in real time. 
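To illustrate the kind of copy avoidance this work targets, here is a minimal, hypothetical ROS 2 C++ sketch, separate from the NVIDIA and Open Robotics implementation itself, that uses rclcpp intra-process communication to hand an image message from a publisher to a subscriber in the same process without a copy:

#include <memory>
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/image.hpp>

int main(int argc, char **argv) {
  rclcpp::init(argc, argv);

  // With intra-process communication enabled, messages published as unique_ptr can be
  // handed to subscriptions in the same process without serialization or extra copies.
  auto options = rclcpp::NodeOptions().use_intra_process_comms(true);
  auto node = std::make_shared<rclcpp::Node>("camera_pipeline", options);

  auto pub = node->create_publisher<sensor_msgs::msg::Image>("image_raw", 10);
  auto sub = node->create_subscription<sensor_msgs::msg::Image>(
      "image_raw", 10,
      [](sensor_msgs::msg::Image::UniquePtr msg) {
        RCLCPP_INFO(rclcpp::get_logger("camera_pipeline"),
                    "received %ux%u image without a copy", msg->width, msg->height);
      });

  // Publish one frame by unique_ptr; ownership moves to the subscription.
  auto frame = std::make_unique<sensor_msgs::msg::Image>();
  frame->width = 1920;
  frame->height = 1080;
  pub->publish(std::move(frame));

  rclcpp::spin_some(node);
  rclcpp::shutdown();
  return 0;
}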

In addition to the enhancements for deployment of robot applications on Jetson, Open Robotics and NVIDIA are working on plans to integrate Ignition Gazebo and NVIDIA Isaac Sim. NVIDIA Isaac Sim already supports ROS 1 & 2 out of the box and features an all-important ecosystem of 3D content with its connection to popular applications, e.g., Blender and Unreal Engine 4. 

Ignition Gazebo brings a decades-long track record of widespread use throughout the robotics community, including in high-profile competition events such as the ongoing DARPA Subterranean Challenge. With the two simulators connected, ROS developers can easily move their robots and environments between Ignition and Isaac Sim to run large scale simulations and take advantage of each simulator’s advanced features, such as high-fidelity dynamics, accurate sensor models, and photorealistic rendering to generate synthetic data for training and testing of AI models. 

“As more ROS developers leverage hardware platforms that contain additional compute capabilities designed to offload the host CPU, ROS is evolving to make it easier to efficiently take advantage of these advanced hardware resources,” said Brian Gerkey, CEO of Open Robotics. “Working with an accelerated computing leader like NVIDIA and its vast experience in AI and robotics innovation will bring significant benefits to the entire ROS community.”

Software resulting from this collaboration is expected to be released in the spring of 2022.

Isaac GEMs Released for ROS with Significant Speedup

Isaac GEMs for ROS are hardware accelerated packages that make it easier for ROS developers to build high-performance solutions on the Jetson platform. The focus of these GEMs is on improving throughput on image processing and on DNN-based perception models that are of growing importance to roboticists. These packages reduce the load on the host CPU while providing significant performance gain. 

The new Isaac GEMs for ROS include: 

Image demonstrates Isaac ROS stereo camera support with the left and right camera view in the ROS Rviz tool.
Figure 1. Stereo camera support in ROS with left and right camera view in the ROS Rviz tool. Both RGB and depth images are shown in RViz.

New Isaac Sim Features Enable ROS Developers

The latest release of Isaac Sim includes significant support for the ROS developer community. Some of the more compelling examples of this are the ROS 2 Navigation stack and the MoveIt Motion Planning Framework. These examples are available today and can be found in the Isaac Sim documentation.

List of ROS Examples in Isaac Sim 

  • ROS April Tag
  • ROS Stereo Camera
  • ROS Navigation
  • ROS TurtleBot3 Sample
  • ROS Manipulation and Camera Sample
  • ROS Services
  • MoveIt Motion Planning Framework
  • Native Python ROS Usage
  • ROS2 Navigation

Figure 2. Functional block diagram of Isaac Sim on Omniverse showing robot model, environment model, and 3D assets inputs.

Isaac Sim Generates Synthetic Datasets for Training Perception

In addition to being a robotics simulator, Isaac Sim has a powerful set of capabilities for generating synthetic data to train and test perception models. These capabilities will only become more important as roboticists incorporate more perception features into their platforms. The better a robot can perceive its environment, the more autonomous it can be, requiring less human intervention. 

Once Isaac Sim generates synthetic datasets, they can be fed directly into NVIDIA TAO, an AI model adaptation platform, to adapt perception models for a robot’s specific working environment. The task of ensuring that a robot’s perception stack will perform in a given working environment can start well before any real data is ever collected from the target surroundings.

Roboticists have long faced challenges in connecting classic robotic tasks such as navigation to AI-based perception stacks. Isaac Sim addresses this workflow challenge by serving as both a robotics simulator and a synthetic data generation tool, with streamlined integration with the TAO training platform.



More to Come at ROS World and GTC 2021

NVIDIA is gearing up for ROS World on Oct 21-22, 2021. We are planning to release more new GEMs for Jetson developers including several popular DNNs. We will also announce features in Isaac Sim to support the ROS developer community. Be sure to stop by our virtual booth, attend our NVIDIA ROS roundtable, watch the technical presentation on Isaac Sim, and more.

NVIDIA has a great lineup of speakers, talks, and content at the upcoming GTC scheduled for Nov. 8-11. We have a track for robotics developers, including a presentation by Brian Gerkey, CEO and cofounder of Open Robotics. Additionally, we have talks covering NVIDIA Jetson, Isaac ROS, Isaac Sim, Isaac Gym, and more.

Getting Started Today

The following resources are available for developers interested in getting started today on adding NVIDIA AI Perception to their products.

Isaac GEMs for ROS >>
Isaac Sim information >>
Tutorials on Synthetic Data Generation with Isaac Sim >>
Accelerating ML Training with TAO toolkit information >>

Categories
Misc

AI Model Rapidly Identifies Structures Damaged by Wildfires

New research develops a deep learning algorithm to detect wildfire damage remotely.

Wildfire evacuees and disaster response groups could soon have the power to remotely scan a town for structural damage within minutes, using the newly developed AI tool DamageMap.

A collaboration between researchers at Stanford University and California Polytechnic State University, San Luis Obispo, the project uses aerial imagery and a deep learning algorithm to pinpoint building damage after a wildfire event. The research could guide disaster relief efforts and personnel toward areas that need them most, while keeping concerned homeowners informed.

“After a fire or disaster, lots of people need or want to know the extent and severity of damage. We set out to help reduce the response time to get actionable information valuable to fire victims, and emergency and recovery personnel,” said G. Andrew Fricker, an assistant professor at Cal Poly and codeveloper of DamageMap.

As the impacts of climate change lead to warmer and drier conditions, wildfire disasters are hitting communities more frequently and severely. In 2020, Western US wildfires destroyed over 13,000 buildings, amounting to almost $20 billion in losses. With months to go in this season, California has already seen over 7,000 fires damage about 3,000 structures.

When blazes subside, damage assessment teams perform inspections and evaluate the safety of burned areas. These reports are used by emergency operations centers to organize disaster relief and recovery resources for residents. Knowing the location and extent of damage in a region could help emergency groups allocate resources, especially when juggling multiple fires simultaneously.

While inspections are an essential step for repopulation, they are also time-consuming and resource-intensive.

Recent machine learning models have looked to alleviate this burden using satellite imagery. But, most methods require high-quality pre- and post-wildfire images of similar composition (such as lighting and angle) to detect changes and pinpoint areas of damage. They also require up-to-date images for accuracy, which can be costly to maintain and difficult to scale.

With DamageMap, the researchers trained a new deep learning algorithm capable of detecting damage by employing two models that work together and sleuth out the conditions of a building. The first model relies on any pre-fire drone or satellite imagery in a region to detect buildings and map out footprints. The second model uses post-fire aerial images to determine structural damage, such as scorched roofs or destroyed buildings.

The researchers used a database of 47,543 images of structures from five different wildfires across the globe to train the neural network. They hand-labeled a subset of these images as damaged or undamaged, and the algorithm learned to identify and classify structures.

They tested the model using imagery from two recent California wildfires—the Butte County Camp Fire, and Shasta and Trinity County Carr Fire. Comparing model predictions against ground surveyor data—which records the location of damaged buildings—DamageMap accurately detected damaged structures about 96% of the time.

The technology is not only accurate, it’s also fast. Using an NVIDIA GPU and the cuDNN-accelerated PyTorch deep learning framework, DamageMap processes images at a rate of about 60 milliseconds per image.

Figure 1. DamageMap identifies damaged buildings in red and safe buildings in green.
Courtesy of the DamageMap team 

Classifying the 15,931 buildings in the town of Paradise—an area almost completely destroyed by the 2018 Camp Fire—takes about 16 minutes, in line with the per-image rate above.

The work is available for testing and exploring, with the code and supporting analysis on GitHub. The researchers encourage others to use, develop, and improve the model further. 

According to Fricker, the tool can be trained to look beyond damaged buildings and include elements such as burned cars or downed power lines to further inform response and recovery efforts.


Read the full article in the International Journal of Disaster Risk Reduction >>
Read more >>   

Categories
Misc

NVIDIA Invites Healthcare Startup Submissions to Access UK’s Most Powerful Supercomputer

It takes major computing power to tackle major projects in digital biology — and that’s why we’re connecting pioneering healthcare startups with the U.K.’s most powerful supercomputer, Cambridge-1. U.K. startups can now apply to harness the system, which is dedicated to advancing healthcare with AI and digital biology. Since inaugurating Cambridge-1 in July, five founding Read article >


Categories
Misc

Wild Things: 3D Reconstructions of Endangered Species with NVIDIA’s Sifei Liu

Endangered species can be difficult to study — they’re elusive, and the very act of observing them can disrupt their lives. Now, scientists can take a closer look at endangered species by studying AI-generated 3D representations of them. Sifei Liu, a senior research scientist at NVIDIA, has worked with her team to create an algorithm Read article >


Categories
Misc

Next Generation: ‘Teens in AI’ Takes on the Ada Lovelace Hackathon

Jobs in data science and AI are among the fastest growing in the entire workforce, according to LinkedIn’s 2021 Jobs Report. Teens in AI, a London-based initiative, is working to inspire the next generation of AI researchers, entrepreneurs and leaders through a combination of hackathons, accelerators, networking events and bootcamps. In October, the organization, with Read article >


Categories
Misc

NVIDIA Calls UK AI Strategy “Important Step,” Will Open Cambridge-1 Supercomputer to UK Healthcare Startups

NVIDIA today called the U.K. government’s launch of its AI Strategy an important step forward, and announced a programme to open the Cambridge-1 supercomputer to U.K. healthcare…

Categories
Misc

Guide to Autoencoders with TensorFlow & Keras

Submitted by /u/RubiksCodeNMZ
Categories
Misc

Spyder/Tensorflow stuck on first epoch

I’ll link the StackOverflow post: https://stackoverflow.com/questions/69267805/spyder-tensorflow-stuck-on-first-epoch

Help is deeply appreciated. Thanks.

Submitted by /u/Snoo37084

Categories
Misc

Getting Started with NVIDIA Networking

Preview and test Cumulus Linux in your own environment, at your own pace, without organizational or economic barriers.

Looking to try open networking for free? Try NVIDIA Cumulus VX—a free virtual appliance that provides all the features of NVIDIA Cumulus Linux. You can preview and test Cumulus Linux in your own environment, at your own pace, without organizational or economic barriers. You can also produce sandbox environments for prototype assessment, preproduction rollouts, and script development.

Cumulus VX runs on all popular hypervisors, such as VirtualBox and VMware vSphere, and orchestrators, such as Vagrant and GNS3. 

Our website has the images needed to run NVIDIA Cumulus VX on your preferred hypervisor, and the download is simple. What’s more, we provide a detailed guide on how to install and set up Cumulus VX to create this simple two-leaf, one-spine topology.

Figure 1. Cumulus VX two-leaf, one-spine topology.

With these three switches up and running, you are all set to try out NVIDIA Cumulus Linux features, such as traditional networking protocols (BGP and MLAG), and NVIDIA Cumulus-specific technologies, such as ONIE and Prescriptive Topology Manager (PTM). And, not to worry, the Cumulus Linux User’s Guide is always close at hand to help you out, as well as the community Slack channel, where you can submit questions and engage with the wider community.

Explore further and try advanced configurations:

  • Update your virtual environment to use the NVIDIA Cumulus Linux on-demand self-paced labs (a quick and easy way to learn the fundamentals.) 
  • Run the topology converter to simulate a custom network topology with VirtualBox and Vagrant, or KVM-QEMU and Vagrant.

If your needs are different, or if you have platform or disk limitations, we also provide an alternative to NVIDIA Cumulus VX. NVIDIA Cumulus in the Cloud is a free, personal, virtual data center network that provides a low-effort way to see NVIDIA Cumulus technology in action—no hypervisor needed.

Categories
Misc

Transforming Noisy Low-Resolution Videos into High-Quality Videos for Captivating End-User Experiences

Video conferencing, audio and video streaming, and telecommunications recently exploded due to pandemic-related closures and work-from-home policies. Businesses, educational institutions, and public-sector agencies are experiencing a skyrocketing demand for virtual collaboration and content creation applications. The crucial part of online communication is the video stream, whether it’s a simple video call or streaming content to a broad audience. At the same time, these streams are the most network bandwidth-intensive part of online communication, often accompanied by noise and artifacts.

To solve these video quality challenges, the NVIDIA Maxine Video Effects SDK offers AI-based visual features that transform noisy, low-resolution video streams into pleasant user experiences. This post demonstrates how you can run these effects with standard webcam input and easily integrate them into video conference and content creation pipelines.

Add details and improve resolution

For poor video quality that arises from the low resolution of the image frames, the Maxine Video Effects SDK provides two state-of-the-art AI-based visual effects: Super Resolution and Upscaler.

Super Resolution (Figure 1) generates a superior quality image with higher resolution and better textures from the provided input image. It offers holistic enhancements while preserving the content. This visual effect is best used on content encoded with lossy compression, such as H.264. You can use this feature to scale media by 1.33x, 1.5x, 2x, 3x, and 4x.

In the before/after picture, you can observe increases in both structural details and textures.
Figure 1. Super Resolution feature in action

To tune up the Super Resolution effect, select its mode:

  • 0: Recommended for streams containing encoding artifacts and streams encoded with lossy compression.
  • 1: Applies strong visual enhancements and is recommended for streams encoded with lossless compression.

Upscaler (Figure 2) is a fast and lightweight method for increasing the resolution of an input video while also adding detail to the image. It focuses on the geometric structure of the frame’s content and enhances its details. Besides higher image resolution, the Upscaler effect produces crisper and sharper images.

In the before/after picture, you can observe enhanced structural details.
Figure 2. Upscaler feature in action

You can set Upscaler’s enhancement parameter within the [0, 1] range:

  • 0: Increases the resolution without image enhancement.
  • 1: Applies the maximum sharpening and crispness enhancement.

By default, Upscaler’s enhancement parameter is set to 0.4.

Remove webcam video noise and reduce encoding artifacts

The underlying causes of video noise that make or break the end-user experience are numerous. However, the two most common sources of noise are webcam noise and encoding artifacts.

Examples of webcam noise sources include the camera sensor type, exposure, or illumination level. This is especially true in the context of end-user–generated streams, if the environment is not well lit or the camera being used is of poor quality. These types of noise are highly dependent on the type of sensor in the camera. 

Encoding artifacts in video streams are a consequence of the bandwidth constraints required to transmit frames. Lossy compression typically involves discarding some of the textural information in an image during encoding. Common examples of lossy compression standards are JPEG for images and H.264 for videos. When streaming this media, the amount of data transmitted per unit of time is called the bitrate.

In a streaming environment, the bandwidth available to stream the compressed content is not constant. This variability causes situations where the encoder has fewer bits than needed to compress the frame, resulting in compression artifacts. Compression artifacts can take many forms, but one of the most common is the blocky artifact.

The Video Noise Removal (Figure 3) feature of the Maxine Video Effects SDK enables you to de-noise the webcam streams and preserve details, leading to better end-user experiences.

In the before/after picture, you can observe that camera noise is removed.
Figure 3. Video Noise Removal feature in action

This feature has two variants with strength values:

  • 0: For a weaker noise reduction effect that ensures the preservation of texture quality. This is ideal for media with low noise.
  • 1: For a substantial noise reduction effect that may impact texture quality. This variant can easily be chained with Upscaler or Super Resolution to add details, enhance, and increase resolution.

The Maxine Artifact Reduction feature (Figure 4) reduces blocky artifacts encountered when bandwidth drops on a video call. It also reduces ringing and mosquito noises, while preserving the details of the original video.

In the before/after picture, you can observe that encoding artifacts are removed.
Figure 4. Artifact Reduction feature in action

This AI-based feature is optimized for two modes:

  • 0: Preserves low gradient information while reducing artifacts. This mode is more suited for a higher bitrate video.
  • 1: Provides a better output stream and is better suited for lower-bitrate videos.

Enable end users to choose virtual backgrounds

To let end users join a meeting without showing a background that is too personal or distracting, the Maxine Video Effects SDK offers the Virtual Background feature.

The Virtual Background feature (Figure 5) essentially generates a mask to segment out the foreground, in this case, people from the stream. You can provide any media as a background, whether image or video. You can also implement multiple creative applications, like adding multiple users in the same background. For example, if two commentators are talking about a live event, you can segment both onto the live feed of the event. Another example is segmenting out users and overlaying them on their computer’s live feed. This way, single or multiple users can present at the same time in real time while retaining immersion. All these operations use the parallelism that a GPU provides, increasing the number of streams that can be processed in real time.

In the picture, a new background is being applied.
Figure 5. The Virtual Background feature in action

The Virtual Background feature runs in two modes:

  • Quality mode: For highest segmentation quality
  • Performance mode: For the fastest performance

You can also use this feature to generate a blurred background with tunable blur strength.

Chain Video Effects features

For processing precompressed videos or videos with noise, along with providing a higher resolution, we recommend chaining Upscaler with Artifact Reduction or Video Noise Removal, depending on the use case. For more information, see Exploring the API. You could also get an out-of-the-box experience with the UpscalePipeline sample application packaged with the SDK.

Install the Video Effects SDK using containers and on Windows and Linux

NVIDIA offers the Maxine Video Effects SDK through Docker containers, and on both Windows and Linux platforms in the form of SDK packages.

The benefits of using containers include high scalability and time and cost savings due to reduced deployment and adoption time. Using containers with Kubernetes provides a robust and easy-to-scale deployment strategy. In addition, because of the prepackaged nature of containers, you don’t have to worry about specific installations inside the container.

In this post, we focus on how to use the Maxine Video Effects SDK with containers and Windows. Before proceeding with the installation, make sure that you meet all the hardware requirements.

If you have considerable experience with the NVIDIA software stack and want to deploy the Video Effects SDK on a bare-metal Linux system, see the Maxine Getting Started page.

Use the Video Effects SDK in Docker containers

There are four steps to install and take advantage of the high-performance Video Effects SDK and its state-of-the-art AI models on containers:

You need access to an NVIDIA Turing, NVIDIA Volta, or NVIDIA Ampere architecture data center GPU: T4, V100, A100, A10, or A30.

Install the Video Effects SDK on Windows

Installing the SDK on Windows is a straightforward process:

You must have an NVIDIA RTX card to benefit from the accelerated throughput and reduced latency of the Maxine Video Effects SDK on Windows. To run this SDK on a data center card like A100, use the Linux package.

Sample applications

The Video Effects SDK comes packaged with five sample applications:

  • AigsEffectApp
  • BatchEffectApp
  • DenoiseEffectApp
  • UpscalePipelineApp
  • VideoEffectsApp

These applications contain sample code to run all the features in the Video Effects SDK. To experience these features, you can build the applications and use the prebuilt scripts to run them.

You can build the applications using the build_samples.sh script found in the /VideoFX/share folder of the SDK. If you are using the Docker container, this is the folder where you start.

bash build_samples.sh

The script builds the sample apps and installs some sample, app-specific dependencies. This step might take a few minutes. After it’s built, you can find at least one bash script per application in the folder where you built the applications. Here’s a closer look at one of the applications:

#!/bin/sh

. ./setup_env.sh

VideoEffectsApp \
        --model_dir=$_VFX_MODELS \
        --in_file=$_VFX_SHARE/samples/input/input1.jpg \
        --out_file=ar_1.png \
        --effect=ArtifactReduction \
        --mode=1 \
        --show

VideoEffectsApp \
        --model_dir=$_VFX_MODELS \
        --in_file=$_VFX_SHARE/samples/input/input1.jpg \
        --out_file=ar_0.png \
        --effect=ArtifactReduction \
        --mode=0 \
        --show

VideoEffectsApp \
        --model_dir=$_VFX_MODELS \
        --in_file=$_VFX_SHARE/samples/input/input2.jpg \
        --out_file=sr_0.png \
        --effect=SuperRes \
        --resolution=2160 \
        --mode=0 \
        --show

VideoEffectsApp \
        --model_dir=$_VFX_MODELS \
        --in_file=$_VFX_SHARE/samples/input/input2.jpg \
        --out_file=sr_1.png \
        --effect=SuperRes \
        --resolution=2160 \
        --mode=1 \
        --show

This is an example script that invokes one of the sample applications, VideoEffectsApp. You can tweak the following arguments to experience different feature capabilities:

  • --effect: Choose the effect: ArtifactReduction, SuperRes, or Upscale.
  • --mode: Toggle between two modes: 0, 1.
  • --strength: Toggles the Upscaler enhancement multiplier: 0, 1.
  • --resolution: Use to input the target resolution of the selected media. For instance, if you have a 720p media to double, use 1440.

When running these effects locally, you can use the keyboard controls to toggle the effects and experience the effects live with your webcam feed. For more information, see the Sample Applications Reference. If you are interested in chaining these effects, keep reading. Finally, if you are interested in learning more about batching and maximizing throughput, see the BatchEffectApp sample application.

Use the API to chain multiple video effects features

Chaining effects is quite interesting for many applications. This post focuses on how to chain two effects that work well together: Artifact Reduction and Upscaler. Another example would be running Video Noise Removal and Super Resolution or Upscaler for a noisy webcam stream. You can pick and choose the effects that best fit your use case.

Here’s more about the API and its usage. Figure 6 shows the high-level process of using the functions from the Video Effects SDK:

  • Creating and configuring the effect
  • Configuring CUDA streams, allocating buffers, and loading the model
  • Loading the data and running the effects

The process includes the following steps: Create the effect, load the model, and use the effect.
Figure 6. Three simple steps to use Video Effects SDK API

The following video covers this flow, but this process has many granular details, which we discuss later in this post. Also, the video touches on the basics that you must know while working with GPUs and API details for the Maxine virtual background. All code examples in this post are available in the SDK sample applications.



Video 1. Creating your own virtual background

Creating and configuring the effect

The first step is creating the effects to use. In this post, we discuss Artifact Reduction and Upscaler. You can create an instance of the specified type of video effect filter with the NvVFX_CreateEffect function. This function requires an effect selector and returns the effect handle. The effect selector is a string with which you can pick the effect to create.

NvVFX_Handle _arEff;
NvVFX_Handle _upscaleEff;
NvVFX_EffectSelector first = NVVFX_FX_ARTIFACT_REDUCTION;   // selector string "ArtifactReduction"
NvVFX_EffectSelector second = NVVFX_FX_SR_UPSCALE;          // selector string "Upscale"

NvVFX_CreateEffect(first, &_arEff);
NvVFX_CreateEffect(second, &_upscaleEff);

Then, use the NvVFX_SetString function to specify the location of the model for the feature.

NvVFX_SetString(_arEff, NVVFX_MODEL_DIRECTORY, modelDir);
NvVFX_SetString(_upscaleEff, NVVFX_MODEL_DIRECTORY, modelDir);

Most of the Video Effects SDK features have modes. These modes, as discussed previously, are essentially two different variants of the same effect. In this case, Artifact Reduction has two modes that you can set with the NvVFX_SetU32 function. In the case of Upscaler, the strength is a floating-point value that can be set to any number between 0 and 1 using the NvVFX_SetF32 function.

int FLAG_arStrength = 0;
float FLAG_upscaleStrength = 0.2f;

NvVFX_SetU32(_arEff, NVVFX_STRENGTH, FLAG_arStrength);
NvVFX_SetF32(_upscaleEff, NVVFX_STRENGTH, FLAG_upscaleStrength);

Configuring CUDA streams, allocating buffers, and loading the model

Now that the effects have been created, here’s how to configure CUDA and load the models. A CUDA stream is a set of operations executed in the exact sequence in which they were issued. With that in mind, the first step is to create this stream. You can create this stream with the NvVFX_CudaStreamCreate function.

CUstream _stream;
NvVFX_CudaStreamCreate(&_stream);

Now that you have the stream, assign the effects to the stream. You achieve this with the NvVFX_SetCudaStream function.

NvVFX_SetCudaStream(_arEff, NVVFX_CUDA_STREAM, _stream);
NvVFX_SetCudaStream(_upscaleEff, NVVFX_CUDA_STREAM, _stream);

Now that a CUDA stream is in place, here’s how to move data. In this case, you are moving image frames. If you are new to GPUs, you might ask, “Why and where are we moving the data?”

GPUs typically have their own dedicated video RAM (VRAM). This is like the usual RAM that is plugged into the motherboard of a system. The key advantage of dedicated VRAM is that data stored in this memory is processed significantly faster than data in regular RAM. When we say, “Move the data from CPU memory to GPU memory,” we are referring to the memory transfers between these two types of RAM.

Memory transfer between CPU and GPU memory goes both ways.
Figure 7. Overview of CPU vs. GPU buffer

In a typical scenario using a single effect, this transfer would be effortless, requiring two CPU memory buffers and two GPU buffers. In both cases, one would be for the source and the other would be for the processed frame.

Memory transfer for single video effect with CPU and GPU memory for source and processed frame.
Figure 8. Moving data between different memory buffers on GPU and CPU
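To make Figure 8 concrete, here is a minimal sketch of one CPU staging buffer and one GPU buffer for a single frame, using the same NvCVImage_Alloc and NvCVImage_Transfer calls that appear later in this post. The resolution and pixel formats are illustrative choices, and _stream is the CUDA stream created in the previous step:

NvCVImage srcCpu;   // 8-bit interleaved BGR frame in CPU memory (for example, a webcam frame)
NvCVImage srcGpu;   // planar float BGR frame in GPU memory, the layout the effects consume
NvCVImage tmp;      // scratch image that NvCVImage_Transfer can use for format conversion

NvCVImage_Alloc(&srcCpu, 1280, 720, NVCV_BGR, NVCV_U8,  NVCV_INTERLEAVED, NVCV_CPU, 1);
NvCVImage_Alloc(&srcGpu, 1280, 720, NVCV_BGR, NVCV_F32, NVCV_PLANAR, NVCV_GPU, 1);

// Copy CPU to GPU; the 1/255 scale maps 8-bit values into the [0, 1] float range.
NvCVImage_Transfer(&srcCpu, &srcGpu, 1.f/255.f, _stream, &tmp);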

As you are chaining the features that require two different image pixel layouts, there is an added layer of complexity. You must have two more buffers on the GPU, one to store the output frame for the first effect and the other to store the input of the second effect. Figure 9 shows the flow. Don’t worry about the function names just yet; we review them in the Run the effects section later in this post.

Memory transfer for chained video effects with additional intermediate buffer layer.
Figure 9. Moving data between different memory buffers on GPU and CPU while accounting for pixel format

With this high-level understanding in mind, here’s how to set up the pipeline. There are two steps: allocating memory and specifying the input and output buffers.

First, allocate memory for the GPU buffers, using the NvCVImage_Alloc function.

NvCVImage _srcGpuBuf;
NvCVImage _interGpuBGRf32pl;
NvCVImage _interGpuRGBAu8;
NvCVImage _dstGpuBuf;

// GPU Source Buffer
NvCVImage_Alloc(&_srcGpuBuf, _srcImg.cols, _srcImg.rows, NVCV_BGR, NVCV_F32, NVCV_PLANAR, NVCV_GPU, 1); 

// GPU Intermediate1 Buffer
NvCVImage_Alloc(&_interGpuBGRf32pl, _srcImg.cols, _srcImg.rows, NVCV_BGR, NVCV_F32, NVCV_PLANAR, NVCV_GPU, 1);

// GPU Intermediate2 Buffer
NvCVImage_Alloc(&_interGpuRGBAu8, _srcImg.cols, _srcImg.rows, NVCV_RGBA, NVCV_U8, NVCV_INTERLEAVED, NVCV_GPU, 32);

// GPU Destination Buffer
NvCVImage_Alloc(&_dstGpuBuf, _dstImg.cols, _dstImg.rows, NVCV_RGBA, NVCV_U8, NVCV_INTERLEAVED, NVCV_GPU, 32);

That seems like a complicated function, but on a high level, you are specifying basic parameters for the desired type of buffer for the given type of image frame. For example, is it an RGBA image? Does each component have 8 bits? Are the bits in a planar, chunky, or any other format? For more information about specifics, see Setting the Input and Output Image Buffers.

Second, specify the input and output buffers that you created for each effect, using the NvVFX_SetImage function.

// Setting input and output buffers for Artifact Reduction and Upscaler
NvVFX_SetImage(_arEff, NVVFX_INPUT_IMAGE,  &_srcGpuBuf);
NvVFX_SetImage(_arEff, NVVFX_OUTPUT_IMAGE, &_interGpuBGRf32pl);

NvVFX_SetImage(_upscaleEff, NVVFX_INPUT_IMAGE, &_interGpuRGBAu8);
NvVFX_SetImage(_upscaleEff, NVVFX_OUTPUT_IMAGE, &_dstGpuBuf);

Lastly, load the models with the NvVFX_Load function, which also validates that the parameters selected for each effect are valid.

NvVFX_Load(_arEff);
NvVFX_Load(_upscaleEff);
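
Each call shown in this flow returns a status code. As a hedged sketch (NvCV_Status, NVCV_SUCCESS, and the error-string helper are taken from the SDK headers as we understand them; the sample applications wrap this pattern in a macro), loading could be checked like this:

#include <cstdio>

// Check that the model loaded and the effect parameters are valid before running.
NvCV_Status vfxErr = NvVFX_Load(_arEff);
if (vfxErr != NVCV_SUCCESS) {
    // Translate the status code into a readable message.
    std::printf("Error loading Artifact Reduction: %s\n", NvCV_GetErrorStringFromCode(vfxErr));
    return vfxErr;
}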

Run the effects

Now that the pipeline is set up, you can proceed to run the effects. Move the frames from the CPU/GPU source into the corresponding input buffer. The NvCVImage_Transfer function can be used to move the frames, and the NvVFX_Run function is used to run the effect.

// Frame moves from CPU buffer to GPU src buffer
NvCVImage_Transfer(&_srcVFX, &_srcGpuBuf, 1.f/255.f, _stream, &_tmpVFX);

// Running Artifact Reduction
NvVFX_Run(_arEff, 0);

// Frame moves from GPU intermediate buffer 1 to buffer 2
NvCVImage_Transfer(&_interGpuBGRf32pl, &_interGpuRGBAu8, 255.f, _stream, &_tmpVFX);

// Running Upscaler
NvVFX_Run(_upscaleEff, 0);

// Frame moves from GPU destination buffer to CPU buffer
NvCVImage_Transfer(&_dstGpuBuf, &_dstVFX, 1.f, _stream, &_tmpVFX);

On the first pass, it might seem that there are multiple moving parts, but there are only three major steps: creating the effect, setting up CUDA streams along with managing the data flow, and finally running the effects.
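
As a recap, here is a hedged sketch of what the per-frame loop of a webcam application might look like, assuming the effects, _stream, and GPU buffers were created as above, _srcImg and _dstImg are the cv::Mat frames for the input and the upscaled output, and _srcVFX, _dstVFX, and _tmpVFX are CPU-side NvCVImage views of those buffers (the SDK sample applications build such wrappers with a small OpenCV helper):

#include <opencv2/opencv.hpp>

cv::VideoCapture cap(0);                 // default webcam
while (cap.read(_srcImg)) {              // _srcImg must keep the buffer that _srcVFX wraps
    // CPU to GPU, converting 8-bit BGR into the planar float layout the effects expect
    NvCVImage_Transfer(&_srcVFX, &_srcGpuBuf, 1.f/255.f, _stream, &_tmpVFX);

    NvVFX_Run(_arEff, 0);                // Artifact Reduction
    NvCVImage_Transfer(&_interGpuBGRf32pl, &_interGpuRGBAu8, 255.f, _stream, &_tmpVFX);
    NvVFX_Run(_upscaleEff, 0);           // Upscaler

    // GPU to CPU so the result can be displayed or encoded
    NvCVImage_Transfer(&_dstGpuBuf, &_dstVFX, 1.f, _stream, &_tmpVFX);
    cv::imshow("Chained effects", _dstImg);
    if (cv::waitKey(1) == 27) break;     // press Esc to stop
}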

All three of the Maxine SDKs—Video Effects SDK, Audio Effects SDK, and Augmented Reality SDK—are designed similarly. You can apply this same concept to the Audio Effects and Augmented Reality SDKs with minor modifications.

Integrate the Video Effects SDK into your applications

As demonstrated in this post, the Maxine Video Effects SDK provides many AI features that enable you to take a noisy low-resolution video and deliver high-quality video to your end users. Furthermore, you can chain multiple effects together and create a video pipeline. To apply these visual effects to your video conferencing, streaming, or telecommunication applications, see Maxine Getting Started page. Let us know what you think or if you have any questions.