Categories
Misc

Supporting Low-Latency Streaming Video for AI-Powered Medical Devices with Clara Holoscan

NVIDIA Clara Holoscan and the Clara AGX Developer Kit accelerate development of AI for endoscopy, laparoscopy, and other surgical procedures.

NVIDIA Clara Holoscan provides a scalable medical device computing platform for developers to create AI microservices and deliver insights in real time. The platform optimizes every stage of the data pipeline, from high-bandwidth data streaming and physics-based analysis to accelerated AI inference and graphics visualization.

The NVIDIA Clara AGX Developer Kit, which is now available, combines the efficient Arm-based embedded computing of the AGX Xavier SoC with the powerful NVIDIA RTX 6000 GPU and the 100 GbE connectivity of the NVIDIA ConnectX-6 network processor. This brings real-time AI acceleration to the next generation of intelligent, software-defined, embedded medical devices. Developers using the Clara AGX Developer Kit for surgical video applications—such as AI-enhanced endoscopy, laparoscopy, or other minimally invasive procedures—require the minimum possible end-to-end latency in their video processing path. Customers can use the Clara Holoscan SDK v0.1 on the Clara AGX Developer Kit today and on the next-generation developer kit in the second half of 2022.

Surgical video demands consistently low, reliable latency between the image captured by the endoscope and the image displayed on the monitor. This gives surgeons real-time control of their tools and real-time monitoring of the patient.

In a typical endoscopy system, the image is digitized at the camera sensor in the endoscope, serialized by an FPGA or ASIC, and transmitted to a video processor, where it is written to an input frame buffer, processed, written to an output frame buffer, and then transmitted serially to the monitor. Each of these steps adds latency to the video pipeline. Developers who wish to add advanced GPU-accelerated AI processing face additional transmission latency, because the data must be written from the video capture card to system memory and then transferred via the CPU and PCIe bus to the GPU.

GPU compute performance is a key component of the NVIDIA Clara Holoscan platform. To optimize GPU-based video processing applications, NVIDIA has partnered with AJA Video Systems to integrate their line of video capture cards with the Clara AGX Developer Kit. AJA provides a wide range of proven, professional video I/O devices. This partnership added Clara AGX Developer Kit support to the AJA NTV2 SDK and device drivers as of the NTV2 SDK 16.1 release.

The AJA drivers and SDK now offer GPUDirect support for NVIDIA GPUs. This feature uses remote direct memory access (RDMA) to transfer video data directly from the capture card to GPU memory. This significantly reduces latency and system PCIe bandwidth for GPU video processing applications, as system memory to GPU copies are eliminated from the processing pipeline.

AJA has also incorporated RDMA support into its GStreamer plug-in to enable zero-copy GPU buffer integration with the DeepStream SDK. DeepStream applications can now process video data along the entire pipeline, from initial capture to final display, without leaving GPU memory.

NVIDIA Clara Holoscan SDK v0.1 builds on the features of the previous Clara AGX SDK and adds tools to allow for detailed measurement of video transfer latency between video I/O cards, the CPU, and the GPU. This will enable users to measure latency with various configurations, allowing them to focus on improving bottlenecks and optimizing their workflows for minimum end-to-end latency.
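As a rough illustration of the kind of round trip such tools measure, the sketch below times host-to-GPU and GPU-to-host copies of a single 1080p RGBA frame with CUDA events. This is a generic example using CuPy, not the Holoscan SDK's own measurement API; the frame shape and use of pageable host memory are illustrative assumptions.

```python
# Illustrative sketch (not Clara Holoscan SDK tooling): time host<->GPU copies
# of one 1080p RGBA frame using CUDA events via CuPy.
import numpy as np
import cupy as cp

frame = np.zeros((1080, 1920, 4), dtype=np.uint8)   # one 1080p RGBA frame in host memory

start, mid, stop = cp.cuda.Event(), cp.cuda.Event(), cp.cuda.Event()

start.record()
gpu_frame = cp.asarray(frame)        # host -> device copy over PCIe
mid.record()
host_copy = cp.asnumpy(gpu_frame)    # device -> host copy over PCIe
stop.record()
stop.synchronize()

print(f"Upload:   {cp.cuda.get_elapsed_time(start, mid):.3f} ms")
print(f"Download: {cp.cuda.get_elapsed_time(mid, stop):.3f} ms")
```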

Data transfer latency was measured using the Clara AGX Developer Kit with an AJA capture card using the internal PCIe Gen3 x8 connection. The following tables demonstrate the latency reduction that can be achieved using GPUDirect. 

| Format | Width | Height | Bytes/pixel | Frames/sec |
|---|---|---|---|---|
| 720p YUV | 1280 | 720 | 2 | 60 |
| 1080p YUV | 1920 | 1080 | 2 | 60 |
| 4K UHD YUV | 3840 | 2160 | 2 | 60 |
| 720p RGBA | 1280 | 720 | 4 | 60 |
| 1080p RGBA | 1920 | 1080 | 4 | 60 |
| 4K UHD RGBA | 3840 | 2160 | 4 | 60 |

Table 1. Video formats tested.

The total time for video data transfer to and from the GPU, as well as time remaining for processing in the GPU, was then measured with and without GPUDirect enabled:

| Format | Transfer time without GPUDirect (ms) | Time remaining for processing without GPUDirect (ms) | Transfer time with GPUDirect (ms) | Time remaining for processing with GPUDirect (ms) |
|---|---|---|---|---|
| 720p YUV | 1.945 | 14.721 | 0.956 | 15.710 |
| 1080p YUV | 3.865 | 12.801 | 1.723 | 14.943 |
| 4K UHD YUV | 12.805 | 3.861 | 6.256 | 10.410 |
| 720p RGBA | 3.451 | 13.215 | 1.548 | 15.118 |
| 1080p RGBA | 6.816 | 9.850 | 3.225 | 13.444 |
| 4K UHD RGBA | 23.686 | -7.020 | 12.406 | 4.260 |

Table 2. Transfer time and time remaining for processing within a 60 fps frame (16.666 ms), with and without GPUDirect.

GPUDirect cuts transfer time roughly in half by eliminating the intermediate write to system memory. It also makes 4K UHD RGBA input at 60 fps practical: with GPUDirect, the transfer fits within the 16.666 ms frame time, whereas without GPUDirect this format could not be transferred and processed at 60 fps (note the negative time remaining in Table 2). Uncompressed high-resolution video can therefore be natively alpha-blended with overlays from AI workflows, with no conversion from YUV to RGBA formats and no compromise in the 60 fps frame rate.
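To make the frame-budget arithmetic concrete, here is a quick back-of-the-envelope calculation in Python. It is purely illustrative; the numbers follow directly from the 4K UHD RGBA format in Table 1.

```python
# Back-of-the-envelope check for 4K UHD RGBA at 60 fps (illustrative only).
width, height, bytes_per_pixel, fps = 3840, 2160, 4, 60

frame_bytes = width * height * bytes_per_pixel   # ~33.2 MB per frame
frame_budget_ms = 1000 / fps                     # ~16.666 ms per frame
throughput_gbs = frame_bytes * fps / 1e9         # ~2.0 GB/s per transfer direction

print(f"Frame size:   {frame_bytes / 1e6:.1f} MB")
print(f"Frame budget: {frame_budget_ms:.3f} ms")
print(f"Throughput:   {throughput_gbs:.2f} GB/s per direction")
```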

For instructions on how to set up and use an AJA device with the Clara AGX Developer Kit, including RDMA and DeepStream integration, go to Chapter 9 of the Clara Holoscan SDK User Guide.

Categories
Misc

Close Knowledge Gaps and Elevate Training with Digital Twin NVIDIA Air

Learn about the NVIDIA Air platform, a fully functional digital twin of a production environment.

Training resources are always a challenge for IT departments. There is a fine line between letting new team members do more without supervision and keeping the lights on by making sure no mistakes are made in the production environment. Leaning towards the latter method and limiting new team members’ access to production deployments may lead to knowledge gaps. How can new team members learn if they never get time on the network?

To close the knowledge gaps, IT teams can leverage a networking digital twin. A digital twin provides a fully functional replica of a production deployment, so each member of the team can learn in a safe and sandboxed environment. A digital twin eliminates the risk of making mistakes that could materially impact the business. Changes can be implemented and validated in a sandbox environment before pushing any changes to production, providing a new level of confidence.

Train with NVIDIA Air

NVIDIA offers exactly this with the Air Infrastructure Simulation Platform (Air). The platform supports organizations in training their staff by providing a digital twin and a full production experience for every team member. With everyone able to contribute (either by training in Air or by working directly in production), IT departments can use staff more effectively and boost operational efficiency.

With NVIDIA Air, IT teams can give each team member their own replica of the production environment on which to learn. No more waiting for hardware resources to be racked and stacked, or balancing limited lab time across multiple users. Staff can use the platform for free, build an exact network digital twin, validate configurations, confirm security policies, and test CI/CD pipelines. In addition to CLI access, the platform provides full software functionality and access to core system components such as Docker containers and APIs.

This allows less-skilled team members to be an integral part of the network operation, helping them to catch up with the team’s experience, enhance the sense of belonging within the team, and gain confidence to work on production when the time is right. 

Get Started

Air is free to use and easy to work with. Build your own digital twin today, and help your team to learn the production environment, practice procedures, and test changes without introducing risk.

Categories
Misc

Atos and NVIDIA to Advance Climate and Healthcare Research With Exascale Computing

Atos and NVIDIA today announced the Excellence AI Lab (EXAIL), which brings together scientists and researchers to help advance European computing technologies, education and research.

Categories
Misc

Siemens Energy Taps NVIDIA to Develop Industrial Digital Twin of Power Plant in Omniverse

Siemens Energy, a leading supplier of power plant technology in the trillion-dollar worldwide energy market, is relying on the NVIDIA Omniverse platform to create digital twins to support predictive maintenance of power plants. In doing so, Siemens Energy joins a wave of companies across various industries that are using digital twins to enhance their operations.


Categories
Misc

Universities Expand Research Horizons with NVIDIA Systems, Networks

Just as the Dallas/Fort Worth airport became a hub for travelers crisscrossing America, the north Texas region will be a gateway to AI if folks at Southern Methodist University have their way. SMU is installing an NVIDIA DGX SuperPOD, an accelerated supercomputer it expects will power projects in machine learning for its sprawling metro community.


Categories
Misc

Gordon Bell Finalists Fight COVID, Advance Science With NVIDIA Technologies

Two simulations of a billion atoms, two fresh insights into how the SARS-CoV-2 virus works, and a new AI model to speed drug discovery. Those are results from finalists for the Gordon Bell awards, considered the Nobel Prize of high performance computing. They used AI, accelerated computing, or both to advance science with NVIDIA's technologies.


Categories
Misc

`ValueError: Data cardinality is ambiguous: ` after running `model.fit`

submitted by /u/Guacamole_is_good
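The post itself gives no details, but for context (not part of the original post), this error typically means model.fit received inputs and targets with different numbers of samples. A minimal reproduction:

```python
# Minimal reproduction of the error (illustrative): x has 100 samples but y has 90,
# so Keras cannot infer a single dataset cardinality.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(100, 4)
y = np.random.rand(90, 1)   # mismatched sample count triggers the ValueError

model.fit(x, y)  # ValueError: Data cardinality is ambiguous
```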

Categories
Misc

Training Object Detection Model for Tensorflow Lite on Raspberry Pi

I have successfully set up TensorFlow Lite object detection on my Raspberry Pi 3B+. I have tested it on some Google sample models and can confirm it works properly.

I am looking to create my own custom object detection model, and I am looking for the absolute easiest way to do this (preferably on Ubuntu, but I can use Windows). Does anyone have any good methods or tutorials? I have tried a couple of GitHub tutorials as well as the TensorFlow Lite Model Maker Colab with no luck.

Has anyone used any of these tools, or does anyone have experience or advice for training my own TensorFlow Lite object detection model for my Pi?

submitted by /u/MattDlr4
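Not part of the original post, but for reference, the TensorFlow Lite Model Maker object detection flow mentioned above looks roughly like the sketch below. The paths, label map, and training hyperparameters are placeholders, and API details can vary across tflite-model-maker versions.

```python
# Rough sketch of the TFLite Model Maker object detection workflow (placeholder data paths).
from tflite_model_maker import model_spec
from tflite_model_maker import object_detector

spec = model_spec.get('efficientdet_lite0')          # a small spec suitable for a Raspberry Pi

train_data = object_detector.DataLoader.from_pascal_voc(
    images_dir='images/train',                        # placeholder paths
    annotations_dir='annotations/train',
    label_map={1: 'cat', 2: 'dog'})                   # placeholder labels

model = object_detector.create(
    train_data,
    model_spec=spec,
    epochs=50,
    batch_size=8,
    train_whole_model=True)

model.export(export_dir='.', tflite_filename='detect.tflite')  # TFLite model for the Pi
```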

Categories
Misc

Input pipeline performances

Hi Reddit

I’m comparing two input pipelines. One is built using tf.keras.utils.image_dataset_from_directory, and the other is built “manually” by reading files from a list using tf.data.Dataset.from_tensor_slices. My first intuition was that the tf.data.Dataset.from_tensor_slices approach should be faster, as demonstrated here.

But this is not the case. The image_dataset_from_directory pipeline is approximately 6x faster for batches of 32 to 128 images. I see a similar performance factor on Colab and on my local machine (run from PyCharm).

So far, I have tried to avoid zipping two datasets by having read_image output both the image and the label at once. That did not change anything.

Can you help me build a decent input pipeline with tf.data.Dataset.from_tensor_slices? I would like to work with a huge dataset to train a GAN, and I do not want to lose time on data loading. Did I code something wrong, or are the tests from here outdated?

To be pragmatic, I will use the fastest approach. But as an exercise, I would like to know if my input pipeline with tf.data.Dataset.from_tensor_slices is OK.

Here is the code. data_augmentation_train is a Sequential model (the same in both approaches).

```python
# =================================
# Approach n°1: tf.keras.utils.image_dataset_from_directory
# =================================
AUTOTUNE = tf.data.AUTOTUNE

train_ds = tf.keras.utils.image_dataset_from_directory(
    trainFolder,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

class_names = train_ds.class_names
print(class_names)

train_ds = train_ds.cache()
train_ds = train_ds.shuffle(1000)
train_ds = train_ds.map(
    lambda x, y: (data_augmentation_train(x, training=True), y),
    num_parallel_calls=AUTOTUNE)
train_ds = train_ds.prefetch(buffer_size=AUTOTUNE)
```

```python
# =======================================
# Approach n°2: tf.data.Dataset.from_tensor_slices
# =======================================
def read_image(filename):
    image = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [img_height, img_width])
    return image

def configure_dataset(filenames, labels, augmentation=False):
    dsfilename = tf.data.Dataset.from_tensor_slices(filenames)
    dsfile = dsfilename.map(read_image, num_parallel_calls=AUTOTUNE)
    if augmentation:
        dsfile = dsfile.map(lambda x: data_augmentation_train(x, training=True))
    dslabels = tf.data.Dataset.from_tensor_slices(labels)
    ds = tf.data.Dataset.zip((dsfile, dslabels))
    ds = ds.shuffle(buffer_size=1000)
    ds = ds.batch(batch_size)
    ds = ds.prefetch(buffer_size=AUTOTUNE)
    return ds

filenames, labels, class_names = readFilesAndLabels(trainFolder)
ds = configure_dataset(filenames, labels, augmentation=True)

submitted by /u/seb59
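Not part of the original post, but one common way to make the from_tensor_slices path competitive is to slice the (filenames, labels) pairs together (no zip), decode in a parallel map, cache the decoded images, and apply the augmentation after batching, which is effectively what approach n°1 does since image_dataset_from_directory returns batched data. A rough sketch reusing the post's variables (and assuming import tensorflow as tf):

```python
# Sketch of a tightened tf.data pipeline (illustrative; variable names follow the post).
def load_example(filename, label):
    image = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [img_height, img_width])
    return image, label

ds = tf.data.Dataset.from_tensor_slices((filenames, labels))   # no zip needed
ds = ds.map(load_example, num_parallel_calls=tf.data.AUTOTUNE) # parallel decode + resize
ds = ds.cache()                                                # cache decoded images, as approach n°1 does
ds = ds.shuffle(buffer_size=1000)
ds = ds.batch(batch_size)
ds = ds.map(lambda x, y: (data_augmentation_train(x, training=True), y),
            num_parallel_calls=tf.data.AUTOTUNE)               # augment whole batches
ds = ds.prefetch(buffer_size=tf.data.AUTOTUNE)
```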

Categories
Misc

Risky Business: Latest Benchmarks Show How Financial Industry Can Harness NVIDIA DGX Platform to Better Manage Market Uncertainty

Amid increasing market volatility, financial risk managers are looking for faster, better market analytics. Today that's served up by advanced risk algorithms running on the fastest parallel computing systems. Boosting the state of the art for risk platforms, NVIDIA DGX A100 systems running Red Hat software can offer financial services firms performance and operational gains.
