Powering Ultra-High Speed Frame Rates in AI Medical Devices with the NVIDIA Clara Holoscan SDK


In the operating room, the latency and reliability of surgical video streams can make all the difference for patient outcomes. Ultra-high-speed frame rates from sensor inputs that enable next-generation AI applications can provide surgeons with new levels of real-time awareness and control.

To build real-time AI capabilities into medical devices for use cases like surgical navigation, image-guided intervention such as endoscopy, and medical robotics, developers need AI pipelines that allow for low-latency processing of combined sensor data from multiple channels.

As announced at GTC 2022, NVIDIA Clara Holoscan SDK v0.3 now provides a lightning-fast frame rate of 240 Hz for 4K video. This enables developers to combine data from more sensors and build AI applications that can provide surgical guidance. With faster data transfer enabled through high-speed Ethernet-connected sensors, developers have even more tools to build accelerated AI pipelines.

Real-time AI processing of frontend sensors

NVIDIA Clara Holoscan enables high-speed sensor input through the ConnectX SmartNIC and NVIDIA Rivermax SDK with GPUDirect RDMA that bypasses the CPU. This allows for high-speed Ethernet output of data from sensors into the AI compute system. The result is unmatched performance for edge AI.

While traditional GStreamer and OpenGL-based endoscopy pipelines have an end-to-end latency of 220 ms on a 1080p 60 Hz stream, high-speed pipelines with Clara Holoscan boast an end-to-end latency of only 10 ms on a 4K 240 Hz stream.
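To put those figures in perspective, the short calculation below converts each latency into a number of frame periods and estimates the raw data rate of a 4K 240 Hz stream. It is a back-of-the-envelope sketch, not a measurement. The 3840x2160 resolution and the 3 bytes per pixel (uncompressed RGB) are assumptions that the article does not state.

```python
# Rough arithmetic for the two pipelines compared above.
# Assumptions (not from the article): "4K" means 3840x2160, and frames are
# uncompressed RGB at 3 bytes per pixel.

def frame_period_ms(rate_hz: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / rate_hz


def latency_in_frames(latency_ms: float, rate_hz: float) -> float:
    """How many frame periods elapse during the given end-to-end latency."""
    return latency_ms / frame_period_ms(rate_hz)


def raw_bandwidth_gbps(width: int, height: int, bytes_per_pixel: int, rate_hz: float) -> float:
    """Uncompressed video data rate in gigabits per second."""
    return width * height * bytes_per_pixel * rate_hz * 8 / 1e9


if __name__ == "__main__":
    print(f"Frame period at 240 Hz: {frame_period_ms(240):.2f} ms")
    print(f"220 ms at 60 Hz  = {latency_in_frames(220, 60):.1f} frames")
    print(f"10 ms at 240 Hz  = {latency_in_frames(10, 240):.1f} frames")
    print(f"Raw 4K 240 Hz stream: {raw_bandwidth_gbps(3840, 2160, 3, 240):.1f} Gbit/s")
```

Under these assumptions, the traditional pipeline displays a result roughly 13 frames after capture, while the high-speed pipeline stays under 3 frames. The raw stream approaches 48 Gbit/s, which helps explain why the data path bypasses the CPU.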

When streaming 4K 60 Hz data with under 50 ms of latency on an NVIDIA RTX A6000, teams can run 15 concurrent AI video streams and 30 concurrent models.

NVIDIA Rivermax SDK

The NVIDIA Rivermax SDK, included with NVIDIA Clara Holoscan, enables direct data transfers to and from the GPU. By bypassing host memory and using the offload capabilities of the ConnectX SmartNIC, it delivers best-in-class throughput and latency with minimal CPU utilization for streaming workloads. NVIDIA Clara Holoscan uses Rivermax to provide scalable connectivity for high-bandwidth network sensors and to support very fast data transfer.

NVIDIA G-SYNC

NVIDIA G-SYNC enables high display performance by synchronizing display refresh rates to the GPU, thus eliminating screen tearing and minimizing display stutter and input lag. As a result, the AI inference can be shown with very low latency.

NVIDIA Clara HoloViz

Clara HoloViz is a module in Holoscan for visualizing data. Clara HoloViz composites real-time streams of frames with multiple other layers, such as segmentation mask layers, geometry layers, and GUI layers.

For maximum performance, Clara HoloViz makes use of Vulkan, which is already installed as part of the NVIDIA driver.

Clara HoloViz uses the immediate mode design pattern for its API: the application does not create or store any objects. This makes it easy to quickly build and change the visualization in a Holoscan application.
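The immediate mode pattern can be illustrated with a small toy renderer. This is a sketch of the pattern only, not the Clara HoloViz API: the class `ImmediateModeRenderer`, its methods, and the `render_frame` helper are all hypothetical names made up for this example.

```python
# Toy illustration of the immediate mode pattern. All names here are
# hypothetical and are NOT the Clara HoloViz API.

class ImmediateModeRenderer:
    """Records draw calls for one frame only; nothing survives past end()."""

    def __init__(self):
        self._commands = []
        self._in_frame = False

    def begin(self):
        """Start a new frame with an empty draw list."""
        self._commands = []
        self._in_frame = True

    def _record(self, command):
        if not self._in_frame:
            raise RuntimeError("draw calls are only valid between begin() and end()")
        self._commands.append(command)

    def image_layer(self, frame_id):
        self._record(("image", frame_id))

    def mask_layer(self, label):
        self._record(("mask", label))

    def text_layer(self, text):
        self._record(("text", text))

    def end(self):
        """Finish the frame and return its draw list, bottom layer first."""
        self._in_frame = False
        submitted, self._commands = self._commands, []
        return submitted


def render_frame(renderer, frame_id, show_mask):
    """The application re-issues every layer on every frame."""
    renderer.begin()
    renderer.image_layer(frame_id)
    if show_mask:  # changing the visualization is just a different set of calls
        renderer.mask_layer("tool")
    renderer.text_layer(f"frame {frame_id}")
    return renderer.end()
```

Because the application holds no layer objects, toggling the mask between two frames needs no create or destroy bookkeeping. The next frame simply omits the call.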

Improved developer experience

The NVIDIA Clara Holoscan SDK v0.3 release brings significant improvements to the development experience. First, the addition of a new C++ API for the creation of GXF extensions gives developers an additional pathway to building their desired applications. Second, the support for x86 processors allows developers to quickly get started with developing AI applications which can then be easily deployed on IGX development kits. Third, the Bring Your Own Model (BYOM) support has been enriched in this latest version.

Holoscan C++ API

The Holoscan C++ API provides a new convenient way to compose GXF workflows, without the need to write YAML files. The Holoscan C++ API enables a more flexible and scalable approach to creating applications. It has been designed for use as a drop-in replacement for the GXF Framework’s API and provides a common interface for GXF components.

*Figure 1. The main components of the Holoscan API*

Application: An application acquires and processes streaming data. An application is a collection of fragments where each fragment can be allocated to execute on a physical node of a Holoscan cluster.

Fragment: A fragment is a building block of the application. It is a directed acyclic graph (DAG) of operators. A fragment can be assigned to a physical node of a Holoscan cluster during execution. The run-time execution manages communication across fragments. In a fragment, operators (graph nodes) are connected to each other by flows (graph edges).

Operator: An operator is the most basic unit of work in this framework. An operator receives streaming data at an input port, processes it, and publishes it to one of its output ports. A codelet in GXF would be replaced with an operator in the framework. An operator encapsulates receivers and transmitters of a GXF entity as I/O ports of the operator.

Resource: A resource, such as system memory or a GPU memory pool, is something an operator needs to perform its job. Resources are allocated during the initialization phase of the application. The resource matches the semantics of the GXF Memory Allocator or any other components derived from the component class in GXF.

Condition: A condition is a predicate that can be evaluated at runtime to determine if an operator should execute. This matches the semantics of the GXF Scheduling Term class.

Port: A port is an interaction point between two operators. Operators ingest data at input ports and publish data at output ports. Receiver, transmitter, and MessageRouter in GXF are replaced with the concept of the operator's I/O ports.

Executor: An executor manages the execution of a fragment on a physical node. The framework provides a default executor that uses a GXF scheduler to execute an application.
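The relationship between these concepts can be sketched with a toy dataflow model. The code below is not the Holoscan C++ API and does not use GXF. The classes `CountCondition`, `Operator`, and `Fragment`, and the helper `build_and_run`, are hypothetical stand-ins written for this example. They show only how operators, flows, conditions, and an executor loop fit together.

```python
# Toy model of the concepts listed above. Hypothetical names only; this is
# NOT the Holoscan SDK API.
from collections import deque


class CountCondition:
    """Predicate that lets an operator execute a fixed number of times."""

    def __init__(self, count):
        self.remaining = count

    def ready(self):
        return self.remaining > 0

    def consume(self):
        self.remaining -= 1


class Operator:
    """Basic unit of work: takes one message in, returns one message out."""

    def __init__(self, name, compute, condition=None):
        self.name = name
        self.compute = compute      # callable(message) -> result
        self.condition = condition  # optional predicate gating execution
        self.inbox = deque()        # stands in for the input port


class Fragment:
    """A graph of operators (nodes) connected by flows (edges)."""

    def __init__(self):
        self.operators = []  # assumed to be added in topological order
        self.flows = {}      # upstream name -> list of downstream operators

    def add_operator(self, op):
        self.operators.append(op)
        return op

    def add_flow(self, upstream, downstream):
        self.flows.setdefault(upstream.name, []).append(downstream)

    def run(self):
        """Minimal executor: tick every operator until nothing can progress."""
        has_input = {d.name for ds in self.flows.values() for d in ds}
        progressed = True
        while progressed:
            progressed = False
            for op in self.operators:
                if op.condition is not None and not op.condition.ready():
                    continue
                if op.name not in has_input:  # source operator
                    if op.condition is None:
                        raise ValueError("a source needs a condition to terminate")
                    message = None
                elif op.inbox:
                    message = op.inbox.popleft()
                else:
                    continue
                if op.condition is not None:
                    op.condition.consume()
                result = op.compute(message)
                for downstream in self.flows.get(op.name, []):
                    downstream.inbox.append(result)  # publish on the output port
                progressed = True


def build_and_run(num_frames):
    """Source -> double -> sink pipeline; returns what the sink received."""
    received = []
    counter = iter(range(num_frames))
    fragment = Fragment()
    source = fragment.add_operator(
        Operator("source", lambda _: next(counter), CountCondition(num_frames)))
    double = fragment.add_operator(Operator("double", lambda x: x * 2))
    sink = fragment.add_operator(Operator("sink", received.append))
    fragment.add_flow(source, double)
    fragment.add_flow(double, sink)
    fragment.run()
    return received
```

Calling `build_and_run(3)` pushes three frames through the graph and returns `[0, 2, 4]`. The condition on the source is what stops the run, which mirrors the role the article describes for conditions: they decide at runtime whether an operator should execute.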

You can find more information about the new C++ API in the SDK documentation. See an example of a full AI application for endoscopy tool tracking using the new C++ API in the public source code repository.

Support for x86 systems

The NVIDIA Clara Holoscan SDK has been designed with various hardware systems in mind. It runs on x86 systems, in addition to the NVIDIA IGX DevKit and the Clara AGX DevKit. With x86 support, researchers and developers who do not have a DevKit can use the Holoscan SDK on their x86 machines to quickly build AI applications for medical devices.

Bring Your Own Model

The Holoscan SDK provides AI libraries and pretrained AI models to jump-start building your own AI applications. You can also use the reference applications for endoscopy and ultrasound, which come with Bring Your Own Model (BYOM) support.

As a developer, you can quickly build AI pipelines by dropping your own models in the reference applications provided as part of the SDK. Finally, the SDK also includes sensor I/O integration options and performance tools that optimize AI applications for production deployment.

Software stack updates

The NVIDIA Clara Holoscan SDK v0.3 release also integrates an upgrade from NVIDIA JetPack HP1 to Holopack 1.1, running the Tegra Board Support Package (BSP) version 34.1.3, as well as an upgrade for GXF from version 2.4.2 to version 2.4.3.

Get started building AI for medical devices

From training AI models to verifying and validating AI applications and ultimately deploying for commercial production, Clara Holoscan helps streamline AI development and deployment.

Visit the Clara Holoscan SDK web page to access healthcare-specific acceleration libraries, pretrained AI models, sample applications, documentation, and more to get started building software-defined medical devices.

You can also request a free hands-on lab with NVIDIA LaunchPad to experience how Clara Holoscan simplifies the development of AI pipelines for endoscopy and ultrasound.