
Accelerating scikit-image API with cuCIM: n-Dimensional Image Processing and I/O on GPUs


Overview

cuCIM is a new RAPIDS library for accelerated n-dimensional image processing and image I/O. The project is now publicly available under a permissive license (Apache 2.0) and welcomes community contributions. In this post, we will cover the overall motivation behind the library and some initial benchmark results. In a complementary post on the Quansight blog, we provide guidelines on how existing CPU-based scikit-image code can be ported to the GPU. The library’s initial release was a collaboration between Quansight and the RAPIDS and Clara teams at NVIDIA.

Motivation

A primary objective of cuCIM is to provide open-source implementations of a wide range of CUDA-accelerated n-dimensional image processing operations that closely mirror the scikit-image API. Volumetric or general n-dimensional data is encountered in scientific fields such as bioimaging (microscopy), medical imaging (CT/PET/MRI), materials science, and remote sensing. This familiar, Pythonic API provides an accessible interface allowing researchers and data scientists to rapidly port existing CPU-based codes to the GPU.
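As a brief illustration of how closely the API mirrors scikit-image, the sketch below applies the same Sobel filter on the CPU and the GPU (the sample image and the assumption that cucim.skimage.filters.sobel mirrors its scikit-image counterpart are ours, not taken from the original post):

import cupy as cp
from skimage import data, filters                 # CPU version
from cucim.skimage import filters as cu_filters   # GPU version with a matching API

image_cpu = data.camera()                # sample image on the host
edges_cpu = filters.sobel(image_cpu)     # scikit-image on the CPU

image_gpu = cp.asarray(image_cpu)        # move the array to the GPU
edges_gpu = cu_filters.sobel(image_gpu)  # same call, now CUDA-accelerated
edges_back = cp.asnumpy(edges_gpu)       # copy back to the host if needed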

Although a number of proprietary and open-source libraries provide GPU-accelerated 2D image processing operations (e.g. OpenCV, CUVI, VPI, NPP, DALI), these either lack 3D support or focus on a narrower range of operations than cuCIM. Popular n-dimensional image processing tools like scikit-image, SciPy’s ndimage module, and the Insight Toolkit (ITK and SimpleITK) have little or no GPU support. The CLIJ library is an OpenCL-based 2D and 3D image processing library with some overlap in functionality with cuCIM. Although CLIJ is a Java-based project developed within the ImageJ/Fiji community, an effort is underway to provide interfaces from Python and other languages (clEsperanto).

Aside from image processing functions, we also provide image/data readers designed to address the I/O bottlenecks commonly encountered in deep learning training scenarios. Both C++ and Python APIs for reading TIFF files with an API matching the OpenSlide library are provided, with support for additional image formats planned for future releases (e.g. DICOM, NIfTI, and the Zarr-based Next Generation File Format being developed by the Open Microscopy Environment (OME)).

cuCIM Architecture

C++ Architecture and plugins

cuCIM consists of I/O, file system, and operation modules at its core layers. The image formats and image operations supported by cuCIM can be extended through plugins. cuCIM’s plug-in architecture is based on the NVIDIA Carbonite SDK, the core engine of NVIDIA Omniverse applications.

The CuImage C++ class represents an image and its metadata. This class, along with functions in core modules such as the TIFF loader and filesystem I/O using NVIDIA GPUDirect Storage (GDS, also known as cuFile), is also available from Python via the cucim.clara package.
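As a rough sketch of what loading a whole-slide TIFF looks like from Python (the file name is hypothetical, and we assume the OpenSlide-style read_region call described above):

from cucim import CuImage

slide = CuImage("example.tif")          # hypothetical file name
print(slide.resolutions)                # available resolution levels and their dimensions

# OpenSlide-style call: top-left location, patch size, and pyramid level
region = slide.read_region((10000, 10000), (256, 256), level=0)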

Pythonic image processing utilizing CuPy

The GPU-based implementation of the scikit-image API is provided in the cucim.skimage module. These functions have been implemented using the CuPy library. CuPy was chosen because it provides a GPU equivalent for most of NumPy and a substantial subset of SciPy (FFTs, sparse matrices, n-dimensional image processing primitives). CuPy also builds on technologies such as Thrust and CUB to provide efficient, templated sorting and reduction routines. For cases where custom CUDA kernels are needed, it also contains ElementwiseKernel and RawKernel classes that simplify generating the necessary kernels at run time for the provided input data types.
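For example, a toy ElementwiseKernel (our own illustration, not code from cuCIM) that clips an array to a range looks like this; CuPy compiles the kernel at run time for the dtypes it is called with:

import cupy as cp

clip_kernel = cp.ElementwiseKernel(
    'T x, T lo, T hi',                        # inputs (T is resolved from the arguments)
    'T y',                                    # output
    'y = x < lo ? lo : (x > hi ? hi : x)',    # per-element CUDA code
    'clip_kernel')

x = cp.random.standard_normal(1_000_000, dtype=cp.float32)
y = clip_kernel(x, cp.float32(-1.0), cp.float32(1.0))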

Interoperability

Finally, CuPy supports both the DLPack and __cuda_array_interface__ protocols, making it easy to interoperate with other GPU-enabled Python libraries such as Numba, PyCUDA, PyTorch, TensorFlow, JAX, and Dask. Other libraries in the RAPIDS ecosystem provide complementary functionality covering areas such as machine learning (cuML), signal processing (cuSignal), and graph algorithms (cuGraph) that can be combined with cuCIM in applications. Additionally, properties of regions of interest (ROIs) identified by an algorithm fit naturally into cuDF DataFrames.
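As a small sketch of this interoperability (assuming a PyTorch build that understands __cuda_array_interface__), a CuPy array can be handed to PyTorch and back without copying the underlying GPU buffer:

import cupy as cp
import torch

x = cp.arange(12, dtype=cp.float32).reshape(3, 4)

# Zero-copy hand-off: the tensor views the same GPU memory as the CuPy array.
t = torch.as_tensor(x, device='cuda')

# And back again: CuPy accepts objects exposing __cuda_array_interface__.
y = cp.asarray(t)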

An example of using Dask to perform blockwise image processing was given in the following Dask deconvolution blog post, which illustrates distributed processing across multiple GPUs. Although the blog post was written prior to the release of cuCIM, the same approach of using Dask’s map_overlap function can be employed for many of the image processing operations in cuCIM. Blockwise processing (using map_blocks) can also be useful even with a single GPU in order to reduce peak GPU memory requirements.
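A minimal sketch of that pattern (the array size, chunking, and choice of filter are our own illustration) using Dask together with cuCIM might look like the following:

import cupy as cp
import dask.array as da
from cucim.skimage.filters import gaussian

# Hypothetical volume, chunked and converted to CuPy blocks so each chunk lives on the GPU.
volume = da.random.random((512, 512, 512), chunks=(128, 128, 128)).map_blocks(cp.asarray)

# Overlap the chunks by a few voxels so the filter is correct at chunk boundaries.
smoothed = da.map_overlap(gaussian, volume, depth=8, boundary='reflect', sigma=3)
result = smoothed.compute()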

Figure 1: Interoperability with other libraries and frameworks such as NumPy, PyTorch, CuPy, MONAI, DALI, and Albumentations.

Benchmarks

TIFF Image I/O

In the following figure, the relative performance for reading all non-overlapping (256, 256) patches from an RGB TIFF image of shape (81017, 92344, 3) on an SSD drive is shown. Performance is plotted in terms of acceleration factor relative to the OpenSlide library.

Figure 2: Performance – TIFF File Loading (more than 5x faster than OpenSlide on average).

Additionally, reading raw binary data can see acceleration over standard cudaMemcpy when using GPUDirect Storage (GDS) on systems that support it. GDS allows data to be transferred directly from storage to GPU memory, bypassing a bounce buffer in host (CPU) memory.

Figure 3: Performance – Reading a 2 GB File with GPUDirect Storage (GDS), showing a gain of more than 25 percent.

n-dimensional B-spline image interpolation

The n-dimensional image interpolation routines needed to enable resizing, rotation, affine transforms, and warping were contributed upstream to CuPy’s cupyx.scipy.ndimage module. These implementations match the improved spline interpolation functions developed for SciPy 1.6 and support spline orders 0 through 5 with a range of boundary conditions. These functions show excellent acceleration on the GPU, even on older, gaming-class cards such as the NVIDIA GTX 1080 Ti.
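For instance, resizing and rotating a volume with cubic B-spline interpolation can be done directly from cupyx.scipy.ndimage (the array shape and parameters below are illustrative):

import cupy as cp
from cupyx.scipy import ndimage as cu_ndi

volume = cp.random.random((256, 256, 256)).astype(cp.float32)

# Cubic (order=3) B-spline upsampling with a reflective boundary condition.
upsampled = cu_ndi.zoom(volume, zoom=2.0, order=3, mode='reflect')

# Rotation in the (0, 1) plane using the same interpolation machinery.
rotated = cu_ndi.rotate(volume, angle=30, axes=(0, 1), order=3, reshape=False)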

Figure 4: Performance – n-Dimensional B-spline Image Interpolation Routines (spline orders 0 through 5).

Low-level operations

Image processing primitives such as filtering, morphology, color conversions, and labeling of connected components in binary images see substantial acceleration on the GPU. Performance numbers are in terms of acceleration factors relative to scikit-image 0.18.1 on the CPU.
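A short pipeline combining several of these primitives might look like the sketch below (a synthetic image stands in for real data, and we assume the listed functions mirror their scikit-image counterparts):

import cupy as cp
from cucim.skimage import filters, measure, morphology

image = cp.random.random((2160, 4096)).astype(cp.float32)    # stand-in for a 4K image

smoothed = filters.gaussian(image, sigma=2)                  # filtering
binary = smoothed > filters.threshold_otsu(smoothed)         # thresholding
cleaned = morphology.binary_opening(binary)                  # morphology
labels = measure.label(cleaned)                              # connected components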

Figure 5: Performance – Image Processing Primitives.

The results shown here are for relatively large images (2D images at 4K resolution). For smaller images such as (512, 512), acceleration factors will be smaller but still substantial. For tiny images (e.g. (32, 32)), processing on the CPU will be faster as there is some overhead involved in launching CUDA kernels on the GPU and there will not be enough work to keep all GPU cores busy.

Higher-level operations

More involved image restoration, registration, and segmentation algorithms also see substantial acceleration on the GPU.
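As one example, Richardson-Lucy deconvolution from the restoration module can be called on a CuPy array just as in scikit-image (the image and point-spread function below are placeholders):

import cupy as cp
from cucim.skimage import restoration

blurred = cp.random.random((512, 512)).astype(cp.float32)   # stand-in for a blurred image
psf = cp.ones((5, 5), dtype=cp.float32) / 25                # simple box point-spread function

deconvolved = restoration.richardson_lucy(blurred, psf)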

Figure 6: Performance – Advanced Image Processing Algorithms (including Richardson-Lucy deconvolution).

Contributing

cuCIM is an open-source project that welcomes community contributions!

Please visit Quansight’s blog to learn more about converting your CPU-based code to the GPU with cuCIM, and some areas on our roadmap where we would love to hear your feedback.


NVIDIA Partners with Boys & Girls Clubs of Western Pennsylvania on AI Pathways Program

Meet Paige Frank: Avid hoopster, Python coder and robotics enthusiast. Still in high school, the Pittsburgh sophomore is so hooked on AI and robotics, she’s already a mentor to other curious teens. “Honestly, I never was that interested in STEM. I wanted to be a hair stylist as a kid, which is also cool, but …



Help me understand the YOLO loss function.

submitted by /u/FunnyForWrongReason


AI agent plays Chrome Dino

submitted by /u/1991viet

Asia’s Rising Star: VinAI Advances Innovation with Vietnam’s Most Powerful AI Supercomputer

A rising technology star in Southeast Asia just put a sparkle in its AI. Vingroup, Vietnam’s largest conglomerate, is installing the most powerful AI supercomputer in the region. The NVIDIA DGX SuperPOD will power VinAI Research, Vingroup’s machine-learning lab, in global initiatives that span autonomous vehicles, healthcare and consumer services. One of the lab’s most …



Universal Scene Description Key to Shared Metaverse, GTC Panelists Say

Artists and engineers, architects, and automakers are coming together around a new standard — born in the digital animation industry — that promises to weave all our virtual worlds together. That’s the conclusion of a group of panelists from a wide range of industries who gathered at NVIDIA GTC21 this week to talk about Pixar’s …



NVIDIA Unveils 50+ New, Updated AI Tools and Trainings for Developers

To help developers hone their craft, NVIDIA this week introduced more than 50 new and updated tools and training materials for data scientists, researchers, students and developers of all kinds. The offerings range from software development kits for conversational AI and ray tracing, to hands-on courses from the NVIDIA Deep Learning Institute. They’re available to …



NVIDIA DRIVE Developer Days at GTC 2021 Now Open to the Entire Auto Industry


This year, everyone can learn how to develop safe, robust autonomous vehicles on NVIDIA DRIVE.

The annual DRIVE Developer Days is taking place April 20-22 during GTC 2021, featuring a series of specialized sessions on autonomous vehicle hardware and software, including perception, mapping, simulation and more, all led by NVIDIA experts. And now, registration is free and open to all.

In the past, these AV developer sessions have been limited to NVIDIA customers. However, this year we are making these deep-dive sessions, featuring lessons learned by the NVIDIA team, available to all developers, as the need for AV systems is more apparent than ever before.

The event kicks off with an overview of the latest NVIDIA DRIVE AGX autonomous vehicle platform from Gary Hicok, SVP of Hardware and Systems at NVIDIA.

The following sessions will cover the foundations of developing an AV on DRIVE, such as the DRIVE OS operating system and DriveWorks middleware, as well as the recently announced NVIDIA DRIVE Sim, powered by Omniverse.

Other highlights include:

You can view the entire DRIVE Developer Day playlist by clicking here. Make sure you register now to access these and 150+ autonomous vehicle sessions at GTC 2021.

 


NASA and NVIDIA Collaborate to Accelerate Scientific Data Science Use Cases, Part 2


Over the past couple of years, NVIDIA and NASA have been working closely on accelerating data science workflows using RAPIDS and integrating these GPU-accelerated libraries with scientific use cases. This is the second post in a series that discusses the results from an air pollution monitoring use case conducted during the COVID-19 pandemic and shares code snippets for porting existing CPU workflows to RAPIDS on NVIDIA GPUs. In the first post of this series, we covered Accelerated Simulation of Air Pollution.

Monitoring the Decline of Air Pollution Across the Globe During the COVID-19 Pandemic

Another air quality application leveraging XGBoost and RAPIDS is the live monitoring of air quality through the combination of surface monitoring data and near real-time model data produced by the NASA GEOS-CF model. This approach is particularly useful for detecting and quantifying air pollution anomalies, i.e., patterns in air quality observations that cannot be explained by the model. The most prominent (and extreme) example of this is the decline of air pollution in the wake of the COVID-19 pandemic. As a result of stay-at-home orders, traffic emissions of air pollutants such as nitrogen dioxide (NO2) decreased significantly, as apparent from both satellite observations and surface monitoring data. However, exactly quantifying the impact of COVID-19 restrictions on surface air quality solely from these atmospheric observations is very difficult, given that many other factors affect surface air pollution, including weather, chemistry, and wildfires.

Video 1. Daily nitrogen dioxide (NO2) measurements and percentage difference from a baseline model. Credits: Christoph Keller and NASA’s Scientific Visualization Studio.

The study conducted by Christoph and his colleagues fuses millions of observations – taken at 4,778 monitoring sites in 47 countries – with co-located model output produced by GEOS-CF. The sample code in the covid_no2 repo demonstrates the application for 10 selected cities (New York, Washington DC, San Francisco, Los Angeles, Beijing, Wuhan, London, Paris, Madrid, and Milan). The air quality observations for years 2018 through 2020 at these cities were obtained from the OpenAQ database (https://openaq.org/#/) and the European Environment Agency EEA (https://discomap.eea.europa.eu/map/fme/AirQualityExport.htm) and pre-processed into a single file for convenience:
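(The snippet below is an illustrative sketch rather than the exact code from the repository; the file and column names are assumptions.)

import cudf

# Load the pre-processed observation table directly into GPU memory with cuDF.
obs = cudf.read_csv('observations_2018-2020.csv', parse_dates=['time'])
obs = obs[['time', 'city', 'site_id', 'no2_obs']]
print(obs.head())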

Similarly, we preprocessed the GEOS-CF model output by subsampling the gridded native output (available in netCDF format at https://portal.nccs.nasa.gov/datashare/gmao/geos-cf/v1/das/) to the observation locations and saved the corresponding data as a table in text format that can be read similarly to the observation data:
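(Again as an illustrative sketch with assumed file and column names, the model table can be loaded the same way.)

# The sub-sampled GEOS-CF model output is stored as a text table and read like the observations.
mod = cudf.read_csv('geoscf_model_2018-2020.csv', parse_dates=['time'])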

The model data contains not only model-predicted NO2 concentrations but also a number of ancillary model variables, such as information about the local weather and atmospheric composition (as taken from GEOS-CF) or calendar information.

The model data is then combined with the surface observation data to build an XGBoost bias-correction model that relates the model NO2 prediction to the observations. This is to account for the fact that the model prediction can be systematically different from the observations, e.g., because the model output represents the average over a 25×25 km² domain while the surface observation is typically much more local in nature.

To train the XGBoost model, the model data is merged with the observations, and the model bias is calculated from the merged data set to provide the label for the training (see the sample code in the repository mentioned above for the full example):
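(The following sketch outlines that step; the column names, merge keys, and hyperparameters are illustrative assumptions, not the repository’s exact values.)

import xgboost as xgb

# Merge model output with observations and use the model bias as the training label.
merged = mod.merge(obs, on=['time', 'site_id'])
merged['bias'] = merged['no2_model'] - merged['no2_obs']

features = merged.drop(columns=['time', 'city', 'site_id', 'no2_obs', 'bias'])
dtrain = xgb.DMatrix(features, label=merged['bias'])   # cuDF objects are accepted directly

params = {'tree_method': 'gpu_hist', 'max_depth': 6, 'eta': 0.1}
bst = xgb.train(params, dtrain, num_boost_round=100)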

Using the trained model, we can also calculate the SHAP values on GPU to analyze the factors that contribute most to the bias correction:
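(A minimal sketch of that step, reusing the booster and DMatrix from above; XGBoost returns one contribution per feature plus a bias term for each sample.)

# Compute SHAP values on the GPU with the trained booster.
bst.set_param({'predictor': 'gpu_predictor'})
shap_values = bst.predict(dtrain, pred_contribs=True)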

As shown in the figure below, the SHAP values for New York indicate that the most important predictor for the NO2 model bias (relative to the actual observation) is NO2 itself, followed by the hour of the day, wind speed (V10M), and the height of the planetary boundary layer (ZPBL). (Note: to output the SHAP values in the example code, the input argument shap needs to be set to 1, along with the gpu argument for accelerated execution on the GPU.)

Figure 2: Distribution of the 20 most important SHAP values for the bias-correction model for New York City.

After training the XGBoost model on the 2018 and 2019 data (using 8-fold cross-validation), we extend the NO2 bias correction to the model data produced for year 2020, resulting in a time series of the expected NO2 concentrations at a given observation site if there had been no mobility restrictions due to the pandemic (the NO2 ‘baseline’). The difference between the actual observations and these bias-corrected model predictions offers an estimate of the impact of COVID-19 restrictions on NO2 concentrations. The figure below shows the difference between observations and model predictions at New York City from Jan 2019 to Jan 2021. The solid green line shows the best estimate, defined as the 21-day rolling average across all four observation sites available for New York City, and the dark and light shaded areas show two uncertainty estimates derived from the time-averaged and hourly model-observation samples, respectively.

Throughout year 2019, the bias-corrected model mean estimate is in close agreement with the observations. Coinciding with the outbreak of the pandemic, the observed NO2 over New York City declines by up to 40% and only gradually recovers to the expected value by year end.

Figure 3: Difference between observed NO2 concentrations and model-predicted values for New York City. Negative anomalies indicate a reduction in observed NO2 relative to the expected ‘business-as-usual’ scenario. Dark and light-shaded areas indicate low and high uncertainty estimates, respectively.

Conducting this analysis at 4,778 locations across the world enables us to identify regional patterns in air quality anomalies, and our study shows that these patterns tend to be related to differences in timing and intensity of COVID-19 restrictions.

The approach described here is useful not only for analyzing the impact of COVID-19 on air pollution but also, more generally, for monitoring air pollution across the world (both the observations and the model data are available in near real time). Given the growing number of available air quality observations, fast data processing becomes ever more critical for such an application. The sample code available at https://github.com/GEOS-CF/covid_no2 demonstrates that conducting the analysis on a V100 GPU using cuDF offers an overall speed-up of up to 5x for each city compared to a 20-core Intel Xeon E5-2689 CPU.

References:

Keller, C. A., Evans, M. J., Knowland, K. E., Hasenkopf, C. A., Modekurty, S., Lucchesi, R. A., Oda, T., Franca, B. B., Mandarino, F. C., Díaz Suárez, M. V., Ryan, R. G., Fakes, L. H., and Pawson, S.: Global impact of COVID-19 restrictions on the surface concentrations of nitrogen dioxide and ozone, Atmos. Chem. Phys., 21, 3555–3592, https://doi.org/10.5194/acp-21-3555-2021, 2021.

NASA Model Reveals How Much COVID-related Pollution Levels Deviated from the Norm

A Data Science Series (Part 1): NASA and NVIDIA Collaborate to Accelerate Scientific Data Science Use Cases


Using feature columns on 3D data: How to avoid having 750+ input layers?

Hello,

I’m trying to use Tensorflow to predict the outcome of a sport contest. What I have for every sample is the context of the competition (weather, type of stadium, …) and the competition history for every competing team.

Here is an overview of the data of every sample:

Context: CompetitionData
Teams History: [[CompetitionData], [CompetitionData]] (for every team, the past competition data and their result (win/lose/ranking))

I’m going to try to develop a learning-to-rank system that, given the context and every team’s history, predicts the final ranking.

I think that feature columns are useful in this case, as they can ease the processing of the CompetitionData. However, I can’t find a way to reuse the feature column code across all CompetitionData dimensions. The ideal would be to reuse the DenseFeatures layer across all competition data, but it doesn’t seem to work, as tf requires the data to be a dict to be fed to the DenseFeatures layer, and each dict entry needs to be passed through its own Input layer to be fed in correctly.

I have also tried this:

import tensorflow as tf
from tensorflow.keras.layers import Input

# One DenseFeatures layer, built from the feature columns defined earlier,
# that I would like to reuse for every competition in every player's history.
history_input = tf.keras.layers.DenseFeatures([info_columns, info_columns2, info_columns3, age])

# One Input layer per feature of a single competition record.
statics_hist = {
    'rapport': Input((1,), dtype=tf.dtypes.int32, name="rapport"),
    'weight': Input((1,), dtype=tf.dtypes.int32, name="weight"),
    'age': Input((1,), name="age"),
    'first': Input((1,), dtype=tf.dtypes.int32, name="first"),
    'stadium': Input((1,), name="stadium", dtype=tf.dtypes.string),
}

# Stack the history entries for one player, then stack all players.
test = [tf.stack([history_input(statics_hist) for _ in range(NB_RACE_HISTORY)], axis=1)
        for __ in range(MAX_NUMBER_PLAYERS)]
test = tf.stack(test, axis=1)
But as I have 15 players, each with a history of 10 competitions with several columns, this gives me 750+ input layers, which can’t be the right way to go.

I have thought about flattening the data beforehand, but then I would lose the ability to run an LSTM through a player’s history, which is important to model their current performance.

I’m not really sure of the right way to go; could anyone point me in the right direction?

submitted by /u/Wats0ns