Category: Misc

Misc

Accelerating AI Development Pipelines for Industrial Inspection with the NVIDIA TAO Transfer Learning Toolkit

Post author By
Post date August 6, 2021
No Comments on Accelerating AI Development Pipelines for Industrial Inspection with the NVIDIA TAO Transfer Learning Toolkit

There is an increasing demand for manufacturers to achieve high-quality control standards in their production processes. Traditionally, manufacturers have relied on manual inspection to guarantee product quality. However, manual inspection is expensive, often only covers a small sample of production, and ultimately results in production bottlenecks, lowered productivity, and reduced efficiency. By automating defect inspection … Continued

By automating defect inspection with AI and computer vision, manufacturers can revolutionize their quality control processes. However, one major obstacle stands between manufacturers and full automation. Building an AI system and production-ready application is hard and typically requires a skilled AI team to train and fine-tune the model. The average manufacturer does not employ this expertise and resorts to using manual inspection.

The goal of this project was to show how the NVIDIA Transfer Learning Toolkit (TLT) and a pretrained model can be used to quickly build more accurate quality control in the manufacturing process. This project was done without an army of AI specialists or data scientists. To see how effective the NVIDIA TLT is in training an AI system for commercial quality-control purposes, a publicly available dataset on the steel welding process was used to retrain a pretrained ResNet-18 model, from the NGC catalog, a GPU-optimized hub of AI and HPC software, using TLT. We compared the effort and resulting model’s accuracy to a model built from scratch on the dataset in a previously published work by a team of AI researchers.

NVIDIA TLT is user-friendly and fast, and can be easily used by engineers who do not have AI expertise. We observed that NVIDIA TLT was faster to set up and produced more accurate results, posting a macro average F1 score of 97% compared to 78% from previously published “built from scratch” work on the dataset.

This post explores how NVIDIA TLT can quickly and accurately train AI models, showing how AI and transfer learning can transform how image and video analysis and industrial processes are deployed.

Workflow with NVIDIA TLT

NVIDIA TLT, a core component of the NVIDIA Train, Adapt, and Optimize (TAO) platform, follows the zero-coding paradigm tofast-track AI development. TLT comes with a set of ready-to-use Jupyter notebooks, Python scripts, and configuration specifications with default parameter values that enable you to start training and fine-tuning your datasets quickly and easily.

To get started with the NVIDIA TLT, we followed these Quick Start Guide instructions.

We downloaded a Docker container and TLT Jupyter notebook.
We mapped our dataset onto the Docker container.
We started our first training, having adjusted the default training parameters, such as the network structure, network size, optimizer, and so on, until we were satisfied with the results.

For more information about the configuration options for training, see Model Config in the NVIDIA TLT User Guide.

Dataset

The dataset used in this project was created by researchers at the University of Birmingham for their paper, Automated defect classification of SS304 TIG welding process using visible spectrum camera and machine learning.

The dataset consists of over 45K grayscale welding images, which can be obtained through Kaggle. The dataset describes one class of proper execution: good_weld. It has five classes of defects that can occur during a tungsten inert gas (TIG) welding process: burn_through, contamination, lack_of_fusion, lack_of_shielding_gas, and high_travel_speed.

Photos show welding images, including examples of burn through, contamination, lack of shielding, high travel speed, lack of fusion, and a good weld. — *Figure 1. Sample welding images from the training dataset. Source:* *Bacioiu et. al, 2019*.

	burn_through	contamination	good_weld	high_travel_speed	lack_of_fusion	lack_of_shielding	Total
train	977	1613	15752	630	5036	196	24204
valid	646	339	6700	346	1561	102	9694
test	731	960	7628	249	1490	102	11160

Table 1. Image distribution across the train, validation, and test data set

Like many industrial datasets, this dataset is quite imbalanced as it can be difficult to collect data on defects that occur with a low likelihood. Table 1 shows the class distribution for the train, validation, and test datasets.

Figure 2 visualizes the imbalance in the test dataset. The test dataset contains 75x more images of good_weld than it does of lack_of_shielding.

Bar chart shows relative distribution of test dataset, with good_weld having 75x more images than other categories. — *Figure 2. Class distribution of the test dataset on TIG steel welding*.

Using NVIDIA TLT

The approach taken focuses on minimizing both development time and tuning time while ensuring that the accuracy is applicable for a production environment. TLT was used in combination with the standard configuration files shipped with the example notebooks. The setup, training, and tuning was done in under 8 hours.

We conducted parameter sweeps regarding network depth and the number of training epochs. We observed that changing the learning rate from its default did not improve the results, so we did not investigate this further and left it at the default. The best results were obtained with a pretrained ResNet-18 model obtained from the NGC catalog after 30 epochs of training with a learning rate of 0.006.

Check out the step-by-step approach in krygol/304SteelWeldingClassification GitHub repo.

Metric	Precision	Recall	F1-score
burn_through	0.98	0.86	0.92
contamination	0.99	0.93	0.96
good_weld	0.99	1.00	0.99
high_travel_speed	1.00	1.00	1.00
lack_of_fusion	0.95	1.00	0.98
lack_of_shielding_gas	1.00	0.99	1.00
micro average	0.98	0.98	0.98
macro average	0.98	0.96	0.97

Table 2. Results attained with a pretrained ResNet-18 after training for 30 epochs with a learning rate of 0.006.

The obtained results were comparably good across all classes. Some lack_of_fusion gas images got misclassified as burn_through and contamination images. This effect was also observed when training the deeper ResNet50, which was even more prone to misclassifying lack_of_fusion as another defective class.

Comparison to the original approach

The researchers at the University of Birmingham chose a different AI workflow. They manually prepared the dataset to lessen the imbalance by undersampling it. They also rescaled their images to different sizes and chose custom network structures.

They used a fully connected neural network (Fully-con6), neural network with two hidden layers. They also implemented a convolutional neural network (Conv6) with three convolutional layers each followed by a max pooling layer and a fully connected layer as the final hidden layer. They did not use skip connections as ResNet does.

The results obtained with TLT are even more impressive when compared to results of the custom implementation by the researchers at the University of Birmingham.

	Conv6	Fully-con6	TLT-ResNet-18
Metric	F1-score	F1-score	F1-score
good_weld	0.99	0.97	0.99
burn_through	0.98	0.17	0.92
contamination	0.90	0.79	0.96
lack_of_fusion	0.94	0.94	0.97
lack_of_shielding	0.0	0.38	1.00
high_travel_speed	0.87	0.12	1.00
Macro average	0.78	0.56	0.97

Table 3. Comparison of the custom networks and the TLT ResNet-18.

Conv6 performed better on average with a macro average F1 of 0.78 but failed completely at recognizing lack_of_shielding gas defects. Fully-con6 performed worse on average, with a macro average F1 of 0.56. Fully-con6 could classify some of the lack_of_shielding gas images but had problems with burn_through and high_travel_speed images. Both Fully-con6 and Conv6 had distinct weaknesses that would prohibit them from being qualified as production-ready.

The best F1-scores for every class are marked in green in the table. As you can see, the ResNet-18 model trained by TLT provided better results with a macro average of 0.97.

Conclusion

We had a great experience with TLT, which in general was user-friendly and effective. It was fast to set up, easy to use, and produced results acceptable for production within a short computational time. Based on our experience, we believe that TLT provides a great advantage for engineers who are not AI experts but wish to use AI in their production environments. Using TLT in a manufacturing environment to automate quality control does not come at a performance cost and the application can often be used with the default settings and a few minor tweaks to outperform custom architectures.

This exploration of using NVIDIA TLT to quickly and accurately train AI models shows that there is great potential for AI in industrial processes.

Video. Project walkthrough (Source: Boston Virtual HPC Roadshow)

Acknowledgments

Thanks and gratitude to the researchers—Daniel Bacioiu, Geoff Melton, Mayorkinos Papaelias, and Rob Shaw—for their paper, Automated defect classification of SS304 TIG welding process using visible spectrum camera and machine learning, and publishing their data on Kaggle. Thanks to Ekaterina Sirazitdinova for her help during the development process of this project.

Misc

Cattle-ist for the Future: Plainsight Revolutionizes Livestock Management with AI

Post author By
Post date August 6, 2021
No Comments on Cattle-ist for the Future: Plainsight Revolutionizes Livestock Management with AI

Computer vision and edge AI are looking beyond the pasture. Plainsight, a San Francisco-based startup and NVIDIA Metropolis partner, is helping the meat processing industry improve its operations — from farms to forks. By pairing Plainsight’s vision AI platform and NVIDIA GPUs to develop video analytics applications, the company’s system performs precision livestock counting and Read article >

The post Cattle-ist for the Future: Plainsight Revolutionizes Livestock Management with AI appeared first on The Official NVIDIA Blog.

Misc

Accelerating Digital Pathology Pipelines with NVIDIA Clara Deploy

Post author By
Post date August 5, 2021
No Comments on Accelerating Digital Pathology Pipelines with NVIDIA Clara Deploy

The stain normalization filter accepts a source image and a target image and returns a normalized source image. As an undergraduate student excited about AI for healthcare applications, I was thrilled to be joining the NVIDIA Clara Deploy team for an internship. It was the perfect combination: the opportunity to work at a leading technology company enabling the acceleration and adoption of AI while contributing to a team building the future (and the … Continued The stain normalization filter accepts a source image and a target image and returns a normalized source image.

As an undergraduate student excited about AI for healthcare applications, I was thrilled to be joining the NVIDIA Clara Deploy team for an internship. It was the perfect combination: the opportunity to work at a leading technology company enabling the acceleration and adoption of AI while contributing to a team building the future (and the present!) of AI deployment for healthcare. The next few months were filled with learning from brilliant yet humble colleagues, picking up new skills like CUDA programming, and the opportunity to focus on unique technical challenges posed by histopathology data.

What is Clara Deploy?

The Clara Deploy SDK is a container-based, cloud-native development and deployment framework for multi-AI and multidomain workflows in smart hospitals. It enables you to define container-based pipelines consisting of multiple stages, each stage defined by an operator. A pipeline consists of multiple operators and is a directed acyclic graph (DAG) from the data source to the data sink. Each operator is a step of the pipeline, such as loading input, preprocessing, AI inference, and so on.

As I explored setting up the NVIDIA Clara Deploy platform and running AI inference pipelines, I gained firsthand experience in the challenges of deploying AI workflows, particularly in standardizing workflows and scaling up execution. While running digital pathology pipelines, I gained awareness of the performance bottleneck of I/O and preprocessing steps that are usually not GPU-accelerated. This influenced my choice to focus on accelerating preprocessing filters for digital pathology during my internship.

What is cuCIM?

cuCIM is a RAPIDS library for accelerated n-dimensional image processing and image I/O, with a focus on medical imaging applications. cuCIM consists of I/O, file system, and operation modules. Operations in cuCIM can be extended using a plug-in architecture. cuCIM is uniquely positioned to be a leading library for medical image-processing applications, and I am excited to have gained exposure to and contributed to it during my time at NVIDIA.

Project motivation

A significant challenge in the digitization of histopathology analysis is the stain variation observed in pathology images. These images can have large variations in staining caused by multiple factors, including stain vendors, storage conditions, staining protocols, digital scanners, and so on.

Given the range of factors, it is impractical to control for staining variation during image acquisition. Instead, an image preprocessing step called stain normalization is often used to algorithmically standardize image staining. A stain normalization filter accepts as input a source image and a target image. The source image is to be stain normalized, and the target image contains the ideal stain, to be transferred to the source image. Ultimately, a normalized source image is returned as output.

The stain normalization filter accepts a source image and a target image and returns a normalized source image. — *Figure 1. Stain normalization filter. Images from* StainTools.

Prior work has shown that stain normalization used as a preprocessing step in digital pathology AI pipelines can shorten training time, improve accuracy, and enable data from different sources to be used together. Because you are operating in a relatively small data regime due to the scarcity of stained pathology images, stain normalization enables you to optimize the signal obtained amidst noisy stain variations.

However, prior implementations of stain normalization were relatively slow as they were not GPU-accelerated. There was an opportunity to implement a GPU-accelerated stain normalization algorithm and enable fast and effective preprocessing for digital pathology AI pipelines.

Accelerating stain normalization for digital pathology

Stain normalization methods fall into three broad categories:

Global color normalization
Color normalization after stain deconvolution
Color transfer using deep networks

For more information, see Stain Color Adaptive Normalization (SCAN) algorithm: Separation and standardization of histological stains in digital pathology.

I chose to focus on stain deconvolution-based methods, as prior literature showed greater performance compared to global color normalization and better theoretical guarantees regarding the maintenance of biological structure integrity compared to deep network-based methods.

Stain deconvolution-based methods assume that each image is characterized by a stain matrix, which contains the red, green, blue (RGB) values for each of the two stains in H&E stained images: hematoxylin and eosin.

Using the Beer-Lambert law, an RGB image is transformed into an optical density image. Then, the optical density image may be related to the product of a pixel concentration matrix and the stain matrix for that image. The pixel concentration matrix indicates the concentration of each stain for each pixel. If the stain matrix is estimated, done here with the Macenko method, then the concentration matrix may be obtained.

Finally, for stain normalization, the stain matrix of a source image is replaced with the stain matrix of a target image. This serves the purpose of transferring the stain profile from the target image to the source image. Because the concentration matrix of the source image is unchanged, the morphology of the biological structures is maintained. The Macenko method for estimating the stain matrix is an unsupervised method using the singular value decomposition.

Stain deconvolution-based methods replace a source image’s stain matrix with that of a target image. — *Figure 2. Stain deconvolution-based stain normalization*

I designed and implemented a filter for the Macenko method for stain normalization in CuPy, after modifying an existing version in NumPy. Next, I compared the performance of the two.

Figure 3 shows the relative performance of the NumPy and CuPy implementations of stain normalization for different image sizes, using an NVIDIA DGX-1. Performance for the CuPy implementation is plotted in terms of acceleration factor relative to the NumPy implementation.

The CuPy implementation shows a significant performance boost over the NumPy implementation. For an image of size 4000 pixels, the CuPy implementation shows a 88x performance gain. — *Figure 3. Performance comparison of NumPy vs. CuPy implementations of stain normalization*

Given the goal of enabling GPU-accelerated stain normalization to be used as a preprocessing step for digital pathology pipelines, I began the integration of this filter as a transform (array-based and dictionary-based) into MONAI. MONAI is an open-source, PyTorch-based framework for deep learning in medical imaging. After being fully integrated, the stain normalization transform can be added to pathology pipelines in Clara Train or MONAI.

Acceleration of color conversion filter

Next, I worked on implementing the color conversion rgb2hed function in CUDA C++, which is a commonly used function available in scikit-image and the cuCIM Python layer, among other libraries. Color space conversion from RGB to HED is closely related to stain normalization, as this function involves obtaining stain concentration values, assuming that the stain vectors are a constant, precalculated approximation. This ignores variations between the staining of different images. This function is to be integrated into cuCIM through a C++ based operator plugin mechanism.

I compared the performance of a pure C++ implementation and the CUDA C++ implementation. Figure 4 shows the relative performance of the two versions, for different image sizes, using an NVIDIA GV100 GPU and Intel(R) Core(TM) i7-7800X CPU. Performance for the CUDA C++ implementation is plotted in terms of acceleration factor relative to the pure C++ implementation.

It’s important to note that the performance gains do not account for any transfer of data to and from the GPU. I did this because I am considering the common scenario where data transfers are minimized by remaining on the GPU for several subsequent operations in an image processing workflow, with transfer back to the host occurring only at the end.

Conclusion

In summary, my internship project was focused on accelerating color conversion filters for digital pathology. Specifically, I worked on designing and implementing the Macenko stain normalization method, using CuPy for GPU-acceleration. I began the integration of this into MONAI as a transform, for future use as a preprocessing step for digital pathology pipelines. Next, I worked on implementing the color conversion rgb2hed function in CUDA C++, to be integrated into cuCIM through a C++ based operator plugin mechanism.

Both the CuPy implementation of Macenko stain normalization and the CUDA C++ implementation of the rgb2hed function showed significant performance gains compared to the NumPy version and pure C++ version, respectively. The stain normalization preprocessing time for training a pipeline over 500 epochs with a dataset of 250 images and image size of 4000 by 4000 pixels is roughly estimated at 13 days with the NumPy-based filter. It decreases to 3.5 hours for the CuPy-based filter.

Ultimately, accelerating pre– and post-processing filters for digital pathology can improve the performance of deep learning pipelines in digital pathology, expedite the adoption of digital pathology, and enable AI to revolutionize pathology.

Misc

Can I start training with a batch of data, stop training, load different data (same type) and then start training again (same model)?

I am using a jupyter notebook to load data and train a CNN. I have about 80,000 images and each time I try to load the data, the VS code instance crashes. I was wondering if I could load the first 20,000 images (since I know that will work) and train the network, then delete those images and load the next 20,000 and start training again and so on until I use all my images. From my understanding, as long as I do not reset the kernel (of the notebook) I should be fine. Please help/give suggestions on how to avoid/fix this issue.

submitted by /u/NameError-undefined
[visit reddit] [comments]

Misc

Archaeologist Digs Into Photogrammetry, Creates 3D Models With NVIDIA Technology

Post author By
Post date August 5, 2021
No Comments on Archaeologist Digs Into Photogrammetry, Creates 3D Models With NVIDIA Technology

Archaeologist Daria Dabal is bringing the past to life, with an assist from NVIDIA technology. Dabal works on various archaeological sites in the U.K., conducting field and post-excavation work. Over the last five years, photogrammetry — the use of photographs to create fully textured 3D models — has become increasingly popular in archaeology. Dabal has Read article >

The post Archaeologist Digs Into Photogrammetry, Creates 3D Models With NVIDIA Technology appeared first on The Official NVIDIA Blog.

Misc

Ready for Prime Time: Plus to Deliver Autonomous Truck Systems Powered by NVIDIA DRIVE to Amazon

Post author By
Post date August 5, 2021
No Comments on Ready for Prime Time: Plus to Deliver Autonomous Truck Systems Powered by NVIDIA DRIVE to Amazon

Your Amazon Prime delivery just got smarter. Autonomous trucking company Plus recently signed a deal with Amazon to provide at least 1,000 self-driving systems to retrofit on the e-commerce giant’s delivery fleet. These systems are powered by NVIDIA DRIVE Xavier for high-performance, energy-efficient and centralized AI compute. The agreement follows Plus’ announcement of going public Read article >

The post Ready for Prime Time: Plus to Deliver Autonomous Truck Systems Powered by NVIDIA DRIVE to Amazon appeared first on The Official NVIDIA Blog.

Misc

August Arrivals: GFN Thursday Brings 34 Games to GeForce NOW This Month

Post author By
Post date August 5, 2021
No Comments on August Arrivals: GFN Thursday Brings 34 Games to GeForce NOW This Month

It’s a new month for GFN Thursday, which means a new month full of games on GeForce NOW. August brings a wealth of great new PC game launches to the cloud gaming service, including King’s Bounty II, Humankind and NARAKA: BLADEPOINT. In total, 13 titles are available to stream this week. They’re just a portion Read article >

The post August Arrivals: GFN Thursday Brings 34 Games to GeForce NOW This Month appeared first on The Official NVIDIA Blog.

Misc

On TensorFlow, how to use CNN on a stack of images

Post author By
Post date August 5, 2021
No Comments on On TensorFlow, how to use CNN on a stack of images

On TensorFlow, how to use CNN on a stack of images

submitted by /u/Striking-Warning9533
[visit reddit] [comments]

Misc

Getting Started on Mobile Image Recognition

Post author By
Post date August 5, 2021
No Comments on Getting Started on Mobile Image Recognition

I’m a novice on ML and mobile development topics, and I’d like to practice and make a basic app which could recognize a class of objects (animals, food, anything with freely available dataset) and returns information of said object.

I see dozens of articles and public repositories on implementing image recognition on mobile, mostly using Tensorflow, and I’ve found a lot of image datasets on Kaggle to train on.

Now I’m confused on how to actually start. I was thinking of using React Native since I’m a bit more experienced with Javascript and using npm packages, but a lot of articles say to just go native for better performance. I don’t know if what I’m doing would be considered “heavy” so I’m a bit confused here.

Any advice is appreciated!

submitted by /u/throwawayeksdee1
[visit reddit] [comments]

Misc

AI Detects Gravitational Waves Faster than Real Time

Post author By
Post date August 4, 2021
No Comments on AI Detects Gravitational Waves Faster than Real Time

Illustration of deep space, stars and the galaxy. New research creates a deployable AI framework for detecting gravitational waves within massive amounts of data at several magnitudes faster than real time.

Scientists searching the universe for gravitational waves just got a boost thanks to a new study and AI.

The research, recently published in Nature Astronomy, creates a deployable AI framework for detecting gravitational waves within massive amounts of data—at several magnitudes faster than real time.

Created by a group of scientists from Argonne National Laboratory, the University of Chicago, the University of Illinois at Urbana-Champaign, NVIDIA, and IBM, the work highlights how AI and supercomputing can accelerate reproducible, data-driven discoveries.

“As a computer scientist, what’s exciting to me about this project is that it shows how, with the right tools, AI methods can be integrated naturally into the workflows of scientists. Allowing them to do their work faster and better. Augmenting, not replacing, human intelligence,” study senior author Ian Foster, director of Argonne’s Data Science and Learning division said in a press release.

In 2015, the advanced Laser Interferometer Gravitational-Wave Observatory (LIGO) first detected gravitational waves when two black holes, 1.3 billion light-years away, collided and merged.

These waves occur when massive objects quickly accelerate (such as a star exploding or massive objects colliding) creating a ripple through space time.

The notable discovery confirmed part of Einstein’s theory of relativity, hypothesizing that space and time are linked. It also marked the start of gravitational wave astronomy, which could result in a deeper understanding of the cosmos, including dark energy, gravity, and neutron stars.

It also holds potential for scientists to step back through time to the moments around the Big Bang.

Since 2015, many more gravitational wave sources have been detected from LIGO. As the observatory continues with sensor upgrades and refinements, the expanse of detectors within the universe will also grow, creating large amounts of data for processing. Quickly computing these data streams remains key to gravitational wave astronomy advancements and discoveries.

In 2018 Eliu Huerta, lead for Translational AI and Computational Ccience at Argonne, demonstrated the capability of machine learning to detect gravitational waves from multiple LIGO detector data streams.

In this study, the researchers further refined the model, which uses the cuDNN-accelerated deep learning framework distributed over 64 NVIDIA GPUs. They tested the model against LIGO data from 2017 and found it accurately identified four binary black hole mergers—without any misclassifications. It also processed a month’s worth of data in under 7 minutes.

“In this study, we’ve used the combined power of AI and supercomputing to help solve timely and relevant big-data experiments. We are now making AI studies fully reproducible, not merely ascertaining whether AI may provide a novel solution to grand challenges,” Huerta said.

The team’s models are open-source and readily available.

Read the full article in Nature Astronomy >>
Read more >>