I have a problem using LSTM from Keras. When I try to train the model, the training stops at “Epoch 1/50” and never progresses. The program just exits with a “Process finished” code and shows no error message explaining why training did not happen.
The problem only occurs when I use the LSTM layer’s default parameters. If I, for example, pass a different activation argument such as “relu”, it works fine.
The problem seems to be specific to my local computer, since the same code runs on Colab both with and without the default parameters, but locally I cannot train the model at all. This is especially frustrating because no error messages are displayed.
I really hope there is a skilled person who can help or guide me in the right direction with this problem.
Thanks 🙂
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, InputLayer
2021-08-27 10:20:22.799484: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-08-27 10:20:26.443509: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-08-27 10:20:26.488972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2060 computeCapability: 7.5 coreClock: 1.2GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 245.91GiB/s
2021-08-27 10:20:26.489580: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-08-27 10:20:26.532650: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-08-27 10:20:26.533109: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-08-27 10:20:26.559639: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-08-27 10:20:26.565224: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-08-27 10:20:26.637251: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-08-27 10:20:26.660612: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021-08-27 10:20:26.661815: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-08-27 10:20:26.662201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-08-27 10:20:26.662770: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-27 10:20:26.664656: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2060 computeCapability: 7.5 coreClock: 1.2GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 245.91GiB/s
2021-08-27 10:20:26.665660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-08-27 10:20:27.284225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-27 10:20:27.284551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-27 10:20:27.284738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-08-27 10:20:27.285152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3961 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-08-27 10:20:28.198186: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/50
2021-08-27 10:20:29.513381: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
Process finished with exit code -1073740791 (0xC0000409)
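For reference, here is a minimal sketch of the kind of model I mean (not my exact code; the input shape, layer sizes, and data are placeholders):

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, InputLayer

model = Sequential([
    InputLayer(input_shape=(30, 8)),   # (timesteps, features) -- made-up shape
    LSTM(64),                          # default activation='tanh', eligible for the cuDNN kernel
    Dropout(0.2),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(256, 30, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")
model.fit(x, y, epochs=50, batch_size=32)   # this is where it dies at "Epoch 1/50" on my machine

I suspect this is related to the cuDNN LSTM kernel: with the default activation (“tanh”), Keras can use the fused cuDNN implementation on the GPU, whereas a non-default activation such as “relu” forces the generic implementation, and the log above shows cudnn64_8.dll being loaded right before the crash.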
Healthcare giant Johnson & Johnson is injecting data science across its business to improve its manufacturing, clinical trial enrollment, forecasting and more. “I actually like to call it decision science,” said Jim Swanson, the company’s executive vice president and enterprise chief information officer, in a panel discussion at the most recent NVIDIA GPU Technology Conference.
Introduction
The August release (21.08) of RAPIDS Accelerator for Apache Spark is now available. It has been a little over a year since the first release at NVIDIA GTC 2020, and we have improved in many ways, particularly in ease of use, with minimal to no code changes required for Apache Spark applications. Over the last year, the team has focused on adding functionality and continuously improving performance. As a testament to that, we periodically measure performance and functionality over time with the NVIDIA Decision Support (NDS) benchmark at a scale factor of 3,000 (3 TB uncompressed). In this release, apart from adding new features, we are extremely proud to have made progress on improving end-to-end speed for all passing queries and lowering the total cost of ownership for NVIDIA EGX servers.
Benchmark updates
NVIDIA Decision Support (NDS) is our adaptation of an industry-standard data science benchmark often used in the Apache Spark community. NDS consists of the same 105 SQL queries as the industry standard benchmark TPC-DS, but has modified parts for dataset generation and execution scripts. In our GTC 2021 update, we had 95 queries passing. With the 21.08 release, with new features such as out-of-core group by, window rank, and dense_rank, we have enabled all of the 105 queries to run on the GPU.
Benchmark setup
Scale Factor — 3K (3TB Dataset with floats)
Systems: 4x NVIDIA Certified EGX Server
EGX Server Hardware Spec: 4-node Dell R740xd, each with (2) 24-core CPUs, 512GB RAM, HDFS on NVMe, (1) CX-6 Dx 25/100Gb NIC, 2x NVIDIA A30 GPU
CPU Hardware Spec: 4-node Dell R740xd, each with (2) 24-core CPUs, 512GB RAM, HDFS on NVMe, (1) CX-6 Dx 25/100Gb NIC
Figure 1: NDS Queries Speed-up on EGX Servers: GPU vs CPU.
Based on this release, we are excited to show that all the 105 queries can now run without any code change on the GPU.
The servers used for these benchmarks cost a little under $170,000 for four servers without GPUs, and $220,000 with one NVIDIA A100 GPU added to each server.
In simple terms, the benchmark GPU servers cost about 1.29 times as much as the CPU servers.
As the chart above (Figure 1) shows, more than 95 of the queries now run more than 1.29x faster on the GPU, which makes them not only faster but also cheaper to run on GPU than on CPU.
We are actively working to improve the queries that are still slower on the GPU, as well as the overall speed-ups.
GPU speed-ups vary from roughly 1x to 18x across queries, so we suggest that users qualify the right use cases for GPUs.
The Qualification Tool would be a handy asset if users are unsure about the right use case for GPU. For more information about the Qualification Tool, refer to the section below.
Profiling & qualification tool
The Profiling & Qualification tool, released in 21.06, saw positive feedback from the user community as well as requests for new features. In 21.08 the qualification tool now has the ability to handle event logs generated by Apache Spark 2.x versions. The tool will also support event logs generated by AWS EMR 6.3.0, Google Dataproc 2.0, Microsoft Azure Synapse, and the Databricks 7.3 and 8.2 runtimes. The qualification tool will no longer require a Spark runtime. Users can now use the qualification tool with just Apache Spark 3.x jars on their machine. The latest version also has new filtering capabilities to choose event logs. The tool also looks for read data formats and types that the plugin doesn’t support and removes these from the score (based on the total task time in SQL Dataframe operations). The output will be reported in a concise format on the terminal and a detailed analysis of each of the processed event logs will be stored as a csv output.
New functionality
This release adds more functionality for arrays and structs. We can now do a union on multi-level struct data types and can also write array data types in Parquet format. We have added rank and dense_rank window functions to the existing lead, lag, and row_number functionality; with these additions, the RAPIDS Accelerator now supports the most commonly used window operators in SQL. For the timestamp operators, we have added support for LEGACY timestamps, so users can read legacy timestamp formats supported in Spark 2.0. For Databricks users, we have added the ability to cache data on the GPU (this was already supported on all other platforms).
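As a quick illustration, the following PySpark sketch (column names and data are made up, and it assumes the RAPIDS Accelerator jars are already on the driver and executor classpath) uses the newly supported rank and dense_rank window functions, which can now run on the GPU:

from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, rank, dense_rank

spark = (SparkSession.builder
         .appName("rank-on-gpu-sketch")
         .config("spark.plugins", "com.nvidia.spark.SQLPlugin")   # enables the RAPIDS Accelerator
         .config("spark.rapids.sql.enabled", "true")
         .getOrCreate())

df = spark.createDataFrame(
    [("a", 1), ("a", 3), ("b", 2), ("b", 2)],
    ["key", "value"])

w = Window.partitionBy("key").orderBy(col("value").desc())
df.select("key", "value",
          rank().over(w).alias("rank"),
          dense_rank().over(w).alias("dense_rank")).show()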
We continue to make the user experience better with the ability to handle datasets that spill out of GPU memory for group by and windowing operations. This improvement will save users time creating partitions to avoid out-of-memory errors on the GPU. Similarly, the adoption of UCX 1.11 has improved error handling for RAPIDS Spark Accelerated Shuffle Manager.
As we noted previously, we moved to CalVer and a bi-monthly release cadence with the 21.06 release. Upcoming versions will add expanded support for additional decimal types and continue to add more nested data type support for multi-level structs and maps. In addition, look out for micro-benchmarks with code samples and notebooks that highlight the operations best suited for GPUs. We want to hear from you, the users: reach out to us on GitHub and let us know how we can continue to improve your experience using RAPIDS Spark.
GFN Thursday is here to wake you up when September begins, because there are a bunch of awesome day-and-date launch games coming to GeForce NOW this month. September brings 16 new day-and-date games to the cloud — including the anticipated Life is Strange: True Colors. They’re part of the 34 games being added throughout the month.
Has anyone using PyCharm had any luck installing the TensorFlow library? What guides did you follow? How should I approach installing and using the library? The traditional package installation via PyCharm doesn’t seem to work for me, unfortunately. I’m using Python 3.8 and trying to install version 2.6.0.
Is this possible? I wrote a python script that uses tensorflow object detection and I got everything working correctly but now I want to turn this into a .exe so that I can bring it on a flash drive easily and show it to friends.
PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research. Organizing PyTorch code with Lightning enables seamless training on multiple GPUs and the use of best practices such as checkpointing, logging, sharding, and mixed precision. In this post, we walk you through building speech models with PyTorch Lightning on NVIDIA GPU-powered AWS instances managed by the Grid.ai platform.
AI is driving the fourth Industrial Revolution with machines that can hear, see, understand, analyze, and then make smart decisions at superhuman levels. However, the effectiveness of AI depends on the quality of the underlying models. So, whether you’re an academic researcher or a data scientist, you want to quickly build models with a variety of parameters and identify the most effective ones for your solutions.
In this post, we walk you through building speech models with PyTorch Lightning on NVIDIA GPU-powered AWS instances.
PyTorch Lightning + Grid.ai: Build models faster, at scale
PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research. Organizing PyTorch code with Lightning enables seamless training on multiple GPUs, TPUs, and CPUs, and the use of difficult-to-implement best practices such as checkpointing, logging, sharding, and mixed precision. A PyTorch Lightning container and developer environment is available on the NGC catalog.
Grid enables you to scale training from your laptop to the cloud without having to modify your code. Running on cloud providers such as AWS, Grid supports Lightning as well as classic machine learning frameworks such as scikit-learn, TensorFlow, Keras, PyTorch, and more. With Grid, you can scale the training of models from the NGC catalog.
NGC: The hub for GPU-optimized AI software
The NGC catalog is the hub for GPU-optimized software, including AI/ML containers, pretrained models, and SDKs that can be easily deployed across on-premises, cloud, edge, and hybrid environments. NGC offers the NVIDIA TAO Toolkit, which enables retraining models with custom data, and NVIDIA Triton Inference Server, which runs predictions on CPU- and GPU-powered systems.
The rest of this post walks you through how to leverage models from the NGC catalog and the NVIDIA NeMo framework to train an automatic speech recognition (ASR) model with PyTorch Lightning using the following tutorial based on the ASR with NeMo tutorial.
Figure 1. AI model training process
Training NGC models with Grid sessions, PyTorch Lightning, and NVIDIA NeMo
ASR is the task of transcribing spoken language to text and is a critical component of Speech to Text systems. When training ASR models, your goal is to generate text from a given audio input that minimizes the word error rate (WER) metric on human transcribed speech. The NGC catalog contains state-of-the-art pretrained models for ASR.
In the remainder of this post, we show you how to use Grid sessions, NVIDIA NeMo, and PyTorch Lightning to fine-tune these models on the AN4 dataset.
The AN4 dataset, also known as the Alphanumeric dataset, was collected and published by Carnegie Mellon University. It consists of recordings of people spelling out addresses, names, telephone numbers, and so on, one letter or number at a time, as well as their corresponding transcripts.
Step 1: Create a Grid session optimized for Lightning and pretrained NGC models
Grid sessions run on the same hardware that you need to scale while providing you with preconfigured environments to iterate the research phase of the machine learning process faster than before. Sessions are linked to GitHub, loaded with JupyterHub, and can be accessed through SSH and your IDE of choice without having to do any setup yourself.
With sessions, you pay only for the compute that you need to get a baseline operational, and then you can scale your work to the cloud with Grid runs. Grid sessions are optimized for PyTorch Lightning and models hosted on the NGC catalog. They even provide specialized Spot pricing.
For an in-depth walkthrough, see the Grid Session tour (requires a Grid.ai account).
Figure 2. Workflow to create a Grid session
Step 2: Clone the ASR demo repo and open the tutorial notebook
Now that you have a developer environment optimized for PyTorch Lightning, the next step is to clone the NGC-Lightning-Grid-Workshop repo.
You can do this directly from a terminal in your Grid Session with the following command:
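The exact repository URL is not shown here; a typical invocation (substitute the real location of the NGC-Lightning-Grid-Workshop repo) would look like this:

git clone https://github.com/<org>/NGC-Lightning-Grid-Workshop.git
cd NGC-Lightning-Grid-Workshop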
After you’ve cloned the repo, you can open up the notebook to use to fine-tune the NGC hosted model with NeMo and PyTorch Lightning.
Step 3: Install NeMo ASR dependencies
First, install all the session dependencies. These include tools such as PyTorch Lightning and NeMo, as well as the libraries needed to process the AN4 dataset. Run the first cell in the tutorial notebook, which executes the following bash commands to install them.
## Install dependencies
!pip install wget
!sudo apt-get install sox libsndfile1 ffmpeg -y
!pip install unidecode
!pip install "matplotlib>=3.3.2"

## Install NeMo
BRANCH = 'main'
!python -m pip install --user git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]

## Grab the config we'll use in this example
!mkdir configs
!wget -P configs/ https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/asr/conf/config.yaml
Step 4: Convert and visualize the AN4 dataset
The AN4 dataset comes as raw Sph audio files, but most models process mel spectrograms. Convert the Sph files to the WAV format so that you can use the NeMo audio processing.
import librosa
import IPython.display as ipd
import glob
import os
import subprocess
import tarfile
import wget
# Download the dataset. This will take a few moments...
data_dir = '.'  # working directory for the dataset (assumed; set this to wherever you want the data)
print("******")
if not os.path.exists(data_dir + '/an4_sphere.tar.gz'):
    an4_url = 'http://www.speech.cs.cmu.edu/databases/an4/an4_sphere.tar.gz'
    an4_path = wget.download(an4_url, data_dir)
    print(f"Dataset downloaded at: {an4_path}")
else:
    print("Tarfile already exists.")
    an4_path = data_dir + '/an4_sphere.tar.gz'
if not os.path.exists(data_dir + '/an4/'):
    # Untar and convert .sph to .wav (using sox)
    tar = tarfile.open(an4_path)
    tar.extractall(path=data_dir)

    print("Converting .sph to .wav...")
    sph_list = glob.glob(data_dir + '/an4/**/*.sph', recursive=True)
    for sph_path in sph_list:
        wav_path = sph_path[:-4] + '.wav'
        cmd = ["sox", sph_path, wav_path]
        subprocess.run(cmd)
    print("Finished conversion.\n******")

# Load and listen to the audio file
example_file = data_dir + '/an4/wav/an4_clstk/mgah/cen2-mgah-b.wav'
audio, sample_rate = librosa.load(example_file)
ipd.Audio(example_file, rate=sample_rate)
You can then visualize the audio example as images of the audio waveform. Figure 3 shows the activity in the waveform that corresponds to each letter in the audio, as your speaker here enunciates quite clearly!
Figure 3. Audio waveform of the sample example
Each spoken letter has a different “shape.” It’s interesting to note that the last two blobs look relatively similar, which is expected because they are both the letter N.
Spectrograms
Modeling audio is easier in the context of frequencies of sound over time. You can get a better representation than this raw sequence of 57,330 values. A spectrogram is a good way of visualizing how the strengths of various frequencies in the audio vary over time. It is obtained by breaking up the signal into smaller, usually overlapping chunks, and performing a short-time Fourier transform (STFT) on each.
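As an illustrative sketch (this is not code from the original tutorial), such a spectrogram can be computed with librosa from the audio and sample_rate values loaded earlier:

import numpy as np
import librosa

stft = librosa.stft(audio)                                           # short-time Fourier transform
spectrogram_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)   # magnitude in dB
print(spectrogram_db.shape)                                          # (frequency bins, time frames)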
Figure 4 shows what the spectrogram of the sample looks like.
Figure 4. Audio spectrogram of the sample example
As in the earlier waveform, you see each letter being pronounced. How do you interpret these shapes and colors? Just as in the earlier waveform plot, you see time passing on the x-axis (all 2.6s of audio). However, now the y-axis represents different frequencies (on a log scale), and the color on the plot shows the strength of a frequency at a particular point in time.
Mel spectrograms
You’re still not done, as you can make one more potentially useful tweak by visualizing the data using the mel spectrogram. Change the frequency scale from linear (or logarithmic) to the mel scale, which better represents the pitches that are perceivable to the human ear. Mel spectrograms are intuitively useful for ASR. Because you are processing and transcribing human speech, mel spectrograms reduce background noise that can affect the model.
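Again as a sketch rather than the tutorial’s own code, the mel spectrogram can be computed and plotted with librosa using the same audio and sample_rate as before:

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

mel = librosa.feature.melspectrogram(y=audio, sr=sample_rate)
mel_db = librosa.power_to_db(mel, ref=np.max)   # convert power to dB for plotting

librosa.display.specshow(mel_db, sr=sample_rate, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Mel spectrogram')
plt.show()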
Figure 5. Mel spectrogram of the sample example
Step 5: Load and inference a pretrained QuartzNet model from NGC
Now that you’ve loaded and properly understood the AN4 dataset, look at how to use NGC to load an ASR model to be fine-tuned with PyTorch Lightning. NeMo’s ASR collection comes with many building blocks and even complete models that you can use for training and evaluation. Moreover, several models come with pretrained weights.
To model the data for this post, you use a Jasper architecture called QuartzNet from the NGC Model Hub. The Jasper architecture consists of repeated block structures that use 1D convolutions to model spectrogram data (Figure 6).
Figure 6. Jasper/QuartzNet model
QuartzNet is a variant of Jasper; its key difference is that it uses time-channel separable 1D convolutions, which dramatically reduces the number of weights while keeping similar accuracy.
The following command downloads the pretrained QuartzNet15x5 model from the NGC catalog and instantiates it for you.
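In NeMo’s ASR collection, that step typically looks like the following (the model name string is the identifier used for QuartzNet15x5 on NGC):

import nemo.collections.asr as nemo_asr

quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")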
Because the model is trained with the PyTorch Lightning Trainer, you get some key advantages by default, such as model checkpointing and logging. You can also use 50+ best-practice tactics without needing to modify the model code, including multi-GPU training, model sharding, DeepSpeed, quantization-aware training, early stopping, mixed precision, gradient clipping, and profiling.
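A rough fine-tuning sketch follows (the manifest paths and hyperparameters are placeholders; the config file is the one downloaded in Step 3). NeMo models are LightningModules, so they plug directly into the Trainer:

import pytorch_lightning as pl
from omegaconf import OmegaConf

params = OmegaConf.load("configs/config.yaml")                 # config downloaded in Step 3
train_manifest = data_dir + '/an4/train_manifest.json'         # assumed AN4 manifest paths
test_manifest = data_dir + '/an4/test_manifest.json'
params.model.train_ds.manifest_filepath = train_manifest
params.model.validation_ds.manifest_filepath = test_manifest

quartznet.setup_training_data(train_data_config=params.model.train_ds)
quartznet.setup_validation_data(val_data_config=params.model.validation_ds)

trainer = pl.Trainer(gpus=1, max_epochs=50)                    # placeholder settings
trainer.fit(quartznet)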
Figure 7. Fine-tuning tactics
Step 7: Inference and deployment
Now that you have a baseline model, run inference with it.
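One way to do that (the audio path here is just the sample file from earlier) is NeMo’s transcribe helper, which runs the model over a list of audio files and returns the predicted text:

files = [data_dir + '/an4/wav/an4_clstk/mgah/cen2-mgah-b.wav']
print(quartznet.transcribe(paths2audio_files=files))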
Figure 8. Run inference
Step 8: Pause session
Now that you have trained the model, you can pause the session and all the files that you need are persisted.
Figure 9. Monitor Grid session
Paused sessions are free of charge and can be resumed as needed.
Conclusion
Now, you should have a better understanding of PyTorch Lightning, NGC, and Grid. You’ve fine-tuned your first NGC NeMo model and optimized it with Grid runs. We are excited to see what you do next with Grid and NGC.
Ray Tracing Gems II is now available as a hardcover on Apress and Amazon.
Ray Tracing Gems II is now available to download for free via Apress and Amazon and as a hardcover on Apress and Amazon as well. For those who love books as a physical medium, we recommend purchasing a copy for your home library, while also downloading the free PDF version for easy digital access on the go.
This Open Access book is a must-have for anyone interested in real-time rendering. Ray tracing is the holy grail of gaming graphics, simulating the physical behavior of light to bring real-time, cinematic-quality rendering to even the most visually intense games. Ray tracing is also a fundamental algorithm used for architecture applications, visualization, sound simulation, deep learning, and more.
We’ve collaborated with our partners to make four limited edition versions of the book, featuring custom covers that highlight real-time ray tracing in Fortnite, Control, Watch Dogs: Legion, and Quake II RTX.
Posted by Zaid Nabulsi, Software Engineer and Po-Hsuan Cameron Chen, Software Engineer, Google Health
The adoption of machine learning (ML) for medical imaging applications presents an exciting opportunity to improve the availability, latency, accuracy, and consistency of chest X-ray (CXR) image interpretation. Indeed, a plethora of algorithms have already been developed to detect specific conditions, such as lung cancer, tuberculosis and pneumothorax. By virtue of being trained to detect a specific disease, however, the utility of these algorithms may be limited in a general clinical setting, where a wide variety of abnormalities could surface. For example, a pneumothorax detector is not expected to highlight nodules suggestive of cancer, and a tuberculosis detector may not identify findings specific to pneumonia. Since an initial triaging step is to determine whether a CXR contains any concerning abnormalities, a general-purpose algorithm that identifies X-rays containing any sort of abnormality could significantly facilitate the workflow. However, developing a classifier to detect any abnormality is challenging due to the wide variety of abnormal findings that present on CXRs.
A Deep Learning System for Detecting Abnormal Chest X-rays The deep learning system we used is based on the EfficientNet-B7 architecture, pre-trained on ImageNet. We trained the model using over 200,000 de-identified CXRs from the Apollo Hospitals in India. Each CXR was assigned a label of either “normal” or “abnormal” using a regular expression–based natural language processing approach on the associated radiology reports.
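As a rough sketch of that kind of setup (this is not Google’s code; the input resolution and training details are assumptions), a binary normal/abnormal classifier built on an ImageNet-pretrained EfficientNet-B7 backbone could be assembled in Keras like this:

import tensorflow as tf

backbone = tf.keras.applications.EfficientNetB7(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(600, 600, 3))                       # input resolution is an assumption

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability that the CXR is abnormal
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(curve="ROC")])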
To evaluate how well the system generalizes to new patient populations, we compared its performance on two datasets consisting of a wide spectrum of abnormalities: the test split from the Apollo Hospitals dataset (DS-1), and the publicly available ChestX-ray14 (CXR-14). The labels for these two test sets were annotated for the purposes of this project by a group of US board-certified radiologists. The system achieved areas under the receiver operating characteristic curve (AUROC) of 0.87 on DS-1 and 0.94 on CXR-14 (higher is better).
Though the evaluations on DS-1 and CXR-14 contained a wide range of abnormalities, a possible use-case would be to utilize such an abnormality detector in novel or unforeseen settings with diseases that it had not encountered before. To evaluate the generalizability of the system to new patient populations and in the presence of diseases not seen in the training set, we used four de-identified datasets from three countries, including two publicly available tuberculosis datasets and two COVID-19 datasets from Northwestern Medicine. The system achieved AUCs of 0.95-0.97 in detecting tuberculosis, and 0.65-0.68 in detecting COVID-19. Because CXRs that are negative for these diseases could still contain other concerning abnormalities, we further evaluated the system for its ability to detect abnormalities more broadly (instead of disease positive vs. negative), finding AUCs of 0.91-0.93 for the tuberculosis dataset, and AUCs of 0.86 for the COVID-19 dataset.
The purpose of multiple evaluations (abnormality detection and disease detection) is the distinction between the two: a given disease can present with a certain abnormality or not; and a certain abnormality can arise from multiple diseases. Our study evaluates for both.
AUCs for the three evaluation setups:

                              1. General        2. Unseen disease:   3. Unseen disease:
                                 abnormalities     Tuberculosis         COVID-19
Detect abnormalities          0.87-0.94          0.91-0.93            0.86
Detect respective disease     –                  0.95-0.97            0.65-0.68
The large drop in performance for COVID-19 is because many cases flagged by the system as “positive” for abnormalities were negative for COVID-19, but nevertheless contained abnormal CXR findings that needed attention. This further highlights the usefulness of abnormality detectors even if disease-specific models are available.
In addition, it’s important to note that there is a difference between generalization to unseen diseases (i.e., tuberculosis and COVID-19) versus generalization to unseen CXR findings (e.g., pleural effusion, consolidation/infiltrate). In this study, we demonstrated the generalizability of the system to unseen diseases but not necessarily unseen CXR findings.
Sample chest X-rays of true and false positives, and true and false negatives for (A) general abnormalities, (B) tuberculosis, and (C) COVID-19. On each CXR, we outline in red the areas on which the model focused to identify abnormalities (i.e., the class activation map), and outline the regions of interest indicated by a radiologist in yellow.
Potential Benefits in the Clinic To understand the potential utility of the deep learning model in improving clinical workflow, we simulated its use for case prioritization, where abnormal cases are “expedited” ahead of normal cases. In these simulations, the system reduced the turnaround time for abnormal cases by up to 28%. This reprioritization setup could be used to divert complex abnormal cases to cardiothoracic specialist radiologists, enable rapid triage of cases that may need urgent decisions, and provide the opportunity to batch negative CXRs for streamlined review.
Impact of a simulated deep learning model–based prioritization in comparison with random review order for (A) general abnormalities, (B) tuberculosis, and (C) COVID-19. The red bars indicate sequences of abnormal CXRs in red and normal CXRs in pink; a greater density of red towards the left indicates abnormal CXRs are reviewed sooner than normal ones. The histograms indicate the average improvement in turnaround time.
Additionally, we found that the system can be used as a pre-trained model to improve other ML algorithms for chest X-rays, especially when data is limited. For example, we used the normal/abnormal classifier in our recent study to detect pulmonary tuberculosis from chest X-rays. Abnormality and tuberculosis detectors can play a critical role in supporting early diagnosis in regions that lack access to resources like trained radiologists or molecular testing.
Sharing Improved Reference Standard Labels Much work remains to be done to realize the potential of ML to aid chest X-ray interpretation around the world. In particular, obtaining high-quality labels on de-identified data can be a significant barrier to developing and evaluating ML algorithms in healthcare. To accelerate these efforts, we are expanding upon our previous label release by releasing the labels used in this study for the publicly available ChestX-ray14 dataset. We look forward to future machine learning projects by the community in this space.
Acknowledgements
Key contributors to this project at Google include Zaid Nabulsi, Andrew Sellergren, Shahar Jamshy, Charles Lau, Eddie Santos, Atilla P. Kiraly, Wenxing Ye, Jie Yang, Rory Pilgrim, Sahar Kazemzadeh, Jin Yu, Greg S. Corrado, Lily Peng, Krish Eswaran, Daniel Tse, Neeral Beladia, Yun Liu, Po-Hsuan Cameron Chen, Shravya Shetty. Significant contributions and input were also made by radiologist collaborators Sreenivasa Raju Kalidindi, Mozziyar Etemadi, Florencia Garcia Vicente, David Melnick. For the CXR-14 dataset, we thank the NIH Clinical Center for making it publicly available. For tuberculosis data collection, thanks go to Sameer Antani, Stefan Jaeger, Sema Candemir, Zhiyun Xue, Alex Karargyris, George R. Thomas, Pu-Xuan Lu, Yi-Xiang Wang, Michael Bonifant, Ellan Kim, Sonia Qasba, and Jonathan Musco. The authors would also like to acknowledge many members of the Google Health Radiology and labeling software teams, in particular Shruthi Prabhakara, Scott McKinney, and Akib Uddin. Sincere appreciation also goes to the radiologists who enabled this work with their image interpretation and annotation efforts throughout the study; Jonny Wong for coordinating the imaging annotation work; Gavin Bee, Mikhail Fomitchev, Shabir Adeel, Jeff Bertram, and Benedict Noero for data releasing; David F. Steiner, Kunal Nagpal, and Michael D. Howell for providing feedback on the manuscript; Craig Mermel, Lauren Winer, Johnny Luu, Adrienne Welch, Annisah Um’rani, and Ashley Zlatinov for feedback on the blogpost.
1. Labels include atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax, consolidation, edema, emphysema, fibrosis, pleural thickening, hernia, other abnormality, and normal vs abnormal.