Categories
Offsites

Multi-task Prediction of Organ Dysfunction in ICUs

The intensive care unit (ICU) of a hospital looks after the most medically vulnerable patients, many of whom require organ support, such as mechanical ventilation or dialysis. Though ICU care is always critical, the demand on ICU services during the COVID-19 pandemic has further underscored the importance of data-driven decision-making in healthcare. Furthermore, the ability to accurately predict the clinical outcomes of ICU patients has the potential to guide therapy and may inform decisions about the most effective care, including staffing and triage support.

Applying machine learning (ML) to electronic health records (EHRs) has shown promise in predicting clinical outcomes. However, many of these ML models are based on single-task learning (ST), where the models are trained only to predict a specific adverse event, such as an organ dysfunction or the need for a life-support intervention. Of greater benefit would be to train multi-task models, which take into account a variety of competing risks along with the interdependencies between organ systems that factor into patient outcomes in a realistic setting.

In “Multi-task prediction of organ dysfunction in the ICU using sequential sub-network routing”, we propose a multi-task learning (MTL) architecture, called Sequential Sub-Network Routing (SeqSNR), that better captures the complexity of a realistic setting. Inspired by a clinician’s holistic approach to diagnosing problems, SeqSNR is designed to use flexible parameter sharing and routing to find related tasks and encourage cross-learning between them. We successfully applied SeqSNR to the task of continuous adverse event prediction in an ICU setting and showed advantages over single-task and naïve multi-tasking, especially in low training data scenarios.

Data and Labels
In this study, we used the freely available, open access, de-identified MIMIC-III EHR dataset, which includes a patient cohort consisting of 36,498 adults across 52,038 critical care admissions at the Beth Israel Deaconess Medical Center between 2001 and 2012. Similar to our previous studies, we employed a version of the MIMIC-III dataset that was mapped to the Fast Healthcare Interoperability Resource (FHIR) standard and used a comprehensive set of features, including a sequence of vital signs, laboratory results, past medications, procedures, diagnoses, and more.

The MIMIC-III database contains multi-modal recordings from ICU patients. Unlike most ML datasets, the inputs and targets are often not explicitly defined and must be inferred from the data. So, using a combination of automated rule-based methods and clinical review, we defined a suite of diverse endpoints, including critical care interventions, specific organ dysfunctions, and overall patient outcomes.

The task given to the model was to predict the onset of a selection of adverse events within 24–48 hours for every hour after a patient’s admission into the ICU. The defined adverse events included acute kidney injury (AKI), continuous renal replacement therapy (CRRT) dialysis, administration of vasopressors and inotropes, mechanical ventilation (MV), mortality, and remaining length of stay (LoS).

The SeqSNR Algorithm
While multi-task learning captures the interdependencies between organ systems and balances competing risks, it can be challenging to implement successfully. In practice, jointly-trained tasks often impair one another, an effect called “negative transfer”. The intuition behind SeqSNR was that modular ‘sub-networks’ would mitigate this issue by automatically optimizing how information is shared across multiple tasks.

SeqSNR is a time series adaptation of the SNR architecture and consists of a deep embedding layer followed by stacked recurrent neural network (RNN) layers. Modularization is achieved by splitting both the embedding layer and the RNN stack into multiple modules connected by routing variables that are learned during the training phase. The routing connections are always created between blocks in one layer and the next. This approach minimizes negative transfer by ensuring that data of low relevance to a particular task layer is filtered out. In essence, this means that each task utilizes a different path through the model.

A high-level overview of the SeqSNR architecture.
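
To make the routing idea concrete, below is a minimal TensorFlow sketch of one stacked layer of sub-network blocks connected by learned routing weights, plus a toy usage example. It is only an illustration: the block counts, sizes, and soft sigmoid gates are assumptions, and the published SeqSNR code uses learned binary routing variables rather than this simplified gating.

import tensorflow as tf

class SeqSNRBlockLayer(tf.keras.layers.Layer):
    """One layer of parallel sub-network blocks with learned routing from the
    blocks of the previous layer (illustrative sketch, not the released code)."""
    def __init__(self, num_blocks, units):
        super().__init__()
        self.blocks = [tf.keras.layers.LSTM(units, return_sequences=True)
                       for _ in range(num_blocks)]

    def build(self, input_shape):
        # One routing weight per (previous block, current block) pair.
        self.routing = self.add_weight(
            "routing", shape=(len(input_shape), len(self.blocks)),
            initializer="ones", trainable=True)

    def call(self, inputs):
        outputs = []
        for j, block in enumerate(self.blocks):
            # Gate each incoming block output by its (soft) routing weight,
            # so low-relevance paths can be suppressed during training.
            gates = tf.nn.sigmoid(self.routing[:, j])
            mixed = tf.add_n([gates[i] * x for i, x in enumerate(inputs)])
            outputs.append(block(mixed))
        return outputs

# Toy usage: three embedding modules over 24 hourly steps, two stacked layers,
# and a task-specific head reading from one block's final state.
x = [tf.random.normal([8, 24, 32]) for _ in range(3)]
h1 = SeqSNRBlockLayer(num_blocks=3, units=64)(x)
h2 = SeqSNRBlockLayer(num_blocks=3, units=64)(h1)
mortality_logit = tf.keras.layers.Dense(1)(h2[0][:, -1, :])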

Findings
SeqSNR shows a modest improvement in discriminative performance overall relative to single-task and naïve multitasking. However, its performance improvement is more significant in scenarios with few training labels.

Because the prevalence of different outcomes varied widely in the dataset (e.g., ~38% of patients received MV, while CRRT dialysis was present for only ~3%), many accuracy metrics are not suitable. Instead, we report the area under the precision-recall curve (AU PRC), which is more reliable given imbalanced data. Moreover, we performed Wilcoxon signed-rank tests to draw statistically significant conclusions for pairwise comparisons of ST learning, shared-bottom (SB) multi-task learning (i.e., naïve multi-task learning), and SeqSNR across bootstrapped samples from the held-out test set. The performance differences between the three architectures were modest, but SeqSNR outperformed both ST and SB in four out of six tasks (p-values are reported in the paper).

Comparison of single task (ST), shared bottom (SB) and SeqSNR performance on the MIMIC-III dataset.
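
As an illustration of this evaluation protocol, the sketch below computes AU PRC on paired bootstrap resamples of a held-out set for two models and compares them with a Wilcoxon signed-rank test. The label and score arrays are random placeholders, not results from the paper.

import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=5000)      # placeholder held-out labels
scores_st = rng.random(5000)                # placeholder single-task scores
scores_snr = rng.random(5000)               # placeholder SeqSNR scores

auprc_st, auprc_snr = [], []
for _ in range(200):
    # Use the same resample indices for both models so the test is paired.
    b = rng.choice(len(y_true), size=len(y_true), replace=True)
    if y_true[b].sum() == 0:                # skip resamples with no positives
        continue
    auprc_st.append(average_precision_score(y_true[b], scores_st[b]))
    auprc_snr.append(average_precision_score(y_true[b], scores_snr[b]))

stat, p = wilcoxon(auprc_snr, auprc_st)     # paired Wilcoxon signed-rank test
delta = np.median(np.array(auprc_snr) - np.array(auprc_st))
print(f"median AU PRC delta: {delta:.4f}, p={p:.3g}")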

Label Efficiency
We hypothesized that multi-task learning could assist in low-data scenarios by using easy-to-label auxiliary tasks to boost the performance of the main tasks. We formulated prediction tasks with only a portion of the training labels available for the primary prediction task, but kept the entire dataset for the “helper tasks”. The latter are chosen because they are reliably encoded in the EHR and are straightforward to timestamp. An example of such a helper task is length of stay, since the start and end of admissions are accurately timestamped in MIMIC-III. On the other hand, the start and end of mechanical ventilation events are not reliably timestamped. So, we defined a set of rules based on expert-defined heuristics to determine the ventilation times using multiple sources of mechanical ventilator–related settings along with physiological measurements in the EHR dataset that are indicative of MV.
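
For a flavor of what such rule-based labeling can look like, here is a toy pandas sketch. The column names, rules, and gap handling are hypothetical illustrations only; they are not the expert-defined heuristics used in the study.

import pandas as pd

# Hourly events for one admission (hypothetical columns and values).
events = pd.DataFrame({
    "hour": range(6),
    "vent_mode_documented": [0, 1, 1, 1, 0, 0],     # a ventilator setting was charted
    "fio2_set": [None, 0.5, 0.5, 0.4, None, None],  # FiO2 set on the ventilator
    "etco2_measured": [0, 1, 0, 1, 1, 0],           # physiology consistent with MV
})

# Flag an hour as "on mechanical ventilation" if any ventilator setting is
# charted or supportive physiological evidence is present.
on_mv = (events["vent_mode_documented"].astype(bool)
         | events["fio2_set"].notna()
         | events["etco2_measured"].astype(bool))

# Close single-hour charting gaps so one ventilation episode is not split in two.
on_mv = on_mv | (on_mv.shift(1, fill_value=False) & on_mv.shift(-1, fill_value=False))
events["mv_label"] = on_mv
print(events)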

The development of these rules for a new clinical endpoint was time-consuming and involved manual review of the dataset by experts. The difficulty of exhaustively labeling the dataset led us to test model performance with only 1–10% of the data labeled, which resulted in a decline in model performance. The “helper tasks” are useful in this scenario because they are 100% labeled and can be trained jointly with the partially labeled (1–10%) primary tasks to improve overall performance.

We chose AKI, mechanical ventilation, CRRT Dialysis, and vasoactive medications as primary endpoints using 1%, 5%, and 10% of the training labels, along with 100% of labels for the helper tasks — labs and vitals, mortality, and LoS. Performance of both ST and SeqSNR decreased as the percentage of labels for the primary endpoint was reduced, but SeqSNR outperformed ST across all tasks and all training data reduction percentages, with a statistically significant boost in performance for all cases.

Label efficiency results showing the discriminative performance when the training dataset for the primary endpoint is reduced to 1%, 5% and 10% while the helper tasks have access to all training labels.
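
A common way to implement this setup is to mask the primary-task loss wherever its label is missing while always computing the helper-task losses. Below is a minimal TensorFlow sketch of such a masked multi-task loss; the names, shapes, and weighting are illustrative assumptions rather than the paper's training code.

import tensorflow as tf

def masked_multitask_loss(logits, labels, label_mask, task_weights):
    """logits, labels, label_mask: dicts keyed by task name, shape [batch, 1].
    label_mask is 1.0 where a label exists and 0.0 where it is missing."""
    total = 0.0
    for task, weight in task_weights.items():
        per_example = tf.keras.losses.binary_crossentropy(
            labels[task], logits[task], from_logits=True)         # shape [batch]
        mask = tf.squeeze(label_mask[task], axis=-1)               # shape [batch]
        # Average only over examples that actually carry a label for this task.
        masked = tf.reduce_sum(per_example * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)
        total += weight * masked
    return total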

This is a useful finding, given the difficulties of annotating endpoint labels in EHR datasets, which frequently necessitates human review by clinicians. The ability to use numerous endpoints, some of which may be easier to label (like length of stay or mortality), could lessen the need for manual curation of harder-to-label endpoints (like mechanical ventilation).

Subgroup Performance
While the version of the MIMIC-III dataset used contained labels for gender and age, it did not contain information on race and the information on ethnicity was limited. We computed the performance of all selected models across age and gender subgroups. We observed that in the scenarios with few instances in the dataset, the MTL models (both SB models and SeqSNR) often outperform ST. Even though there are exceptions, on average all models seem to be relatively balanced across age and gender subgroups. We invite the reader to refer to the supplemental section of our paper for a detailed performance breakdown.

Next Steps
This work is a proof of concept for SeqSNR on a set of canonical EHR prediction tasks. The code for this architecture is publicly available here and will hopefully stimulate further research in EHR multi-tasking and other deep learning architectures inspired by clinical reasoning.

In the future, it will be important to evaluate the performance of SeqSNR on different combinations of tasks, different time horizons, and different datasets. Another area of potential growth is expanding the subgroup analysis to datasets with additional population information, such as race and ethnicity. We also emphasize that these are prototype models designed to showcase methodologies, and more rigorous evaluation would be needed to bring these tools into deployment.

Acknowledgements
This work involved collaborative efforts from a multidisciplinary team of researchers, software engineers, clinicians, and cross-functional contributors. We thank our co-authors: Eric Loreaux, Anne Mottram, Ivan Protsyuk, Natalie Harris, Sebastien Baur, Yuan Xue, Jessica Schrouff, Ali Connell, Alan Karthikesalingam, and Martin Seneviratne from Google, Nenad Tomasev from DeepMind, and Hugh Montgomery from University College London. We also thank Zhe Zhao from Google Research; Kathryn Rough, Cian Hughes, Megumi Morigami, and Doris Wong from Google Health for their input and review; and the MIMIC team for curating this open access dataset for the research community.

Categories
Misc

On-Demand Session: Accelerating Kubernetes with NVIDIA Operators

NVIDIA Operators streamline installing and managing GPUs and NICs on Kubernetes to make the software stack ready to run the most resource-demanding workloads, such as AI, ML, DL, and HPC, in the cloud, data center, and at the edge.

Kubernetes is an open-source container-orchestration system for automating computer application deployment, scaling, and management. It’s an extremely popular tool, and can be used for automated rollouts and rollbacks, horizontal scaling, storage orchestration, and more. For many organizations, Kubernetes is a key component to their infrastructure. 

A critical step to installing and scaling Kubernetes is ensuring that it is properly utilizing the other components of the infrastructure. NVIDIA Operators streamline installing and managing GPUs and NICs on Kubernetes to make the software stack ready to run the most resource-demanding workloads, such as AI, ML, DL, and HPC, in the cloud, data center, and at the edge. NVIDIA Operators consist of the GPU Operator and the Network Operator, and are open source and based on the Operator Framework. 

NVIDIA GPU Operator

The NVIDIA GPU Operator is packaged as a Helm chart and installs and manages the lifecycle of the software components needed so that GPU-accelerated applications can run on Kubernetes. The components are GPU feature discovery, the NVIDIA Driver, the Kubernetes Device Plugin, the NVIDIA Container Toolkit, and DCGM Monitoring.

The GPU Operator enables infrastructure teams to manage the lifecycle of GPUs when used with Kubernetes at the cluster level, therefore eliminating the need to manage each node individually. Previously, infrastructure teams had to manage two operating system images, one for GPU nodes and one for CPU nodes. When using the GPU Operator, infrastructure teams can use the CPU image with GPU worker nodes as well.

NVIDIA Network Operator

The Network Operator is responsible for automating the deployment and management of the host networking components in a Kubernetes cluster. It includes the Kubernetes Device Plugin, the NVIDIA Driver, the NVIDIA Peer Memory Driver, and the Multus and macvlan CNIs. These components were previously installed manually but are now automated through the Network Operator, streamlining the deployment process and enabling accelerated computing with an enhanced customer experience.

Used independently or together, NVIDIA Operators simplify GPU and SmartNIC configurations on Kubernetes and are compatible with partner cloud platforms. To learn more about these components and how the NVIDIA Operators solve the key challenges to running AI, ML, DL, and HPC workloads and simplify initial setup and Day 2 operations, check out the on-demand webinar “Accelerating Kubernetes with NVIDIA Operators“.

Categories
Misc

GFN Thursday Slays with ‘Orcs Must Die! 3’ Coming to GeForce NOW

This GFN Thursday brings in hordes of fun — and a whole lot of orcs. Orcs Must Die! 3, the newest title from the action-packed, orc-slaying series from Robot Entertainment, is joining the GeForce NOW library when it releases tomorrow, Friday, July 23. In addition, 10 more games are coming to the service this week.


Categories
Misc

Confidential multi-stakeholder AI with TensorFlow

Learn how to set up a confidential AI inference service. Using TensorFlow Serving, we’ll showcase a multi-stakeholder scenario including a cloud service provider, model owner, inference service provider, and users.

Blog Post

GitHub

submitted by /u/m1gh7ym0

Categories
Misc

Question about Tutorial: Build, Train, and Deploy a Book Recommender System Using Keras, TensorFlow.js

Hi!

I stumbled upon this Medium article – Build, Train, and Deploy a Book Recommender System Using Keras, TensorFlow.js, Node.js, and Firebase – in order to get started with machine learning in general. One could say, I’m quite new to the field, so please excuse my stupid questions 😀

What would be a good way to continuously add user ratings to the model, instead of training it from scratch with the complete dataset every time there is an update? What I would like to avoid is having to fetch and process the complete dataset every time there are only a few new user ratings. Is this even possible?

Or, in other words: What are options for doing this in real time? Re-training the model from scratch every time a user rates something seems a bit of an overkill for whatever server that processes this data.

I would be very thankful for some insights and/or somebody pointing me in the right direction with this. Thanks!

submitted by /u/skizzoat

Categories
Misc

What to use in place of Estimators and other Session-less issues

I am trying to learn Tensorflow, but most of the courses and tutorials focus on the v1.Session style code. Specifically I am looking at the estimators. When consulting the Tensorflow documentation there is a big red notice:

Warning: Estimators are not recommended for new code. Estimators run v1.Session-style code which is more difficult to write correctly, and can behave unexpectedly, especially when combined with TF 2 code. Estimators do fall under compatibility guarantees, but will receive no fixes other than security vulnerabilities. See the migration guide for details.

Looking at the migration guide, they only mention estimators in the capacity that they are still compatible.

My question

  • What are we to use in place of the estimators?
  • A secondary question, where can I get some good training material on purely v2.5 stuff that outlines a “clean” way to code networks without Session or anything that is going to be imminently deprecated?

submitted by /u/LittleGremlinguy

Categories
Misc

Why are the ear keypoints in the wrong place in BlazePose?

Example image

If you take a look at the above image, and even in this real-time demo, you’ll see that the ear keypoints are closer to the eyes than to the ears.

Is this the expected result? Shouldn’t the ear keypoint be on the ear?

submitted by /u/whazaam

Categories
Misc

Predicting Protein Structures with Deep Learning


Solving a mystery that stumped scientists for decades, last November a group of computational biologists from Alphabet’s DeepMind used AI to predict a protein’s structure from its amino acid sequence. 

Not even a year later, a new study offers a more powerful model, capable of computing protein structures in as little as 10 minutes, on one gaming computer.

The research, from scientists at the University of Washington (UW), holds promise for faster drug development, which could unlock solutions for treating diseases like cancer.

Present in every cell in the body, proteins play a role in many processes such as blood clotting, hormone regulation, immune system response, vision, and cell and tissue repair. Made from long chains of amino acids that interact to form a folded three-dimensional structure, the shape of a protein determines its function.

Unfolded or misfolded proteins are also thought to cause degenerative disorders including cystic fibrosis, Alzheimer’s disease, Parkinson’s disease, and Huntington’s disease. Understanding and predicting how a protein structure develops could help scientists design effective interventions for many of these diseases. 

The researchers at UW developed the RoseTTAFold model by creating a three-track neural network that simultaneously considers the sequence patterns, amino acid interaction, and possible three-dimensional structure of a protein. 

To train the model, the team used discontinuous crops of protein segments, with 260 unique amino acid elements. With the cuDNN-accelerated PyTorch deep learning framework and NVIDIA GeForce RTX 2080 GPUs, this information flows back and forth within the deep learning model. The network is then able to deduce a protein’s chemical parts along with its folded structure.

“The end-to-end version of RoseTTAFold requires about 10 minutes on an RTX 2080 GPU to generate backbone coordinates for proteins with less than 400 residues. The pyRosetta version requires 5 minutes for network calculations on a single NVIDIA RTX 2080 GPU, and an hour for all-atom structure generation with 15 CPU cores,” the researchers write in the study.  

Predicted protein structures and their ground truth score. Credit: UW/Baek et al

The tool not only quickly predicts proteins, but can do so with limited input. It also has the ability to compute beyond simple structures, predicting complexes consisting of several proteins bound together. More complex models are computed in about 30 minutes on a 24G NVIDIA TITAN RTX.

A public server is available for anyone interested in submitting protein sequences. The source code is also freely available to the scientific community.

“In just the last month, over 4,500 proteins have been submitted to our new web server, and we have made the RoseTTAFold code available through the GitHub website. We hope this new tool will continue to benefit the entire research community,” said lead author Minkyung Baek, a postdoctoral scholar at the University of Washington, Institute for Protein Design. 

 

Read more >>>
Read the full article in Science >>>

Categories
Misc

Accelerating Machine Learning Model Inference on Google Cloud Dataflow with NVIDIA GPUs


Today, in partnership with NVIDIA, Google Cloud announced Dataflow is bringing GPUs to the world of big data processing to unlock new possibilities. With Dataflow GPU, users can now leverage the power of NVIDIA GPUs in their machine learning inference workflows. Here we show you how to access these performance benefits with BERT. 

Google Cloud’s Dataflow is a managed service for executing a wide variety of data processing patterns, including both streaming and batch analytics. It has recently added GPU support and can now accelerate machine learning inference workflows running on Dataflow pipelines.

Please check out Google Cloud’s launch post for more exciting new features. In this post, we showcase the performance benefits and TCO improvement of NVIDIA GPU acceleration by deploying a Bidirectional Encoder Representations from Transformers (BERT) model fine-tuned on question answering tasks on Dataflow. We show TensorFlow inference in Dataflow on CPUs, then how to run the same code on GPUs with a significant performance boost, and finally the best performance after converting the model with NVIDIA TensorRT and deploying it through TensorRT’s Python API with Dataflow. Check out the NVIDIA sample code to try it now.

Figure 1. Dataflow Architecture and GPU runtime.

There are several steps we will be touching on in this post. We start by creating an environment on our local machine to run all of these Dataflow jobs. For additional details, please refer to the Dataflow Python quick start guide.

Creating an environment

It is recommended to create a virtual environment for Python; we use virtualenv here:

virtualenv -p <path-to-python-interpreter> <env-name>

When using Dataflow, it is required to align the Python version in your development environment with the Dataflow runtime Python version. More specifically, when running a Dataflow pipeline, you should use the same Python version and Apache Beam SDK version to avoid unexpected errors.

Now, we activate the virtual environment.

source <env-name>/bin/activate

One of the most important things to pay attention to before activating a virtual environment is to be sure that you are not operating in another virtual environment, as this usually causes issues.

After activating our virtual environment, we are ready to install the required packages. Even though our jobs are running on Dataflow, we still need a couple of packages locally so that Python does not complain when we run our code locally.

pip install "apache-beam[gcp]"
pip install tensorflow==2.3.1

You can experiment with different versions of TensorFlow, but the key is to align your local version with the version you will be using in the Dataflow environment. Apache Beam and its Google Cloud components are also required.

Getting the fine-tuned BERT model

NVIDIA NGC has plenty of resources ranging from GPU-optimized containers to fine-tuned models. We explore several NGC resources.

The first resource we will be using is a BERT large model that is fine-tuned for the SquadV2 question answering task and contains 340 million parameters. The following command will download the BERT model.

wget --content-disposition \
  https://api.ngc.nvidia.com/v2/models/nvidia/bert_tf_savedmodel_large_qa_squad2_amp_384/versions/19.03.0/zip \
  -O bert_tf_savedmodel_large_qa_squad2_amp_384_19.03.0.zip

For the BERT model we just downloaded, automatic mixed precision (AMP) was used during training and the sequence length is 384.

We also need a vocabulary file and we get it from a BERT checkpoint that can be obtained from NGC with the following command:

wget --content-disposition \
  https://api.ngc.nvidia.com/v2/models/nvidia/bert_tf_ckpt_large_qa_squad2_amp_128/versions/19.03.1/zip \
  -O bert_tf_ckpt_large_qa_squad2_amp_128_19.03.1.zip

After getting these resources, we just need to uncompress them and place them in our working folder. We will be using a custom Docker container, and these models will be included in our image.

Custom Dockerfile

We will be using a custom Dockerfile that is derived from a GPU-optimized NGC TensorFlow container. NGC TensorFlow (TF) containers are the best option when accelerating TF models using NVIDIA GPUs.

We then add a couple more steps to copy these models and our files. You can find the Dockerfile here; below is a snapshot of it.

FROM nvcr.io/nvidia/tensorflow:20.11-tf2-py3
RUN pip install --no-cache-dir "apache-beam[gcp]==2.26.0" ipython pytest pandas && \
    mkdir -p /workspace/tf_beam
COPY --from=apache/beam_python3.6_sdk:2.26.0 /opt/apache/beam /opt/apache/beam
ADD . /workspace/tf_beam
WORKDIR /workspace/tf_beam
ENTRYPOINT [ "/opt/apache/beam/boot" ]

The next steps are to build the Docker image and push it to the Google Container Registry (GCR). You can do this with the following commands. Alternatively, you can use the script we created here; if you are using the script from our repo, you can simply run bash build_and_push.sh

project_id="<your-project-id>"
docker build . -t "gcr.io/${project_id}/tf-dataflow-${USER}:latest"
docker push "gcr.io/${project_id}/tf-dataflow-${USER}:latest"

Running jobs

If you have already authenticated your Google account, you can simply run the Python files we provided here by calling the run_cpu.sh and run_gpu.sh scripts available in the same repo.

CPU TensorFlow Inference in Dataflow (TF-CPU)

The bert_squad2_qa_cpu.py file in the repo is designed to answer questions based on a description text document. The batch size is 16, meaning that we will be answering 16 questions at each inference call and there are 16,000 questions (1,000 batches of questions). Note that BERT could be fine-tuned for other tasks given a specific use case.
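
As a rough sketch of what such a step can look like (this is not the exact code from the sample repo; the model path, signature name, and input fields are assumptions), a Beam DoFn can load the fine-tuned SavedModel once per worker in setup() and then answer a batch of 16 questions on every call to process():

import apache_beam as beam
import tensorflow as tf

class BertQAFn(beam.DoFn):
    """Runs batched BERT question answering with a SavedModel (sketch only)."""
    def __init__(self, model_dir):
        self._model_dir = model_dir
        self._infer = None

    def setup(self):
        # setup() runs once per worker, so the large model is loaded only once.
        model = tf.saved_model.load(self._model_dir)
        self._infer = model.signatures["serving_default"]

    def process(self, batch):
        # `batch` is a dict of pre-tokenized inputs for 16 questions.
        inputs = {name: tf.constant(value) for name, value in batch.items()}
        outputs = self._infer(**inputs)
        yield {name: value.numpy() for name, value in outputs.items()}

def run_pipeline(model_dir, batches, options=None):
    """Wires the DoFn into a pipeline; `batches` is an iterable of input dicts."""
    with beam.Pipeline(options=options) as p:
        (p
         | "CreateBatches" >> beam.Create(batches)
         | "BertInference" >> beam.ParDo(BertQAFn(model_dir))
         | "LogAnswers" >> beam.Map(print))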

When running a job on Dataflow, by default it auto-scales based on real-time CPU usage. If you want to disable this feature you need to set autoscaling_algorithm to NONE. This will let you pick how many workers to use throughout the life of your job. Alternatively, you can let Dataflow auto-scale your job and limit the maximum number of workers to be used by setting the max_num_workers parameter.

We recommend setting a job name rather than using the auto-generated name to better follow your jobs by setting the job_name parameter. This job name will be the prefix for the compute instance that is running your job.
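
For example, the options discussed above can be passed when the pipeline is constructed; the project, region, and bucket below are placeholders:

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=<your-project-id>",             # placeholder
    "--region=us-central1",
    "--temp_location=gs://<your-bucket>/tmp",  # placeholder
    "--job_name=bert-squad2-qa-cpu",           # readable prefix for the worker VMs
    "--autoscaling_algorithm=NONE",            # disable autoscaling...
    "--num_workers=2",                         # ...and pin the worker count
    # Or keep autoscaling and cap it instead with: "--max_num_workers=4",
])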

Accelerating with GPU (TF-GPU)

To execute the same Dataflow TensorFlow inference job with GPU support, we need to set the following parameter. For additional information, please refer to the Dataflow GPU documentation.

--experiment "worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver"

The preceding parameter attaches an NVIDIA T4 Tensor Core GPU to the Dataflow worker VM, which is also visible as a Compute Engine VM instance running our job. Dataflow will automatically install the required NVIDIA drivers that support CUDA 11.

The bert_squad2_qa_gpu.py file is almost the same as the bert_squad2_qa_cpu.py file. This means that, with little to no change, we can have our jobs run on NVIDIA GPUs. In our examples, we include a couple of additional GPU settings, such as enabling memory growth with the code below.

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

Inference with NVIDIA optimized libraries

NVIDIA TensorRT optimizes deep learning models for inference, providing low latency and high throughput. Here, we apply NVIDIA TensorRT optimization to the BERT model and use it to answer questions on a Dataflow pipeline with a GPU at the speed of light. Users can follow the TensorRT demo BERT GitHub repository.

We also use Polygraphy, a high-level Python API for TensorRT, to load the TensorRT engine file and run inference. In the Dataflow code, the TensorRT model is encapsulated with a shared utility class, allowing all threads from a Dataflow worker process to make use of it.
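
A sketch of that pattern is shown below. It is an illustration built on Beam's Shared utility and Polygraphy's TensorRT runner, not the exact code from the sample repo, and the engine path and binding names are assumptions.

import apache_beam as beam
from apache_beam.utils.shared import Shared
from polygraphy.backend.common import BytesFromPath
from polygraphy.backend.trt import EngineFromBytes, TrtRunner

class _EngineHolder:
    # Shared objects must be weak-referenceable, so wrap the engine.
    def __init__(self, engine):
        self.engine = engine

class TrtBertQAFn(beam.DoFn):
    """Shares one deserialized TensorRT engine across all threads of a worker
    process; each thread keeps its own TrtRunner (execution context)."""
    def __init__(self, shared_handle, engine_path):
        self._shared = shared_handle      # an apache_beam.utils.shared.Shared()
        self._engine_path = engine_path
        self._runner = None

    def setup(self):
        def load_engine():
            # Deserialize the serialized TensorRT engine once per process.
            return _EngineHolder(EngineFromBytes(BytesFromPath(self._engine_path))())
        holder = self._shared.acquire(load_engine)
        self._runner = TrtRunner(holder.engine)
        self._runner.activate()

    def process(self, batch):
        # `batch` maps engine binding names (e.g. input_ids, segment_ids,
        # input_mask) to NumPy arrays for a batch of questions.
        yield self._runner.infer(feed_dict=batch)

    def teardown(self):
        if self._runner is not None:
            self._runner.deactivate()

# Usage inside pipeline construction (the engine path is a placeholder):
# shared_engine = Shared()
# ... | beam.ParDo(TrtBertQAFn(shared_engine, "/workspace/tf_beam/bert_large.engine"))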

Comparing CPU and GPU runs

In the table below, we provide the total run times and resources used for sample runs. The final cost of a Dataflow job is a linear combination of total vCPU time, total memory time, and total hard disk usage. For the GPU case, there is a GPU time component as well.

Framework | Machine            | Workers count | Total execution time | Total vCPU time (vCPU hr) | Total memory time (GB hr) | Total HDD PD time (GB hr) | TCO improvement
TF-CPU    | n1-standard-8      | 2             | 2:46:00              | 43.5                      | 163.13                    | 1359.4                    | 1x
TF-GPU    | n1-standard-4 + T4 | 1             | 0:35:51              | 2.25                      | 8.44                      | 140.64                    | 9.2x
TensorRT  | n1-standard-4 + T4 | 1             | 0:09:51              | 0.53                      | 1.99                      | 33.09                     | 38x
Table. Total run time and resource usage for sample TF-CPU, TF-GPU, and TensorRT runs.

Note that the preceding table is compiled from a single run; the exact numbers might fluctuate slightly, but in our experiments the ratios did not change much.

The total savings, including both cost and run time, are more than 10x when accelerating our model with NVIDIA GPUs (TF-GPU) compared to using CPUs (TF-CPU). This means that when we use NVIDIA GPUs for inference on this task, we get faster run times and lower costs than running the model using only CPUs.

With NVIDIA-optimized inference libraries such as TensorRT, users can run more complex and bigger models on GPUs in Dataflow. TensorRT further accelerates the same job, running 3.6x faster than TF-GPU, which yields a 4.2x cost saving. Comparing TensorRT with TF-CPU, we get about 17x shorter execution time, which results in around a 38x smaller bill.
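
The quoted speedups follow directly from the execution times in the table; a quick check:

# Execution times from the table, converted from h:mm:ss to seconds.
def to_seconds(hms):
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

tf_cpu, tf_gpu, trt = (to_seconds(t) for t in ("2:46:00", "0:35:51", "0:09:51"))
print(f"TF-GPU   vs TF-CPU: {tf_cpu / tf_gpu:.1f}x faster")   # ~4.6x
print(f"TensorRT vs TF-GPU: {tf_gpu / trt:.1f}x faster")      # ~3.6x
print(f"TensorRT vs TF-CPU: {tf_cpu / trt:.1f}x faster")      # ~16.9x, i.e. about 17x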

Summary

In this post, we compared TF-CPU, TF-GPU, and TensorRT inference performance for the question answering task running on Google Cloud Dataflow. Dataflow users can get great benefits by leveraging GPU workers and NVIDIA optimized libraries.

Accelerating deep learning model inference with NVIDIA GPUs and NVIDIA software is super easy. By adding or changing a couple of lines, we can run models using TF-GPU or TensorRT. We provided scripts and source files here and here for reference.

Acknowledgments

We would like to thank Shan Kulandaivel, Valentyn Tymofieiev, and Reza Rokni from the Google Cloud Dataflow team, and Jill Milton and Fraser Gardiner from NVIDIA for their support and invaluable feedback.

Categories
Misc

Learn More About Real-Time Ray Traced Caustics in Free Ray Tracing Gems II Chapter

We’ve been counting down to the release of Ray Tracing Gems II by providing early releases of select chapters once every week in July. This week’s chapter presents two real-time techniques for rendering caustics effects with ray tracing.

In just two weeks, on August 4, Ray Tracing Gems II will be available to download for free in its entirety, or to purchase as a physical release from Apress or Amazon. We’ve been counting down to this date by providing early releases of select chapters once every week in July. Today’s chapter presents two real-time techniques for rendering caustics effects with ray tracing. The first is built on an adaptive photon scattering approach that can depict accurate caustic patterns from metallic and transparent surfaces after multiple ray bounces. The second is specialized for water caustics cast after a single-bounce reflection or refraction and is appropriate for use on water surfaces that cover large areas of a scene. Both techniques are fully dynamic, low cost, and ready-to-use with no data preprocessing requirements.

You can download the full chapter for free here.

We’ve collaborated with our partners to make four limited edition versions of the book, including custom covers that highlight real-time ray tracing in Fortnite, Control, Watch Dogs: Legion, and Quake II RTX.

To win a limited edition print copy of Ray Tracing Gems II, enter the giveaway contest here: https://developer.nvidia.com/ray-tracing-gems-ii