Categories
Offsites

Understanding Contextual Facial Expressions Across the Globe

It might seem reasonable to assume that people’s facial expressions are universal — so, for example, whether a person is from Brazil, India or Canada, their smile upon seeing close friends or their expression of awe at a fireworks display would look essentially the same. But is that really true? Is the association between these facial expressions and their relevant context across geographies indeed universal? What can similarities — or differences — between the situations where someone grins or frowns tell us about how people may be connected across different cultures?

Scientists seeking to answer these questions and to uncover the extent to which people are connected across cultures and geography often use survey-based studies that can rely heavily on local language, norms, and values. However, such studies are not scalable, and often end up with small sample sizes and inconsistent findings.

In contrast to survey-based studies, studying patterns of facial movement provides a more direct understanding of expressive behavior. But analyzing how facial expressions are actually used in everyday life would require researchers to go through millions of hours of real-world footage, which is too time-consuming to do manually. In addition, facial expressions and the contexts in which they are exhibited are complicated, requiring large sample sizes in order to make statistically sound conclusions. While existing studies have produced diverging answers to the question of the universality of facial expressions in given contexts, applying machine learning (ML) in order to appropriately scale the research has the potential to provide clarity.

In “Sixteen facial expressions occur in similar contexts worldwide”, published in Nature, we present research undertaken in collaboration with UC Berkeley to conduct the first large-scale worldwide analysis of how facial expressions are actually used in everyday life, leveraging deep neural networks (DNNs) to drastically scale up expression analysis in a responsible and thoughtful way. Using a dataset of six million publicly available videos across 144 countries, we analyze the contexts in which people use a variety of facial expressions and demonstrate that rich nuances in facial behavior — including subtle expressions — are used in similar social situations around the world.

A Deep Neural Network Measuring Facial Expression
Facial expressions are not static. If one were to examine a person’s expression instant by instant, what might at first appear to be “anger” may instead end up being “awe”, “surprise” or “confusion”. The interpretation depends on the dynamics of a person’s face as their expression presents itself. The challenge in building a neural network to understand facial expressions, then, is that it must interpret the expression within its temporal context. Training such a system requires a large, diverse, cross-cultural dataset of videos with fully annotated expressions.

To build the dataset, skilled raters manually searched through a broad collection of publicly available videos to identify those likely to contain clips covering all of our pre-selected expression categories. To ensure that the videos matched the region they were assumed to represent, preference in video selection was given to those that included the geographic location of origin. The faces in the videos were then found using a deep convolutional neural network (CNN) — similar to the Google Cloud Face Detection API — that follows faces over the course of the clip using a method based on traditional optical flow. Using an interface similar to Google Crowdsource, annotators then labeled facial expressions across 28 distinct categories if present at any point during the clip. Because the goal was to sample how an average person would perceive an expression, the annotators were not coached or trained, nor were they provided examples or definitions of the target expressions. Below, we discuss additional experiments that evaluate whether the model trained from these annotations was biased.

Raters were presented videos with a single face highlighted for their attention. They observed the subject throughout the duration of the clip and annotated the facial expressions they exhibited. (source video)

The face detection algorithm established a sequence of locations of each face throughout the video. We then used a pre-trained Inception network to extract features representing the most salient aspects of facial expressions from the faces. The features were then fed into a long short-term memory (LSTM) network, a type of recurrent neural network that is able to model how a facial expression might evolve over time due to its ability to remember salient information from the past.
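As an illustrative sketch only — the clip length, layer sizes, and multi-label output below are assumptions based on the description above, not the production model — the per-frame feature extraction and temporal modeling could be wired together like this:

import tensorflow as tf

NUM_EXPRESSIONS = 28   # expression categories labeled by annotators
FRAMES_PER_CLIP = 64   # assumed number of face crops sampled per clip

# A pretrained Inception backbone extracts a feature vector from each face crop.
backbone = tf.keras.applications.InceptionV3(
    include_top=False, pooling="avg", input_shape=(299, 299, 3))

frames = tf.keras.Input(shape=(FRAMES_PER_CLIP, 299, 299, 3))
# Apply the CNN to every frame, producing a sequence of per-frame features.
features = tf.keras.layers.TimeDistributed(backbone)(frames)
# An LSTM models how the expression evolves over the course of the clip.
temporal = tf.keras.layers.LSTM(256)(features)
# Multi-label output: each expression category may or may not be present.
outputs = tf.keras.layers.Dense(NUM_EXPRESSIONS, activation="sigmoid")(temporal)

model = tf.keras.Model(frames, outputs)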

In order to ensure that the model was making consistent predictions across a range of demographic groups, we evaluated the model’s fairness on an existing dataset that was constructed using similar facial expression labels, targeting a subset of 16 expressions on which it exhibited the best performance.

The model’s performance was consistent across all of the demographic groups represented in the evaluation dataset, which provides supporting evidence that the model trained to annotate facial expressions is not measurably biased. The model’s annotations of those 16 facial expressions across 1,500 images can be explored here.

We modeled the selected face in each video by using a CNN to extract features from the face at each frame, which were then fed into an LSTM network to model the changes in the expression over time. (source video)

Measuring the Contexts Captured in Videos
To understand the context of facial expressions across millions of videos, we used DNNs that could capture the fine-grained content and automatically recognize the context. The first DNN modeled a combination of text features (title and description) associated with a video along with the actual visual content (video-topic model). In addition, we used a DNN that only relied on text features without any visual information (text-topic model). These models predict thousands of labels describing the videos. In our experiments these models were able to identify hundreds of unique contexts (e.g., wedding, sporting event, or fireworks) showcasing the diversity of the data we used for the analysis.

The Covariation Between Expressions and Contexts Around the World
In our first experiment, we analyzed 3 million public videos captured on mobile phones. We chose to focus on mobile uploads because they are more likely to contain natural expressions. We correlated the facial expressions that occurred in the videos to the context annotations derived from the video-topic model. We found 16 kinds of facial expressions had distinct associations with everyday social contexts that were consistent across the world. For instance, the expressions that people associate with amusement occurred more often in videos with practical jokes; expressions that people associate with awe, in videos with fireworks; and triumph, with sporting events. These results have strong implications for discussions about the relative importance of psychologically relevant context in facial expression, compared to other factors, such as those unique to an individual, culture, or society.

Our second experiment analyzed a separate set of 3 million videos, but this time we annotated the contexts with the text-topic model. The results verified that the findings in the first experiment were not driven by subtle influences of facial expressions in the video on the annotations of the video-topic model. In other words we used this experiment to verify our conclusions from the first experiment given the possibility that the video-topic model could implicitly be factoring in facial expressions when computing its content labels.

We correlated the expression and context annotations across all of the videos within each region. Each expression was found to have specific associations with different contexts that were preserved across 12 world regions. For example, expressions that people associate with awe were found more often in the context of fireworks, pets, and toys than in other contexts.

In both experiments, the correlations between expressions and contexts appeared to be well-preserved across cultures. To quantify exactly how similar the associations between expressions and contexts were across the 12 different world regions we studied, we computed second-order correlations between each pair of regions. These correlations identify the relationships between different expressions and contexts in each region and then compare them with other regions. We found that 70% of the context–expression associations found in each region are shared across the modern world.
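To make the second-order correlation computation concrete, here is a small sketch with placeholder data — the real analysis used the measured expression–context correlations for each of the 12 regions, not random matrices:

import numpy as np
from itertools import combinations

# Placeholder per-region matrices: rows are the 16 expressions, columns are
# contexts, and entries are first-order expression-context correlations.
num_regions, num_expressions, num_contexts = 12, 16, 300
rng = np.random.default_rng(0)
region_corr = rng.normal(size=(num_regions, num_expressions, num_contexts))

# Second-order correlation: flatten each region's expression-context matrix
# and correlate it with every other region's matrix.
second_order = []
for a, b in combinations(range(num_regions), 2):
    r = np.corrcoef(region_corr[a].ravel(), region_corr[b].ravel())[0, 1]
    second_order.append(r)

print("mean cross-region similarity:", np.mean(second_order))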

Finally, we asked how many of the 16 kinds of facial expression we measured had distinct associations with different contexts that were preserved around the world. To do so, we applied a method called canonical correlation analysis, which showed that all 16 facial expressions had distinct associations that were preserved across the world.

Conclusions
We were able to examine the contexts in which facial expressions occur in everyday life across cultures at an unprecedented scale. Machine learning allowed us to analyze millions of videos across the world and discover evidence supporting hypotheses that facial expressions are preserved to a degree in similar contexts across cultures.

Our results also leave room for cultural differences. Although the correlations between facial expressions and contexts were 70% consistent around the world, they were up to 30% variable across regions. Neighboring world regions generally had more similar associations between facial expressions and contexts than distant world regions, indicating that the geographic spread of human culture may also play a role in the meanings of facial expressions.

This work shows that we can use machine learning to better understand ourselves and identify common communication elements across cultures. Tools such as DNNs give us the opportunity to analyze vast amounts of diverse data in service of scientific discovery, enabling more confidence in the statistical conclusions. We hope our work provides a template for using the tools of machine learning in a responsible way and sparks more innovative research in other scientific domains.

Acknowledgements
Special thanks to our co-authors Dacher Keltner from UC Berkeley, along with Florian Schroff, Brendan Jou, and Hartwig Adam from Google Research. We are also grateful for additional support at Google provided by Laura Rapin, Reena Jana, Will Carter, Unni Nair, Christine Robson, Jen Gennai, Sourish Chaudhuri, Greg Corrado, Brian Eoff, Andrew Smart, Raine Serrano, Blaise Aguera y Arcas, Jay Yagnik, and Carson Mcneil.

Categories
Misc

On-Demand Technical Sessions: Develop and Deploy AI Solutions in the Cloud Using NVIDIA NGC

At GTC ’21, experts presented a variety of technical talks to help people new to AI, or just those looking for tools to speed-up their AI development using the various components of the NGC catalog, including:

  • AI containers optimized to speed up AI/ML training and inference
  • Pretrained models that provide an advanced starting point to build custom models
  • Industry-specific AI SDKs that transform applications into AI-powered ones
  • Helm charts to provide consistent and faster deployments
  • Collections that bring together all the software needed for various use cases

Watch these on-demand sessions to learn how to build solutions in the cloud with NVIDIA AI software from NGC.

Building a Text-to-Speech Service that Sounds Like You

This session shows how to build a TTS model for expressive speech using pretrained models. The model is fine-tuned with speech samples and customized for variability in speech by performing style transfer from other speakers. The provided tools let developers create a model for their own voice and style, making the TTS service sound like them!

Analyzing Traffic Video Streams at Scale

This session demonstrates how to use the Transfer Learning Toolkit and pretrained models to build computer vision models and run inference on over 1,000 live video feeds on a single AWS instance powered by NVIDIA A100 GPUs.

Deploy Compute and Software Resources to Run AI/ML Applications in Azure Machine Learning with Just Two Commands

This session shows how to build a taxi fare prediction application using RAPIDS, automatically set up a Dask cluster with multiple Azure virtual machines to support large datasets, mount data into the Dask scheduler and workers, deploy GPU-optimized AI software from the NGC catalog to train models, and then make taxi fare predictions.

Build and Deploy AI Applications Faster on Azure Machine Learning

This session demonstrates the basics of Azure Machine Learning (AzureML) Platform, the benefits of using the NGC catalog, and how to leverage the NGC-AzureML Quick Launch Toolkit to build an end-to-end AI application in AzureML.

If you’re building an AI solution from scratch or just want to replicate the use cases shown in the above sessions, start with the NGC catalog.

Categories
Misc

Generating High-Quality Labels for Speech Recognition with Label Studio and NVIDIA NeMo

You can save time and produce a more accurate result when processing audio data with automated speech recognition (ASR) models from NVIDIA NeMo and Label Studio.

NVIDIA NeMo provides reusable neural modules that make it easy to create new neural network architectures, including prebuilt modules and ready-to-use models for ASR. With the power of NVIDIA NeMo, you can get audio transcriptions from the pretrained speech recognition models. Add Label Studio and its open-source data labeling capabilities to the mix and you can improve the transcription quality even further. 
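For example, a minimal sketch of getting transcriptions from a pretrained model — assuming the NeMo 1.x Python API and local WAV files at placeholder paths — might look like this:

import nemo.collections.asr as nemo_asr

# Download a pretrained English ASR model from NGC (QuartzNet15x5 in this sketch).
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")

# Placeholder paths; point these at your own audio files.
audio_files = ["/data/audio/call_001.wav", "/data/audio/call_002.wav"]

# Returns one transcript string per input file.
transcripts = asr_model.transcribe(paths2audio_files=audio_files)
for path, text in zip(audio_files, transcripts):
    print(path, "->", text)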

Solution

Figure 1. ASR workflow with Label Studio and NeMo to annotate and correct transcripts.

Follow the steps in this post to set up NVIDIA NeMo ASR with Label Studio to produce high-quality audio transcripts.

  1. Connect the NVIDIA NeMo model to transcribe audio files in Label Studio automatically.
  2. Set up the audio transcription project.
  3. Validate and export revised audio transcripts from Label Studio.
  4. Fine-tune a NeMo ASR model with the revised audio transcripts from Label Studio.

Prerequisites

Before you start, make sure that you have the following resources:

  • Audio data files. This audio might be recordings of customer service calls, phone orders, sales conversations, or other recorded audio with people talking. The audio files must be in one of the following file formats:
    • WAV
    • AIFF
    • MP3
    • AU
    • FLAC
  • Label Studio installed. Install Label Studio using your preferred method on your local machine or a cloud server. For more information, see Quickstart in the Label Studio documentation.
  • NeMo toolkit installed

Free audio data

If you don’t have any audio data in mind, you can use an example dataset or a historical audio dataset.

There are a number of other ASR datasets that you can use. For more information, see Datasets — Introduction. You can also use public domain audio recording collections on the Library of Congress website, such as the Sports Byline collection of interviews with American baseball players.

After you identify the audio to transcribe, you can start processing it.

Install Label Studio ML backend

After you install Label Studio, install the Label Studio machine learning backend. From the command line, run the following command:

git clone https://github.com/heartexlabs/label-studio-ml-backend 

Set up the environment:

cd label-studio-ml-backend

# Install label-studio-ml and its dependencies
pip install -U -e .

# Install the nemo example dependencies
pip install -r label_studio_ml/examples/requirements.txt

Connect the NVIDIA NeMo model to transcribe audio files in Label Studio automatically

To prelabel the data with predictions from a pretrained ASR model, set up the NeMo toolkit as a machine learning backend in Label Studio. The Label Studio machine learning backend lets you use a pretrained model to prelabel your data.

Label Studio includes an example using the pretrained QuartzNet15x5 model developed with NeMo from the NGC cloud, but you can set up a different model with your data if another one is a better fit. For more information, see the list of ASR models available from NeMo.

From the command line, set up NeMo as a machine learning backend and start a new Label Studio project with the model.

  1. Install the NeMo toolkit in a Docker container or using pip.
  2. Download a NeMo ASR model. The provided Label Studio example script downloads the pretrained QuartzNet model from the NGC cloud. To use a different model, download that model from NGC.
  3. From the command line, start the Label Studio machine learning backend.
    label-studio-ml init my_model --from label_studio_ml/examples/nemo/asr.py
  4. Start the machine learning backend. By default, the model starts on localhost with port 9090.
    label-studio-ml start my_model
  5. Start Label Studio with the model.
    label-studio start my_project --ml-backends http://localhost:9090

Set up the audio transcription project

After you start Label Studio, import your audio data and set up the right template to configure labeling. The audio transcription template is the best one for automated speech recognition and makes it easy to annotate the audio data.

Open Label Studio, import your data, and select the template.

  1. Choose Import and import your audio data as plain text or JSON files referencing valid URLs for the audio files hosted in online storage such as Amazon S3. For more information, see Get data into Label Studio.
Figure 2. Process of importing data into Label Studio.
  2. From the Tasks list, choose Settings.
  3. On the Labeling Interface tab, browse the templates and select the Automated Speech Recognition template.
  4. Choose Save.

Validate and output the model predictions

As an annotator, review the tasks for the audio data on the labeling interface and validate the predictions. If necessary, correct the transcript predicted by the NeMo speech model.

  1. From the list of tasks in Label Studio, choose Label.
  2. For each audio sample, listen to the audio and review the transcription produced by the NeMo model as part of the prelabeling process.
  3. If any words in the transcript are incorrect, update them.
  4. Save the changes to the transcript. Choose Submit to submit the transcript and review the next audio sample.

Next, export the completed audio transcripts from Label Studio in the proper format expected by the NeMo model, as described in NeMo ASR collection in the NVIDIA NeMo documentation. 
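For reference, the NeMo ASR manifest is a JSON-lines file in which each line describes one audio sample. A minimal sketch of producing such a manifest in Python, with hypothetical file paths and transcripts, looks like this:

import json

# Hypothetical examples; the real values come from the transcripts exported from Label Studio.
samples = [
    {"audio_filepath": "/data/audio/call_001.wav", "duration": 3.2, "text": "thank you for calling"},
    {"audio_filepath": "/data/audio/call_002.wav", "duration": 5.8, "text": "how can i help you today"},
]

# NeMo expects one JSON object per line.
with open("train_manifest.json", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")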

To export the completed audio, do the following:

  1. From the list of tasks in Label Studio, choose Export.
  2. Select the audio transcript JSON format called ASR_MANIFEST.

For more information about the available export formats in Label Studio, see Export results from Label Studio.

Use high-quality transcripts to fine-tune your ML model

When you’re done processing the audio and adjusting the transcribed text, you’re left with audio transcripts that you can use to retrain the ASR models included in NeMo. Label Studio produces annotations that are fully compatible with NeMo training.

You can update the QuartzNet model checkpoint in a few lines of code, train the model from scratch, or use PyTorch Lightning. Examples are also available in the NeMo Jupyter notebook. For more information, see Transfer Learning in the ASR with NeMo Jupyter notebook.
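As a rough sketch of the first option — assuming the NeMo 1.x API, 16 kHz audio, and the train_manifest.json produced above; the batch size and epoch count are placeholders — fine-tuning the pretrained checkpoint with PyTorch Lightning might look like this:

import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Load the pretrained checkpoint to fine-tune.
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")

# Point the model at the corrected transcripts exported from Label Studio.
asr_model.setup_training_data(train_data_config={
    "manifest_filepath": "train_manifest.json",
    "sample_rate": 16000,
    "labels": asr_model.decoder.vocabulary,
    "batch_size": 16,
})

# A short fine-tuning run; tune epochs and learning rate for your dataset.
trainer = pl.Trainer(gpus=1, max_epochs=10)
trainer.fit(asr_model)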

By using Label Studio and NeMo together, you can save time processing each audio file from scratch. NeMo gives you a highly accurate prediction right away, and Label Studio helps make that prediction perfect. Try it today!

Categories
Misc

First-Hand Experience: Deep Learning Lets Amputee Control Prosthetic Hand, Video Games

Path-breaking work that translates an amputee’s thoughts into finger motions, and even commands in video games, holds open the possibility of humans controlling just about anything digital with their minds. Using GPUs, a group of researchers trained an AI neural decoder able to run on a compact, power-efficient NVIDIA Jetson Nano system on module (SOM).


Categories
Misc

An End-to-End Blueprint for Customer Churn Modeling and Prediction-Part 3

This is the third installment in a series describing an end-to-end blueprint for predicting customer churn. In previous installments, we’ve discussed some of the challenges of machine learning systems that don’t appear until you get to production: in the first installment, we introduced our use case and described an accelerated data federation pipeline; in the second installment, we showed how advanced analytics fits with the rest of the machine learning lifecycle.

In this third installment, we finish presenting the analytics and federation components of our application and explain some best practices for getting the most out of Apache Spark and the RAPIDS Accelerator for Apache Spark.

Architecture review

Figure 1: A high-level overview of our blueprint architecture.

Recall that our blueprint application (Figure 1) includes a federation workload and a pair of analytics workloads.

  • The federation workload produces a single denormalized wide table of data about each customer drawn from aggregating data spread across five normalized tables of observations related to different aspects of customers’ accounts. 
  • The first analytic workload produces a machine-readable summary report of value distributions and domains for each feature.
  • The second analytic workload produces a series of illustrative business reports about customer outcomes.

Our first installment contains additional details about the federation workload, and our second installment contains additional details about the analytics workloads.

We’ve implemented these three workloads as a single Spark application with multiple phases:

  • The app federates raw data from multiple tables in HDFS (which are stored as Parquet files) into a single wide table.
  • Because the wide table is substantially smaller than the raw data, the app then reformats the wide output by coalescing to fewer partitions and casting numeric values to types that will be suitable for ML model training. The output of this phase is the source data for ML model training.
  • The app then runs the analytics workloads against the coalesced and transformed wide table, first producing the machine-readable summary report and then producing a collection of rollup and data cube reports (a minimal sketch of all three phases follows this list).
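The following PySpark sketch illustrates the three phases; the paths, join keys, and column names are hypothetical, and the real application federates five tables rather than two:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("churn-blueprint-sketch").getOrCreate()

# Phase 1: federate normalized observation tables into one wide customer table.
accounts = spark.read.parquet("hdfs:///churn/accounts")   # hypothetical paths
billing = spark.read.parquet("hdfs:///churn/billing")
wide = accounts.join(billing, on="customer_id", how="left")

# Phase 2: the wide table is much smaller than the raw data, so coalesce to
# fewer partitions and cast currency columns to types suitable for ML training.
wide = wide.coalesce(16).withColumn(
    "monthly_charges", F.col("monthly_charges").cast("double"))
wide.write.mode("overwrite").parquet("hdfs:///churn/wide")

# Phase 3: run the analytics workloads against the coalesced wide table,
# for example a rollup report of customer outcomes.
report = wide.rollup("contract_type", "churned").count()
report.write.mode("overwrite").parquet("hdfs:///churn/reports/outcomes")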

Performance considerations

Parallel execution

For over 50 years, one of the most important considerations for high performance in computer systems has been increasing the applicability of parallel execution. (We choose, somewhat arbitrarily, to identify the development of Tomasulo’s algorithm in 1967, which set the stage for ubiquitous superscalar processing, as the point at which concerns about parallelism became practical and not merely theoretical.) In the daily work of analysts, data scientists, data and ML engineers, and application developers, concerns about parallelism often manifest in one of a few ways; we’ll look at those now.

When scaling out, perform work on a cluster

If you’re using a scale-out framework, perform work on a cluster instead of on a single node whenever possible. In the case of Spark, this means executing code in Spark jobs on executors rather than in serial code on the driver.  In general, using Spark’s API rather than host-language code in the driver will get you most of the way there, but you’ll want to ensure that the Spark APIs you’re using are actually executing in parallel on executors.

Operate on collections, not elements; on columns, not rows

A general best practice to exploit parallelism and improve performance is to use specialized libraries that perform operations on a collection at a time rather than an element at a time. In the case of Spark, this means using data frames and columnar operations rather than iterating over records in partitions of RDDs; in the case of the Python data ecosystem and RAPIDS.ai, it means using vectorized operations that operate on entire arrays and matrices in a single library call rather than using explicit looping in Python. Crucially, both of these approaches are also amenable to GPU acceleration.
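As a small illustration with a toy DataFrame, the columnar form below replaces an explicit per-record loop over RDD records with a single expression that the engine can parallelize (and that the RAPIDS Accelerator can plan for the GPU):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
wide = spark.createDataFrame(
    [("c1", 20.0), ("c2", 35.5)], ["customer_id", "monthly_charges"])

# Element-at-a-time (avoid): per-row Python logic over RDD records.
totals_rdd = wide.rdd.map(lambda row: (row["customer_id"], row["monthly_charges"] * 12))

# Collection-at-a-time (prefer): one columnar expression over the whole column.
totals_df = wide.select(
    "customer_id", (F.col("monthly_charges") * 12).alias("annual_charges"))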

Amortize the cost of I/O and data loading

I/O and data loading are expensive, so it makes sense to amortize their cost across as many parallel operations as possible.  We can improve performance both by directly reducing the cost of data transfers and by doing as much as possible with data once it is loaded. In Spark, this means using columnar formats, filtering relations only once upon import from stable storage, and performing as much work as possible between I/O or shuffle operations.

Better performance through abstraction

In general, raising the level of abstraction that analysts and developers employ in apps, queries, and reports allows runtimes and frameworks to find opportunities for parallel execution that developers didn’t (or couldn’t) anticipate.

Use Spark’s data frames

As an example, there are many benefits to using data frames in Spark and primarily developing against the high-level data frame API, including faster execution, semantics-preserving optimization of queries, reduced demand on storage and I/O, and dramatically improved memory footprint relative to using RDD based code. But beyond even these benefits lies a deeper advantage: because the data frame interface is high-level and because Spark allows plug-ins to alter the behavior of the query optimizer, it is possible for the RAPIDS Accelerator for Apache Spark to replace certain data frame operations with equivalent — but substantially faster — operations running on the GPU.

Transparently accelerate Spark queries

Replacing some of the functionality of Spark’s query planner with a plug-in is a particularly compelling example of the power of abstraction: an application written years before it was possible to run Spark queries on GPUs could nevertheless take advantage of GPU acceleration by running it with Spark 3.1 and the RAPIDS Accelerator.

Maintain clear abstractions

While the potential to accelerate unmodified applications with new runtimes is a major advantage of developing against high-level abstractions, in practice, maintaining clear abstractions is rarely a higher priority for development teams than shipping working projects on time. For multiple reasons, details underlying abstractions often leak into production code; while this can introduce technical debt and have myriad engineering consequences, it can also limit the applicability of advanced runtimes to optimize programs that use abstractions cleanly.

Consider operations suitable for GPU acceleration

In order to get the most out of Spark in general, it makes sense to pay down technical debt in applications that work around Spark’s data frame abstraction (e.g., by implementing parts of queries as RDD operations).  In order to make the most of advanced infrastructure, though, it often makes sense to consider details about the execution environment without breaking abstractions.  To get the best possible performance from NVIDIA GPUs and the RAPIDS Accelerator for Apache Spark, start by ensuring that your code doesn’t work around abstractions, but then consider the types and operations that are more or less amenable to GPU execution so you can ensure that as much of your applications run on the GPU as possible. We’ll see some examples of these next.

Types and operations

Not every operation can be accelerated by the GPU. When in doubt, it always makes sense to run your job with spark.rapids.sql.explain set to NOT_ON_GPU and examine the explanations logged to standard output. In this section, we’ll call out a few common pitfalls, including decimal arithmetic and operations that require configuration for support.

Beware of decimal arithmetic

Decimal computer arithmetic supports precise operations up to a given precision limit, can avoid and detect overflow, and rounds numbers as humans would while performing pencil-and-paper calculations. While decimal arithmetic is an important part of many data processing systems (especially for financial data), it presents a particular challenge for analytics systems. In order to avoid overflow, the results of decimal operations must widen to include every possible result; in cases in which the result would be wider than a system-specific limit, the system must detect overflow. In the case of Spark on CPUs, this involves delegating operations to the BigDecimal class in the Java standard library and precision is limited to 38 decimal digits, or 128 bits. The RAPIDS Accelerator for Apache Spark can currently accelerate calculations on decimal values of up to 18 digits, or 64 bits.

We’ve evaluated two configurations of the churn blueprint: one using floating-point values for currency amounts (as we described in the first installment) and one using decimal values for currency amounts (the configuration against which the performance numbers we’re currently reporting were measured). Because of its semantics and robustness, decimal arithmetic is more costly than floating-point arithmetic, but it can be accelerated by the RAPIDS Accelerator plugin as long as all of the decimal types involved fit within 64 bits.

Configure the RAPIDS Accelerator to enable more operations

The RAPIDS Accelerator is conservative about executing operations on the GPU that might exhibit poor performance or return slightly different results than their CPU-based counterparts. As a consequence, some operations that could be accelerated may not be accelerated by default, and many real-world applications will need to enable these to see the best possible performance. We saw an example of this phenomenon in our first installment, in which we had to explicitly enable floating-point aggregate operations in our Spark configuration by setting spark.rapids.sql.variableFloatAgg.enabled to true. Similarly, when we configured the workload to use decimal arithmetic, we needed to enable decimal acceleration by setting spark.rapids.sql.decimalType.enabled to true.
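As a minimal sketch — cluster setup, plugin jars, and resource configuration are omitted, and you should only enable settings whose trade-offs you have reviewed — these options can be passed as ordinary Spark configuration when building the session:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("churn-blueprint")
         # Load the RAPIDS Accelerator plugin.
         .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
         # Enable the operations discussed above; both are off by default.
         .config("spark.rapids.sql.variableFloatAgg.enabled", "true")
         .config("spark.rapids.sql.decimalType.enabled", "true")
         .getOrCreate())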

The plugin documentation lists operations that can be supported or not by configuration and the reasons why certain operations are enabled or disabled by default. In addition to floating-point aggregation and decimal support, there are several classes of operations that production Spark workloads are extremely likely to benefit from enabling:

  • Cast operations, especially from string to date or numeric types or from floating-point types to decimal types.
  • String uppercase and lowercase (e.g., "SELECT UPPER(name) FROM EMPLOYEES") are not supported for some Unicode characters in which changing the case also changes the character width in bytes, but many applications do not use such characters. You can enable these operations individually or enable them and several others by setting spark.rapids.sql.incompatibleOps.enabled to true.
  • Reading specific types from CSV files; while reading CSV files is currently enabled by default in the plugin (spark.rapids.sql.format.csv.enabled), reading invalid values of some types (numeric types, dates, and decimals in particular) will have different behavior on the GPU and the CPU and thus reading each of these will need to be enabled individually.

Accelerate data ingest from CSV files

CSV reading warrants additional attention: it is expensive and accelerating it can improve the performance of many jobs. However, because the behavior of CSV reading under the RAPIDS Accelerator may diverge from Spark’s behavior while executing on CPUs and because of the huge dynamic range of real-world CSV file quality, it is particularly important to validate the results of reading CSV files on the GPU. One quick but valuable sanity check is to ensure that reading a CSV file on the GPU returns the same number of NULL values as reading the same file on the CPU. Of course, there are many benefits to using a self-documenting structured input format like Parquet or ORC instead of CSV if possible.
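One way to implement that sanity check is sketched below; the file path is hypothetical, and toggling spark.rapids.sql.enabled at runtime is an assumption about how the plugin is configured in your environment:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def null_counts(path):
    df = spark.read.csv(path, header=True, inferSchema=True)
    # Count NULLs per column after the read.
    row = df.select([F.count(F.when(F.col(c).isNull(), 1)).alias(c)
                     for c in df.columns]).first()
    return row.asDict()

# Read the same file with acceleration on and off and compare; a mismatch
# in NULL counts flags divergent CSV parsing on the GPU.
spark.conf.set("spark.rapids.sql.enabled", "true")
on_gpu = null_counts("hdfs:///churn/raw/billing.csv")   # hypothetical path

spark.conf.set("spark.rapids.sql.enabled", "false")
on_cpu = null_counts("hdfs:///churn/raw/billing.csv")

assert on_gpu == on_cpu, f"CSV parsing differs: {on_gpu} vs {on_cpu}"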

Avoid unintended consequences of query optimization

The RAPIDS Accelerator transforms a physical query plan to delegate certain operators to the GPU.  By the time Spark has generated a physical plan, though, it has already performed several transformations on the logical plan, which may involve reordering operations.  As a consequence, an operation near the end of a query or data frame operation as it was stated by the developer or analyst may get moved from a leaf of the query plan towards the root.

Figure 2: A depiction of executing a data frame query that joins two data frames and then filters the results. If the predicate is sufficiently selective, most of the output tuples will be discarded.
Figure 3: A depiction of executing a data frame query that filters two input relations before joining the results. If the predicate can be evaluated on each input relation independently, this query execution produces the same results as the query execution in Figure 2 much more efficiently.

In general, this sort of transformation can improve performance.  As an example, consider a query that joins two data frames and then filters the results:  when possible, it will often be more efficient to execute the filter before executing the join.  Doing so will reduce the cardinality of the join, eliminate comparisons that will ultimately be unnecessary, decrease memory pressure, and potentially even reduce the number of data frame partitions that need to be considered in the join.  However, this sort of optimization can have counterintuitive consequences:  aggressive query reordering may negatively impact performance on the GPU if the operation that is moved towards the root of the query plan is only supported on CPU or if it generates a value of a type that is not supported on the GPU.  When this happens, a greater percentage of the query plan may execute on the CPU than is strictly necessary.   You can often work around this problem and improve performance by dividing a query into two parts that execute separately, thus forcing CPU-only operation near the leaves of a query plan to execute only after the accelerable parts of the original query run on the GPU.
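One way to apply that workaround is sketched below with hypothetical inputs; the Python UDF stands in for any operation that only runs on the CPU:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
accounts = spark.read.parquet("hdfs:///churn/accounts")   # hypothetical inputs
billing = spark.read.parquet("hdfs:///churn/billing")

# Part 1: the join and filter are accelerable, so run them as their own query
# and materialize the (much smaller) result.
joined = accounts.join(billing, "customer_id").where(F.col("total_charges") > 0)
joined.write.mode("overwrite").parquet("hdfs:///churn/tmp/joined")

# Part 2: apply the CPU-only operation to the materialized result, so the
# optimizer cannot reorder it below the accelerated join and filter.
normalize = F.udf(lambda s: s.strip().lower() if s else s, StringType())
result = (spark.read.parquet("hdfs:///churn/tmp/joined")
               .withColumn("plan_name", normalize(F.col("plan_name"))))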

Conclusion

In this third installment, we’ve detailed some practical considerations for getting the most out of Apache Spark and the RAPIDS Accelerator for Apache Spark.  Most teams will realize the greatest benefits by focusing on using Spark’s data frame abstractions cleanly.  However, some applications may benefit from minor tweaks, in particular semantics-preserving code changes that consider the RAPIDS Accelerator’s execution model and avoid unsupported operations.  Future installments will address the rest of the data science discovery workflow and the machine learning lifecycle.

Categories
Misc

Clara Train 4.0 Upgrades to MONAI and Supports FL with Homomorphic Encryption

NVIDIA recently released Clara Train 4.0, an application framework for medical imaging that includes pre-trained models, AI-Assisted Annotation, AutoML, and Federated Learning. In this 4.0 release, there are three new features to help get you started training quicker.  

Clara Train has upgraded its underlying infrastructure from TensorFlow to MONAI. MONAI is an open-source, PyTorch-based framework that provides domain-optimized foundational capabilities for healthcare. By leveraging MONAI, users now have access to a comprehensive list of medical image–specific transformations and reference networks. Clara Train has also updated its DeepGrow model to work on 3D CT images. This updated model gives you the ability to segment an organ in 3D with only a few clicks across the organ.  

Expanding into Digital Pathology, Clara Train helps users navigate these new workloads by providing a Digital Pathology pipeline that includes data loading and training optimizations. These data loading optimizations involve using the new cuCIM library included in RAPIDS.  

Clara Train 4.0 continues to improve on its Federated Learning framework by adding homomorphic encryption tools. Homomorphic encryption allows you to compute data while the data is still encrypted. It can play an important role in healthcare in ensuring that patient data stays secure at each hospital while still benefiting from using federated learning with other institutions. 

To learn more about these new features, check out our Clara Train 4.0 features highlight video or the latest blog posts, which include a walkthrough of how you can Bring Your Own Components to Clara Train.

Download Clara Train 4.0 from NGC, and try out the newly updated Jupyter notebooks on GitHub.

Categories
Misc

What Is Explainable AI?

Banks use AI to determine whether to extend credit, and how much, to customers. Radiology departments deploy AI to help distinguish between healthy tissue and tumors. And HR teams employ it to work out which of hundreds of resumes should be sent on to recruiters. These are just a few examples of how AI is …


Categories
Misc

Is it possible to create a deep neural network in TensorFlow that feeds back across multiple layers?

The regular RNN layers feed back to themselves. Instead, I want to create a network where I can connect a later layer across multiple layers to a previous one.

Something like this image: https://imgur.com/GDscR4Z

I am having trouble finding anything like this. Wondering if it is even possible.

submitted by /u/Tahoma-sans

Categories
Misc

Is there a way to incorporate other information into a segmentation model?

Hey guys! So my research professor wants me to construct a notebook that uses image data alongside genetic data in segmentation. I’ve already constructed a segmentation model that runs fine, and I have all of the genetic data in a dataframe. I was wondering how I’d go about incorporating that into the model.

submitted by /u/maruchaannoodles

Categories
Misc

TFLite 1.47x slower than SavedModel

I’m benchmarking a model in a controlled environment (docker container with 1 CPU and 4GB RAM).

Running 100 inferences on the SATRN model with batch size 1 takes on average 1.26 seconds/inference using the TFLite model and 0.86 seconds/inference using the SavedModel.

Is it expected? What would explain the performance difference?

submitted by /u/BarboloBR