
NVIDIA Accelerates Conversational AI from Research to Production with Latest Updates in NeMo and Jarvis

Today, NVIDIA released a world-class speech recognition capability for enterprises to generate highly accurate transcriptions, as well as NeMo 1.0, which includes new state-of-the-art speech and language models for democratizing and accelerating conversational AI research.

World Class Speech Recognition

Jarvis world-class speech recognition is an out-of-the-box speech service that can be easily deployed in any cloud or data center. Enterprises can use the Transfer Learning Toolkit (TLT) to customize the speech service across a variety of industries and use cases. With TLT, developers can accelerate the development of custom speech and language models by 10x.

The speech recognition model is highly accurate and trained on domain-agnostic vocabulary from telecommunications, finance, healthcare, and education, as well as various proprietary and open-source datasets. Additionally, it was trained on noisy data, multiple sampling rates (including 8 kHz for call centers), a variety of accents, and dialogue, all of which contribute to the model's accuracy.

With the Jarvis speech service, you can generate a transcription in under 10 milliseconds. It has been evaluated on multiple proprietary datasets with over ninety percent accuracy and can be adapted to a wide variety of use cases and domains. It can be used in several applications, such as transcribing audio in call centers, video conferencing, and virtual assistants.

T-Mobile, one of the largest telecommunication operators in the United States, used Jarvis to offer exceptional customer service.

“With NVIDIA Jarvis services, fine-tuned using T-Mobile data, we’re building products to help us resolve customer issues in real time,” said Matthew Davis, vice president of product and technology at T-Mobile. 

“After evaluating several automatic speech recognition solutions, T-Mobile has found Jarvis to deliver a quality model at extremely low latency, enabling experiences our customers love.”

You can download Jarvis speech service from the NGC Catalog to start building your own transcription application today. 

NeMo 1.0

NVIDIA NeMo is an open-source toolkit for researchers developing state-of-the-art (SOTA) conversational AI models. It includes collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS), which enable researchers to quickly experiment with new SOTA neural networks to create new models or build on top of existing ones.

NeMo is tightly coupled with the PyTorch, PyTorch Lightning, and Hydra frameworks. These integrations enable researchers to develop and use NeMo models and modules in conjunction with PyTorch and PyTorch Lightning modules. Also, with the Hydra framework, researchers can easily customize complex conversational AI models.

Highlights of this version include:

  • Added speech recognition support for multiple languages, plus new CitriNet and Conformer-CTC ASR models
  • Bidirectional neural machine translation models for five language pairs: English to and from Spanish, Russian, Mandarin, German, and French
  • New speech synthesis models such as FastPitch, TalkNet, and FastSpeech2, as well as end-to-end models such as FastPitch + HiFiGAN and FastSpeech2 + HiFiGAN
  • Features for automatically performing text normalization and inverse text normalization (denormalization), creating datasets based on CTC segmentation, and exploring speech datasets

Also, most NeMo models can be exported to NVIDIA Jarvis for production deployment and high-performance inference. 

Learn more about what is included in NeMo 1.0 from the NVIDIA Developer Blog. NeMo is open-sourced and is available for download and use from the NGC Catalog and GitHub.

Accelerating Conversational AI Research with New Cutting-Edge Neural Networks and Features from NeMo 1.0

NVIDIA NeMo is a conversational AI toolkit built for researchers working on automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech synthesis (TTS). The primary objective of NeMo is to help researchers from industry and academia reuse prior work (code and pretrained models) and make it easier to create new conversational AI models. NeMo is an open-source project, and we welcome contributions from the research community.

The 1.0 update brings significant architectural, code quality, and documentation improvements as well as a plethora of new state-of-the-art neural networks and pretrained checkpoints in several languages. The best way to start with NeMo is by installing it in your regular PyTorch environment:

pip install nemo_toolkit[all]

NeMo collections

NeMo is a PyTorch ecosystem project that relies heavily on two other projects from the ecosystem: PyTorch Lightning for training and Hydra for configuration management. You can also use NeMo models and modules within any PyTorch code.

NeMo comes with three main collections: ASR, NLP, and TTS. They are collections of models and modules that are ready to be reused in your conversational AI experiments. Most importantly, for most of the models, we provide weights pretrained on various datasets using tens of thousands of GPU hours.

Speech recognition

The NeMo ASR collection is the most extensive collection, with a lot to offer for researchers of all levels, from beginners to advanced. If you are new to deep learning for speech recognition, we recommend that you get started with an interactive notebook that gives an overview of both ASR and NeMo. If you are an experienced researcher looking to create your own model, you'll find various ready-to-use building blocks:

  • Data layers
  • Encoders
  • Augmentation modules
  • Text normalization and denormalization
  • More advanced decoders, such as RNN-T

The NeMo ASR collection provides you with various types of ASR networks: Jasper, QuartzNet, CitriNet, and Conformer. With the NeMo 1.0 update, the CitriNet and Conformer models are the new flagship ASR models, providing better word error rate (WER) than Jasper and QuartzNet while maintaining similar or better efficiency.

CitriNet

CitriNet is an improvement upon QuartzNet that uses several ideas originally introduced in ContextNet. It uses subword encoding through word-piece tokenization and a Squeeze-and-Excitation mechanism to obtain highly accurate audio transcripts, while using a nonautoregressive, CTC-based decoding scheme for efficient inference.

Figure 1. CitriNet architecture with CTC decoding.

Conformer-CTC

Conformer-CTC is a CTC-based variant of the Conformer model that uses CTC loss and decoding instead of RNN-T loss, making it a nonautoregressive model. This model combines self-attention and convolution modules to achieve the best of both worlds. The self-attention modules can learn the global interaction while the convolutions efficiently capture the local correlations.

This model gives you an option to experiment with attention-based models. Due to the global context obtained by self-attention and squeeze-and-excitation mechanism, Conformer and CitriNet models have superior WER in offline scenarios.

Figure 2. Conformer-CTC architecture, combining self-attention and convolution modules.

You can use CitriNet and Conformer models with CTC as well as RNN-T decoders.

We spent tens of thousands of GPU hours training ASR models in various languages. In NeMo, we offer these checkpoints back to the community for free. As of this release, NeMo has ASR models in English, Spanish, Chinese, Catalan, Italian, Russian, French, and Polish. Moreover, we partner with Mozilla to make more pretrained models available with the help of the Mozilla Common Voice project.
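
Loading one of these checkpoints and transcribing an audio file takes only a couple of lines. The snippet below is a minimal sketch; the model name and audio path are illustrative placeholders, and other checkpoints from the NGC catalog can be substituted.

import nemo.collections.asr as nemo_asr

# Download an English ASR checkpoint from NGC (model name is an illustrative example)
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="stt_en_quartznet15x5")

# Transcribe a list of local audio files (placeholder path)
transcripts = asr_model.transcribe(["sample.wav"])
print(transcripts)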

Finally, NeMo’s ASR collection contains reusable building blocks and pretrained models for various other important speech-based tasks, such as voice activity detection, speaker recognition, diarization, and voice command detection.

Natural language processing

Natural language processing (NLP) is essential for providing a great conversational AI experience. The NeMo NLP collection provides a set of pretrained models for typical NLP tasks such as question answering, punctuation and capitalization, named entity recognition, and neural machine translation.
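
As a quick illustration, here is a minimal sketch of loading a pretrained punctuation-and-capitalization model and running it on raw text; the model name is an assumption and may differ from the checkpoints published on NGC.

import nemo.collections.nlp as nemo_nlp

# Restore a pretrained punctuation/capitalization model (model name is an assumption)
punct_model = nemo_nlp.models.PunctuationCapitalizationModel.from_pretrained(model_name="punctuation_en_bert")

# Add punctuation and capitalization to raw ASR-style output
queries = ["how are you doing today we released nemo one point zero"]
print(punct_model.add_punctuation_capitalization(queries))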

Hugging Face Transformers has fueled many recent advances in NLP by providing a huge set of pretrained models and an easy-to-use experience for developers and researchers. NeMo is compatible with Transformers in that most pretrained Hugging Face NLP models can be imported into NeMo. You may provide pretrained BERT-like checkpoints from Transformers for the encoders of common tasks. By default, the language models of the common tasks are initialized with the pretrained model from Hugging Face Transformers.

NeMo is also integrated with models trained by NVIDIA Megatron, allowing you to incorporate Megatron-based encoders into your question answering and neural machine translation models. NeMo can be used to fine-tune model-parallel models based on Megatron.

Neural machine translation

In today’s globalized world, it has become important to communicate with people speaking different languages. A conversational AI system capable of converting source text from one language to another is a powerful communication tool. NeMo 1.0 now supports neural machine translation (NMT) tasks with transformer-based models, allowing you to quickly build end-to-end language translation pipelines. This release includes pretrained NMT models for the following language pairs, in both directions:

  • English ↔ Spanish
  • English ↔ Russian
  • English ↔ Mandarin
  • English ↔ German
  • English ↔ French

Because tokenization is an extremely important part of NLP, NeMo supports the most widely used tokenizers, such as Hugging Face tokenizers, SentencePiece, and YouTokenToMe.
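
As a small sketch of the NMT API (following the same pattern used in the end-to-end example later in this post), you can load a pretrained translation model and call translate on a list of sentences; the English-to-German model name below is an assumption.

import nemo.collections.nlp as nemo_nlp

# Download a pretrained English->German translation model (model name is an assumption)
nmt_model = nemo_nlp.models.MTEncDecModel.from_pretrained(model_name="nmt_en_de_transformer6x6")

# Translate a batch of sentences
print(nmt_model.translate(["NeMo makes it easy to experiment with conversational AI models."]))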

Speech synthesis

If humans can talk to computers, the computers should be able to talk back as well. Speech synthesis takes text as input and generates humanized audio output. This is typically accomplished with two models: a spectrogram generator that generates spectrograms from text and a vocoder that generates audio from a spectrogram. The NeMo TTS collection provides you with the following models:

  • Pretrained spectrogram generator models: Tacotron2, GlowTTS, FastSpeech, FastPitch, and TalkNet
  • Pretrained vocoder models: HiFiGAN, MelGAN, SqueezeWave, UniGlow, and WaveGlow
  • End-to-end models: FastPitch + HiFiGAN and FastSpeech2 + HiFiGAN

End-to-end conversational AI example

Here’s a simple example demonstrating how to use NeMo to prototype a universal translator app. This app takes a Russian audio file and generates an English audio translation. You can play with it using the AudioTranslationSample.ipynb notebook.

# Start by importing NeMo and all three collections
import nemo
import nemo.collections.asr as nemo_asr
import nemo.collections.nlp as nemo_nlp
import nemo.collections.tts as nemo_tts

# Next, automatically download pretrained models from the NGC cloud
quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="stt_ru_quartznet15x5")
# Neural machine translation model
nmt_model = nemo_nlp.models.MTEncDecModel.from_pretrained(model_name="nmt_ru_en_transformer6x6")
# Spectrogram generator that takes text as an input and produces a spectrogram
spectrogram_generator = nemo_tts.models.Tacotron2Model.from_pretrained(model_name="tts_en_tacotron2")
# Vocoder model that takes a spectrogram and produces the actual audio
vocoder = nemo_tts.models.WaveGlowModel.from_pretrained(model_name="tts_waveglow_88m")

# Path to the Russian audio file to translate (placeholder)
audio_sample = "russian_sample.wav"

# First step is to transcribe, or recognize, what was said in the audio
russian_text = quartznet.transcribe([audio_sample])
# Then, translate it to English text
english_text = nmt_model.translate(russian_text)

# Finally, convert it into English audio
# A helper function that combines Tacotron2 and WaveGlow to go directly from
# text to audio
def text_to_audio(text):
    parsed = spectrogram_generator.parse(text)
    spectrogram = spectrogram_generator.generate_spectrogram(tokens=parsed)
    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
    return audio.to('cpu').numpy()

audio = text_to_audio(english_text[0])

The best part of this example is that you can fine-tune all the models used here on your datasets. In-domain fine-tuning is a great way to improve the performance of your models on specific applications. The NeMo GitHub repo provides plenty of fine-tuning examples.

NeMo models have a common look and feel, regardless of domain. They are configured, trained, and used in a similar fashion.
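
For instance, fine-tuning follows the same pattern in every collection: restore a pretrained checkpoint, point it at your data, and hand it to the PyTorch Lightning trainer. The sketch below assumes a NeMo-style JSON manifest and illustrative hyperparameters; the exact config keys may vary between models.

import pytorch_lightning as pl
from omegaconf import OmegaConf
import nemo.collections.asr as nemo_asr

# Start from a pretrained checkpoint (model name is an illustrative example)
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="stt_en_quartznet15x5")

# Point the model at your own training manifest (path and config keys are assumptions)
train_config = OmegaConf.create({
    "manifest_filepath": "train_manifest.json",
    "sample_rate": 16000,
    "labels": asr_model.decoder.vocabulary,
    "batch_size": 16,
    "shuffle": True,
})
asr_model.setup_training_data(train_data_config=train_config)

# Fine-tune with the PyTorch Lightning trainer
trainer = pl.Trainer(gpus=1, max_epochs=5)
trainer.fit(asr_model)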

Scaling with NeMo

An ability to run experiments and quickly test new ideas is key to successful research. With NeMo, you can speed up training by using the latest NVIDIA Tensor Cores and model parallel training features across many nodes and hundreds of GPUs. Much of this functionality is provided with the help of the PyTorch Lightning trainer, which has an intuitive and easy-to-use API.
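
Because training is driven by the Lightning trainer, scaling up is mostly a matter of trainer arguments. A hedged sketch with illustrative values, using the distributed-data-parallel settings available in the PyTorch Lightning versions of that era:

import pytorch_lightning as pl

# Mixed precision on Tensor Cores, 8 GPUs per node across 4 nodes (values are illustrative)
trainer = pl.Trainer(gpus=8, num_nodes=4, precision=16, accelerator="ddp", max_epochs=100)
# After setting up the training data as in the fine-tuning sketch above, run:
# trainer.fit(asr_model)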

For speech recognition, language modeling, and machine translation, we provide high-performance web dataset-based data loaders. These data loaders can handle scaling to tens of thousands of hours of speech data to deliver high performance in massively distributed settings with thousands of GPUs.

Text processing and dataset creation with NeMo

Proper preparation of training data and pre- and post-processing are hugely important and often overlooked steps in all machine learning pipelines. NeMo 1.0 includes new features for dataset creation and a speech data explorer.

NeMo 1.0 includes important text processing features such as text normalization and inverse text normalization. Text normalization converts text from written form into its verbalized form. It is used as a preprocessing step before training TTS models. It could also be used for preprocessing ASR training transcripts. Inverse text normalization (ITN) is a reverse operation and is often a part of the ASR post-processing pipeline. It is the task of converting the raw spoken output of the ASR model into its written form to improve text readability.

For example, the normalized (verbalized) version of “It weighs 10 kg.” would be “It weighs ten kilograms”.
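
A minimal sketch of both directions using the nemo_text_processing package that ships with NeMo 1.0; the exact module paths, constructor arguments, and outputs shown in the comments are assumptions and may differ slightly between releases.

# Text normalization: written form -> verbalized form
from nemo_text_processing.text_normalization.normalize import Normalizer
# Inverse text normalization: spoken form -> written form
from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

normalizer = Normalizer(input_case="cased")
inverse_normalizer = InverseNormalizer()

print(normalizer.normalize("It weighs 10 kg."))
# expected (approximately): "It weighs ten kilograms ."
print(inverse_normalizer.inverse_normalize("it weighs ten kilograms", verbose=False))
# expected (approximately): "it weighs 10 kilograms"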

Conclusion

The NeMo 1.0 release substantially improves overall quality and documentation. It adds support for new tasks, such as neural machine translation, and many new models pretrained in different languages. As a mature tool for ASR and TTS, it also adds new features for text normalization and denormalization, dataset creation based on CTC segmentation, and a speech data explorer. These updates benefit researchers in academia and industry by making it easier to develop and train new conversational AI models.

Many NeMo models can be exported to NVIDIA Jarvis for production deployment and high-performance inference. NVIDIA Jarvis is an application framework for building multimodal conversational AI services that delivers real-time performance on GPUs.

We welcome external contributions! On the NVIDIA NeMo GitHub page, you can try out the examples, participate in community discussions, and take your models from research to production using NeMo and NVIDIA Jarvis.

Make the Metaverse with Us: Go Behind the Scenes of NVIDIA Omniverse, Now Streaming on Twitch

Hear directly from the engineers, designers, and creators at the forefront of developing NVIDIA Omniverse, the virtual collaboration and physically accurate simulation platform, now live on Twitch.

Our newly announced streaming calendar features:

What is Omniverse?

NVIDIA Omniverse is an open platform built for creators, designers, and engineers like you to enhance collaborative visual workflows and create real-time, physically accurate simulations. This means you and your team will be able to connect major design tools, assets, and projects for collaborative iteration in a shared virtual space.

All the Sneak Peeks – Exclusively on Twitch

Whether it’s our latest app releases or just a behind-the-scenes look at our engineers’ work-from-home setups, our Twitch streams will give you special access to learn what Omniverse, and the people behind the platform, are all about.

Become a True Omnivore

With Twitch’s live-chat feature, you’ll be able to interact with our team in real time. All information in our streams will be coming from Omniverse insiders. Your input matters to us, and during our streams you can play a key role in helping us build the metaverse.

Take 3D Game Storytelling to the Next Level with Omniverse Machinima.

Omniverse Machinima enables creators to animate characters using just an audio source, render high-fidelity, realistic scenes with physically accurate materials, and simulate human motion through a video feed powered by AI-based pose estimation technology, all in real-time. 

Omniverse enables universal interoperability across different applications and 3D ecosystem vendors, and provides efficient real-time scene updates, based on open-standards and protocols. The Omniverse platform is designed to act as a hub, enabling new capabilities to be exposed as microservices to any connected clients and applications. 

Whether you’re looking for ways to integrate Omniverse apps into your workflow or just want to get a high-level overview of what the platform is, make sure to follow our Twitch for direct access to Omniverse-insiders. 

Follow us on social: Instagram, Twitter, LinkedIn and Discord for the latest announcements and next events. See you in the Omniverse!

NVIDIA Research at CVPR 2021

Researchers, developers, and engineers worldwide are gathering virtually this year for the annual Conference on Computer Vision and Pattern Recognition (CVPR) from June 19th to June 25th. Throughout the week, NVIDIA Research will present their recent computer vision-related projects via presentations and interactive Q&As. 

The nearly 30 accepted papers from NVIDIA range from simulating dynamic driving environments, to powering neural architecture search for medical imaging.

Here are a few featured papers:

DriveGAN: Towards a Controllable High-Quality Neural Simulation
Authors: Seung Wook Kim (University of Toronto, NVIDIA)*; Jonah Philion (University of Toronto, NVIDIA); Antonio Torralba (MIT); Sanja Fidler (University of Toronto, NVIDIA)

DriveGAN is a fully differentiable simulator that further allows for re-simulation of a given video sequence, enabling an agent to drive through a recorded scene again, possibly taking different actions.

The talk will be live on Tuesday, June 22, 2021, at 10:00 pm EST

DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
Authors: Yufan He (Johns Hopkins University)*; Dong Yang (NVIDIA); Holger R Roth (NVIDIA); Can Zhao (NVIDIA); Daguang Xu (NVIDIA)

From the abstract: In this work, we focus on three important aspects of NAS in 3D medical image segmentation: flexible multi-path network topology, high search efficiency, and budgeted GPU memory usage. Our method achieves the state-of-the-art performance and the top ranking on the MSD challenge leaderboard.

The talk will be live on Tuesday, June 22, 2021 at 10:00 pm EST

To view the complete list of NVIDIA Research accepted papers, workshop and tutorials, demos, and to explore job opportunities at NVIDIA, visit the NVIDIA at CVPR 2021 website.

Data Cascades in Machine Learning

Data is a foundational aspect of machine learning (ML) that can impact performance, fairness, robustness, and scalability of ML systems. Paradoxically, while building ML models is often highly prioritized, the work related to data itself is often the least prioritized aspect. This data work can require multiple roles (such as data collectors, annotators, and ML developers) and often involves multiple teams (such as database, legal, or licensing teams) to power a data infrastructure, which adds complexity to any data-related project. As such, the field of human-computer interaction (HCI), which is focused on making technology useful and usable for people, can help both to identify potential issues and to assess the impact on models when data-related work is not prioritized.

In “‘Everyone wants to do the model work, not the data work’: Data Cascades in High-Stakes AI”, published at the 2021 ACM CHI Conference, we study and validate downstream effects from data issues that result in technical debt over time (defined as “data cascades”). Specifically, we illustrate the phenomenon of data cascades with the data practices and challenges of ML practitioners across the globe working in important ML domains, such as cancer detection, landslide detection, loan allocation and more — domains where ML systems have enabled progress, but also where there is opportunity to improve by addressing data cascades. This work is the first that we know of to formalize, measure, and discuss data cascades in ML as applied to real-world projects. We further discuss the opportunity presented by a collective re-imagining of ML data as a high priority, including rewarding ML data work and workers, recognizing the scientific empiricism in ML data research, improving the visibility of data pipelines, and improving data equity around the world.

Origins of Data Cascades
We observe that data cascades often originate early in the lifecycle of an ML system, at the stage of data definition and collection. Cascades also tend to be complex and opaque in diagnosis and manifestation, so there are often no clear indicators, tools, or metrics to detect and measure their effects. Because of this, small data-related obstacles can grow into larger and more complex challenges that affect how a model is developed and deployed. Challenges from data cascades include the need to perform costly system-level changes much later in the development process, or the decrease in users’ trust due to model mis-predictions that result from data issues. Nevertheless and encouragingly, we also observe that such data cascades can be avoided through early interventions in ML development.

Different color arrows indicate different types of data cascades, which typically originate upstream, compound over the ML development process, and manifest downstream.

Examples of Data Cascades
One of the most common causes of data cascades is when models that are trained on noise-free datasets are deployed in the often-noisy real world. For example, a common type of data cascade originates from model drifts, which occur when target and independent variables deviate, resulting in less accurate models. Drifts are more common when models closely interact with new digital environments — including high-stakes domains, such as air quality sensing, ocean sensing, and ultrasound scanning — because there are no pre-existing and/or curated datasets. Such drifts can lead to more factors that further decrease a model’s performance (e.g., related to hardware, environmental, and human knowledge). For example, to ensure good model performance, data is often collected in controlled, in-house environments. But in the live systems of new digital environments with resource constraints, it is more common for data to be collected with physical artefacts such as fingerprints, shadows, dust, improper lighting, and pen markings, which can add noise that affects model performance. In other cases, environmental factors such as rain and wind can unexpectedly move image sensors in deployment, which also trigger cascades. As one of the model developers we interviewed reported, even a small drop of oil or water can affect data that could be used to train a cancer prediction model, therefore affecting the model’s performance. Because drifts are often caused by the noise in real-world environments, they also take the longest — up to 2-3 years — to manifest, almost always in production.

Another common type of data cascade can occur when ML practitioners are tasked with managing data in domains in which they have limited expertise. For instance, certain kinds of information, such as identifying poaching locations or data collected during underwater exploration, rely on expertise in the biological sciences, social sciences, and community context. However, some developers in our study described having to take a range of data-related actions that surpassed their domain expertise — e.g., discarding data, correcting values, merging data, or restarting data collection — leading to data cascades that limited model performance. The practice of relying on technical expertise more than domain expertise (e.g., by engaging with domain experts) is what appeared to set off these cascades.

Two other cascades observed in this paper resulted from conflicting incentives and organizational practices between data collectors, ML developers, and other partners — for example, one cascade was caused by poor dataset documentation. While work related to data requires careful coordination across multiple teams, this is especially challenging when stakeholders are not aligned on priorities or workflows.

How to Address Data Cascades
Addressing data cascades requires a multi-part, systemic approach in ML research and practice:

  1. Develop and communicate the concept of goodness of the data that an ML system starts with, similar to how we think about goodness of fit with models. This includes developing standardized metrics and frequently using those metrics to measure data aspects like phenomenological fidelity (how accurately and comprehensively does the data represent the phenomena) and validity (how well the data explains things related to the phenomena captured by the data), similar to how we have developed good metrics to measure model performance, like F1-scores.
  2. Innovate on incentives to recognize work on data, such as welcoming empiricism on data in conference tracks, rewarding dataset maintenance, or rewarding employees for their work on data (collection, labelling, cleaning, or maintenance) in organizations.
  3. Data work often requires coordination across multiple roles and multiple teams, but this is quite limited currently (partly, but not wholly, because of the previously stated factors). Our research points to the value of fostering greater collaboration, transparency, and fairer distribution of benefits between data collectors, domain experts, and ML developers, especially with ML systems that rely on collecting or labelling niche datasets.
  4. Finally, our research across multiple countries indicates that data scarcity is pronounced in lower-income countries, where ML developers face the additional problem of defining and hand-curating new datasets, which makes it difficult to even start developing ML systems. It is important to enable open dataset banks, create data policies, and foster ML literacy of policy makers and civil society to address the current data inequalities globally.

Conclusion
In this work we both provide empirical evidence and formalize the concept of data cascades in ML systems. We hope to create an awareness of the potential value that could come from incentivising data excellence. We also hope to introduce an under-explored but significant new research agenda for HCI. Our research on data cascades has led to evidence-backed, state-of-the-art guidelines for data collection and evaluation in the revised PAIR Guidebook, aimed at ML developers and designers.

Acknowledgements
This paper was written in collaboration with Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh and Lora Aroyo. We thank our study participants, and Sures Kumar Thoddu Srinivasan, Jose M. Faleiro, Kristen Olson, Biswajeet Malik, Siddhant Agarwal, Manish Gupta, Aneidi Udo-Obong, Divy Thakkar, Di Dang, and Solomon Awosupin.

NVIDIA Clara AGX SDK 3.0 Goes Public and Includes New Application Container

NVIDIA Clara AGX SDK 3.0 is available today! The Clara AGX SDK runs on the NVIDIA Jetson and Clara AGX platform and provides developers with capabilities to build end-to-end streaming workflows for medical imaging. The focus of this release is to provide added support for NGC containers, including TensorFlow and PyTorch frameworks, a new ultrasound application, and updated Transfer Learning Toolkit scripts.  

NVIDIA Clara AGX SDK

There is now support for the leading deep learning framework containers, including TensorFlow 1, TensorFlow 2, and PyTorch, as well as the Triton Inference Server. These containers can help you quickly get started using the Clara AGX Development Kit, NVIDIA’s GPU super-charged development platform for AI medical devices and edge-based inferencing. We’ve also released three new application containers along with the SDK, available on NGC. These application containers include: 

  • Metagenomics application 
  • US4US Ultrasound application and sample code 
  • Dermatology Melanoma detection  

Clara AGX SDK has also been updated to the latest Transfer Learning Toolkit (TLT) 3.0 release. Developers can now use TLT 3.0 out of the box, which includes compatibility with the DeepStream SDK for real-time, low-latency, high-resolution image AI deployments.

Download Clara AGX SDK 3.0 through the Clara AGX Developer Site. An NVIDIA Developer Program account is needed to access the SDK. You can also find all of our containers through NGC.

XOR Problem with MLP

Hi guys, I’m trying to reproduce this MLP in TF (constraining it to have only one hidden layer with 2 units). However, like in TensorFlow Playground, it often does not converge to the global optimum. I think this is due to weight initialization; I tried Xavier and He initializations but had no success. The following piece of code is the model in tf.keras.

from tensorflow.keras.layers import Dense
import tensorflow as tf

model = tf.keras.models.Sequential([
    Dense(units=2, activation='sigmoid', input_shape=(2,)),
    Dense(units=1, activation='sigmoid'),
])

Any help would be appreciated. Thanks.

https://preview.redd.it/otv9rmbrtr251.png?width=1375&format=png&auto=webp&s=6757653b702f45a728dd032a89368fedbd6968e2

submitted by /u/Idea_Cultural

How can TensorFlow utilize multiple cores by default if Python is limited to one core by default?

All the threads I’ve read online regarding utilizing multiple cores in Python require multiprocessing. By default, Python can only run on one core (and on one thread, because of the GIL).

On the other hand, I often read that TensorFlow uses multiple cores by default. How can this work if Python itself is limited to one core?

submitted by /u/hattrickostrich

Fast Track AI Model Adaptation with TAO

NVIDIA Train, Adapt, and Optimize (TAO) is an AI model adaptation platform that simplifies and accelerates the creation of enterprise AI applications.

Building a state-of-the-art deep learning model is a complex and time-consuming process. To achieve this, the large datasets collected for the model must be of high quality. Once the data is collected, it must be prepared, and the model must then be trained and optimized over several iterations. This is not always an option for many enterprises looking to bring their AI applications to market faster while reducing operational costs.

NVIDIA TAO is being developed to address these challenges. NVIDIA Train, Adapt, and Optimize (TAO) is an AI model adaptation platform that simplifies and accelerates the creation of enterprise AI applications. By fine-tuning state-of-the-art pre-trained models created by NVIDIA experts with custom data through a UI-based, guided workflow, you can produce highly accurate computer vision, speech, and language understanding models in hours rather than months, eliminating the need for large training runs and deep AI expertise.

As a managed and guided workflow, TAO lowers the barrier to building AI by unifying key existing NVIDIA technologies, such as pre-trained models from the NGC catalog, Transfer Learning Toolkit (TLT), Federated Learning with NVIDIA Clara, and TensorRT.

Registration for the Early Access Program is now open. Later this year, we will begin accepting applicants into the program, which will provide you with an exclusive opportunity to collaborate with the NVIDIA product team to help shape TAO.

Key Highlights of the Early Access Program:

  • Access to state-of-the-art accurate pre-trained models that can be customized for your use case
  • Access to compute infrastructure
  • Hands-on support to help you navigate through the entire process

Apply to the TAO Early Access Program here.

Using Physics-Informed Deep Learning for Transport in Porous Media

Simulations are pervasive in every domain of science and engineering, but they are often constrained by large computational times, limited compute resources, tedious manual setup efforts, and the need for technical expertise. NVIDIA SimNet is a simulation toolkit that addresses these challenges with a combination of AI and physics. 

A success story of SimNet’s application today is in modeling the flow and transport in porous media. This effort was led by Cedric Frances, a PhD student at Stanford University.

Use case study

Cedric is researching the applicability and limitations of mesh-free reservoir simulations using physics-informed neural networks (PINNs). He’s keenly interested in the flow and transport problem in porous media (conservation of mass and Darcy flow). Cedric’s application is a Python-based reservoir simulator, which computes the pressure and concentrations of various fluids in porous media and enables predictions that typically affect large, industrial energy projects. This includes the production of hydrocarbons, storage of carbon dioxide, water disposal, air storage, waste management, and so on.

Researchers previously tried to use the PINNs approach to capture the solution of a hyperbolic problem with a nonconvex flux term (Riemann problem) in a forward setting, with no data other than initial and boundary conditions. Unfortunately, these attempts were unsuccessful.

Before trying out SimNet, Cedric developed his own implementations of PINNs using Python and deep learning frameworks such as TensorFlow and Keras. He used various network architectures, such as residual networks, GANs, periodic activations, CNNs, PDE-Net, and so on. However, it was difficult to implement all of them to find which one worked best, or worked at all. The emergence of open-source code on GitHub made it easy to test these implementations, but the high overhead involved in every new implementation, such as environment setup, hardware configuration, and modification of code to test his own problem, was not efficient.

Cedric wanted to have a good, unified framework maintained by a team of professional software developers to solve problems that allowed him to focus on the physics of the problem and extensively test the methods that have been recently published. His search for such a framework ended when he stumbled upon SimNet.

Cedric downloaded SimNet and started using fully connected networks with tanh activation functions and spatial weighting of the loss function. He discovered that SimNet’s general-purpose framework, with multiple architectures and well-documented examples, served as a good starting point. Its ability to emulate solutions with sharp shocks and to introduce new dynamic constraints such as entropy and velocity saved him weeks of development. More importantly, it provided a quick turnaround on testing methods to determine their usefulness.

The problem presented here is that of incompressible, immiscible displacement of two phases in a porous medium. This is also referred to as the transport problem and has been delineated in various forms over the years. It has been applied to the displacement of oil by water in waterflood problems in reservoirs for over half a century. More recently, it’s been applied to the displacement of brine by CO2 in carbon sequestration applications. For more information, see Mechanism of Fluid Displacement in Sands and Theory of Gas Injection Processes.

Assume that a wetting phase (w) is displacing a nonwetting phase (n). Wettability is the preference of a fluid to be in contact with a solid surrounded by another fluid; for example, water is wetting on most surfaces compared to air. Conservation of mass applies to both phases. For the wetting phase:

$\phi \frac{\partial S_w}{\partial t} + \nabla \cdot q_w = 0 \quad (1)$

In this formula, $\phi$ is the porosity, $S_w$ is the saturation (or concentration) of the wetting phase, and $S_n$ is the saturation of the nonwetting phase. The flow rate of the wetting phase $q_w$ can be written as follows:

$q_w = -\frac{k k_{rw}(S_w)}{\mu_w} \nabla p \quad (2)$

In this formula, $k$ is the absolute permeability, which quantifies the propensity of a material to allow liquid or gas to flow through it. $k_{rw}(S_w)$ is the wetting phase relative permeability, which is a function of the saturation and characterizes the effective permeability of a given phase in the presence of the other. A phase preferentially flows through a path where it is already present. Think of a drop of water dripping down a window and following existing trails.

You can formulate the phase flux of the wetting phase $q_w$ as a function of the total flux $q = q_w + q_n$ using a simple homogenization rule:

$q = -k\left[\frac{k_{rw}(S_w)}{\mu_w} + \frac{k_{rn}(S_n)}{\mu_n}\right] \nabla p \quad (3)$

You can rewrite this equation as a function of the total flux. This gives rise to the fractional flow:

$f_w = \frac{q_w}{q} = \frac{1}{1 + \frac{k_{rn}\mu_w}{k_{rw}\mu_n}} \quad (4)$

The conservation equation can now be written:

$\frac{\partial S_w}{\partial t} + \frac{q}{\phi} \nabla f_w = 0 \quad (5)$

For a one-dimensional case where you assume that the total flux is equal to one pore volume injected per time step ($q = \phi$), you obtain the equation:

$\frac{\partial S_w}{\partial t} + \frac{\partial f_w(S_w)}{\partial x} = 0 \quad (6)$

In this formula, the fractional flow is a nonlinear equation defined as follows:

$f_w(S) = \frac{(S - S_{wc})^2}{(S - S_{wc})^2 + (1 - S - S_{nr})^2 / M} \quad (7)$

In this formula, $S_{wc}$ and $S_{nr}$ are the residual (irreducible) saturations of the wetting and nonwetting phases resulting from trapping mechanisms, and $M$ is the endpoint mobility ratio, defined as the ratio of the endpoint relative permeabilities and viscosities of the two phases. We used the Brooks and Corey relative permeability relationship. For more information, see Hydraulic Properties of Porous Media.
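
As a quick numerical illustration of equation (7), the snippet below evaluates the fractional flow curve for a few saturation values; the residual saturations and mobility ratio are illustrative, not the values used in Cedric's study.

import numpy as np

def fractional_flow(S, S_wc=0.1, S_nr=0.1, M=2.0):
    # Equation (7): nonconvex fractional flow of the wetting phase
    num = (S - S_wc) ** 2
    den = num + (1.0 - S - S_nr) ** 2 / M
    return num / den

# Saturations between the residual endpoints S_wc and 1 - S_nr
S = np.linspace(0.1, 0.9, 9)
print(fractional_flow(S))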

The partial differential equation solved here is a first-order hyperbolic equation, and the fractional flow term is nonconvex. It belongs to the class of Riemann conservation problems that are typically solved using finite volume methods. For more information, see Hyperbolic Systems of Conservation Laws and the Mathematical Theory of Shock Waves.

With uniform Dirichlet boundary conditions:

$S(x=0, t) = S_{inj} \quad (8)$

$S(x, t=0) = S_{wc} \quad (9)$

You can apply the method of characteristics (MOC) to build an analytical solution to this equation. For the MOC or any finite volume method to be conservative, you must modify the fractional flow term as shown in Figure 1.

Figure 1. Fractional flow curve (blue) along with the Welge construction (dotted black) for the case with $S_{wc} = S_{or} = 0$. Source: Physics Informed Deep Learning for Flow and Transport in Porous Media

Until now, no other known method had solved such a problem using a sampling method, so this remained an open question. A previous attempt by Fuks and Tchelepi concluded that physics-informed approaches were not suitable for the problem described (Figure 2).

Figure 2. Results of saturation inference computed using the PINN approach (dashed red) conditioned on the weak form of the hyperbolic problem. The reference solution using MOC is plotted in blue. Source: Physics Informed Deep Learning for Flow and Transport in Porous Media
Figure 3. Results of saturation inference using PINN (dashed red) vs MOC (blue) with velocity constraint and entropy condition. The convex hull of the fractional flow curve is used to model the displacement. Source: Physics Informed Deep Learning for Flow and Transport in Porous Media

Cedric’s research on this topic has now been published: Physics Informed Deep Learning for Flow and Transport in Porous Media.

Important theoretical milestones are now being reached on simple yet challenging 1D examples. Cedric plans to expand his study to larger dimensions (2D and 3D), where the scalability of the code and easy deployment on larger arrays will be put to the test. He expects to encounter similar issues and is looking forward to the gains provided by SimNet when going from 2D to 3D, for example.

Cedric elaborated on his experience with SimNet. “SimNet’s clear APIs, clean and easily navigable code, environment and hardware configurations well handled with Docker containers, scalability, ease of deployment and the competent support team made it easy to adopt and has provided some very promising results. This has been great so far and we look forward to using SimNet on problems with much larger dimensions.”

To view the GTC’21 session, see Physics-Informed Neural Network for Flow and Transport in Porous Media. For more information about features and to download the toolkit, see NVIDIA SimNet.