Categories
Misc

AWS Brings NVIDIA A10G Tensor Core GPUs to the Cloud with New EC2 G5 Instances

Read about the new EC2 G5 instance that powers remote graphics, visual computing, AI/ML training, and inference workloads on the AWS cloud.

Today, AWS announced the general availability of the new Amazon EC2 G5 instances, powered by NVIDIA A10G Tensor Core GPUs. These instances are designed for the most demanding graphics-intensive applications, as well as for machine learning inference and for training simple to moderately complex machine learning models on the AWS cloud.

The new EC2 G5 instances feature up to eight NVIDIA A10G Tensor Core GPUs that are optimized for advanced visual computing workloads. With support for NVIDIA RTX technology and more RT (ray tracing) cores than any other NVIDIA GPU instance on AWS, they offer up to 3X better graphics performance. Based on the NVIDIA Ampere architecture, G5 instances offer up to 3X higher performance for machine learning inference and 3.3X higher performance for machine learning training compared to the previous-generation Amazon EC2 G4dn instances.

Customers can use the G5 instances to accelerate a broad range of graphics applications like interactive video rendering, video editing, computer-aided design, photorealistic simulations, 3D visualization, and gaming. G5 instances also deliver the best user experience for real-time AI inference at scale for use cases like content and product recommendations, voice assistants, chatbots, and visual search.

Getting the most out of EC2 G5 instances using NVIDIA-optimized software

To unlock breakthrough graphics performance on the new G5 instances, creative and technical professionals can use NVIDIA RTX Virtual Workstation (vWS) software, available from the AWS Marketplace. Available only from NVIDIA, RTX vWS includes support for hundreds of certified professional ISV applications, all of the leading rendering apps, and optimizations for all major gaming content.

NVIDIA RTX technology delivers exceptional features like ray tracing and AI denoising. Creative professionals can achieve photorealistic quality with accurate shadows, reflections, and refractions, creating amazing content faster than ever before.

NVIDIA RTX vWS also supports Deep Learning Super Sampling (DLSS). This gives designers, engineers, and artists the power of AI for producing the highest visual quality, from anywhere. They can also take advantage of technologies like NVIDIA Iray and NVIDIA OptiX for superior rendering capabilities.

Developers on AWS will soon be able to use state-of-the-art pretrained AI models, GPU-optimized deep learning frameworks, SDKs, and end-to-end application frameworks from the NGC Catalog on AWS Marketplace. In particular, developers can take advantage of NVIDIA TensorRT and NVIDIA Triton Inference Server to optimize inference performance and serve ML models at scale on G5 instances.

Developers have multiple options for taking advantage of NVIDIA-optimized software on AWS, whether they provision and manage the G5 instances themselves or use them through AWS managed services like Amazon Elastic Kubernetes Service (EKS) or Amazon Elastic Container Service (ECS).

Learn more about the EC2 G5 instances and get started.

Categories
Misc

New Online Course Offers Hands-on Machine Learning Using AWS and NVIDIA

AWS and NVIDIA have collaborated to develop an online course that introduces Amazon SageMaker with EC2 instances powered by NVIDIA GPUs.

AWS and NVIDIA have collaborated to develop an online course that offers a practical, easy-to-follow introduction to Amazon SageMaker with EC2 instances powered by NVIDIA GPUs. The course is grounded in the practical application of these services and gives you the opportunity to learn hands-on from experts in machine learning development. Once you have completed it, you will have the confidence and competency to begin working on your ML project immediately.

Machine learning can be complex, tedious, and time-consuming. AWS and NVIDIA provide the fastest, most effective, and easy-to-use ML tools to get you started on your ML project. Amazon SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality ML models quickly by bringing together a broad set of capabilities purpose-built for ML. Amazon EC2 instances powered by NVIDIA GPUs along with NVIDIA software offer high-performance, GPU-optimized instances in the cloud for efficient model training and cost-effective model inference hosting.

In this course, you will first be given a high-level overview of modern machine learning. Then, we will dive right in and get you up and running with a GPU-powered SageMaker instance. You will learn how to prepare a dataset for training a model, how to build a model, how to execute the training of a model, and how to deploy and optimize a model. You will learn hands-on how to apply this workflow for computer vision (CV) and natural language processing (NLP) use cases.

After completing this course, you will be able to build, train, deploy, and optimize ML workflows with GPU acceleration in Amazon SageMaker, and you will understand the key SageMaker services applicable to tabular, computer vision, and language ML tasks. You will have the confidence and competency to solve complex machine learning problems more efficiently. By simplifying workflows with SageMaker, you can build and deploy ML models quickly, freeing you to focus on other problems.

Course Overview

This course is designed for machine learning practitioners, including data scientists and developers, who have a working knowledge of machine learning workflows. In this course, you will gain hands-on experience with Amazon SageMaker and Amazon EC2 instances powered by NVIDIA GPUs. There are four modules in the course:

Module 1 – Introduction to Amazon SageMaker and NVIDIA GPUs

In this module, you will learn about the purpose-built tools available within Amazon SageMaker for modern machine learning. This includes a tour of the Amazon SageMaker Studio IDE that can be used to prepare, build, train and tune, and deploy and manage your own ML models. Then you will learn how to use Amazon SageMaker classic notebooks and Amazon SageMaker Studio notebooks to develop natural language processing (NLP), computer vision (CV), and other ML models using RAPIDS. You will also dive deep into NVIDIA GPUs, the NGC Catalog, and instances available on AWS for ML.

Module 2 – GPU Accelerated Machine Learning Workflows with RAPIDS and Amazon SageMaker

In this module, you will apply your knowledge of NVIDIA GPUs and Amazon SageMaker. You will gain a background in GPU accelerated machine learning and perform the steps required to set up Amazon SageMaker. You will then learn about data acquisition and data transformation, move on to model design and training, and finish up by evaluating hyperparameter optimization, AutoML, and GPU accelerated inferencing.

Module 3 – Computer Vision

In this module, you will learn about the application of deep learning to computer vision (CV). A large fraction of the human brain is devoted to visual processing, making vision central to how we perceive the world. Endowing machines with sight has been a challenging endeavor, but advances in compute, algorithms, and data quality have made computer vision more accessible than ever before. From mobile cameras to industrial lenses, biological labs to hospital imaging, and self-driving cars to security cameras, data in pixel format is one of the most valuable types of data for consumers and companies. In this module, you will explore common CV applications and learn how to build an end-to-end object detection model on Amazon SageMaker using NVIDIA GPUs.

Module 4 – Natural Language Processing

In this module, you will learn about applying deep learning technologies to the problem of language understanding. What does it mean to understand language? What is language modeling? What is the BERT language model, and why are such language models used in many popular services like search, office productivity software, and voice agents? Are NVIDIA GPUs a fast and cost-efficient platform for training and deploying NLP models? In this module, you will find answers to all of those questions and more. Whether you are an experienced ML engineer considering implementation or a developer wanting to learn how to deploy a language understanding model like BERT quickly, this module is for you.

Conclusion

AWS and NVIDIA provide fast, effective, easy-to-use ML tools to get you started on your ML project. Learn more about the course and let it guide you through your ML journey!

Categories
Misc

Looking for help understanding the notation of filenames like "Mobilenet_V1_1.0_224_quant.tflite"

I’m looking at running some models from https://www.tensorflow.org/lite/guide/hosted_models

The model filename is something like `Mobilenet_V1_1.0_224_quant.tflite`

I understand that 224 is the input size, but I’m not sure what the 1.0 represents. It would be helpful if someone could tell me what the 1.0 means. Feel free to link some docs that would give me insight if you find that easier 🙂

Thanks in advance, really appreciate it.

submitted by /u/sonjpaul
[visit reddit] [comments]

Categories
Misc

Where can I find the inputs and outputs of a model?

Hi guys, I recently found a TensorFlow Lite model that I want to use in my Android app (link to the model below), but I couldn’t find its inputs and outputs.

I was also wondering if there is a way to see the input and output types.

https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md

submitted by /u/Left_Complaint_6668
[visit reddit] [comments]

Categories
Offsites

Train in R, run on Android: Image segmentation with torch

We train a model for image segmentation in R, using torch together with luz, its high-level interface. We then JIT-trace the model on example input, so as to obtain an optimized representation that can run with no R installed. Finally, we show the model being run on Android.

Categories
Offsites

Simple Portfolio Optimization That Works!

Categories
Offsites

Newton’s Fractal (which Newton knew nothing about)

Categories
Offsites

How a Mandelbrot set arises from Newton’s work

Categories
Offsites

A few of the best math explainers from this summer

Categories
Offsites

Grammar Correction as You Type, on Pixel 6

Despite the success and widespread adoption of smartphones, using them to compose longer pieces of text is still quite cumbersome. As one writes, grammatical errors can often creep into the text (especially undesirable in formal situations), and correcting these errors can be time consuming on a small display with limited controls.

To address some of these challenges, we are launching a grammar correction feature that is built directly into Gboard on Pixel 6 and works entirely on-device to preserve privacy, detecting and suggesting corrections for grammatical errors while the user is typing. Building such functionality required addressing a few key obstacles: memory size limitations, latency requirements, and handling partial sentences. Currently, the feature can correct English sentences (we plan to expand to more languages in the near future) and is available in almost any app with Gboard1.

Gboard suggests how to correct an ungrammatical sentence as the user types.

Model Architecture
We trained a sequence-to-sequence neural network to take an input sentence (or a sentence prefix) and output the grammatically correct version — if the original text is already grammatically correct, the output of the model is identical to its input, indicating that no corrections are needed. The model uses a hybrid architecture that combines a Transformer encoder with an LSTM decoder, a combination that provides a good balance of quality and latency.

Overview of the grammatical error correction (GEC) model architecture.

Mobile devices are constrained by limited memory and computational power, which make it more difficult to build a high quality grammar checking system. There are a few techniques we use to build a small, efficient, and capable model.

  • Shared embedding: Because the input and output of the model are structurally similar (e.g., both are text in the same language), we share some of the model weights between the Transformer encoder and the LSTM decoder, which reduces the model file size considerably without unduly affecting accuracy.
  • Factorized embedding: The model splits a sentence into a sequence of predefined tokens. To achieve good quality, we find that it is important to use a large vocabulary of predefined tokens; however, this substantially increases the model size. A factorized embedding separates the size of the hidden layers from the size of the vocabulary embedding. This enables us to have a model with a large vocabulary without significantly increasing the number of total weights.
  • Quantization: To reduce the model size further, we perform post-training quantization, which allows us to store each 32-bit floating point weight using only 8 bits. While this means that each weight is stored with lower fidelity, we find that the quality of the model is not materially affected.
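As an illustration of the quantization step, here is a minimal sketch of symmetric per-tensor 8-bit post-training quantization in plain Python. The post does not specify the exact scheme used; this is one common approach, shown under that assumption.

```python
def quantize(weights):
    """Map 32-bit float weights to int8 values plus one float scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.02, -0.51, 0.33, 1.27, -1.0]
q, scale = quantize(weights)
# Each 8-bit value stays within [-127, 127], and each dequantized
# weight differs from the original by at most one quantization step.
assert all(-127 <= v <= 127 for v in q)
assert all(abs(w - r) <= scale for w, r in zip(weights, dequantize(q, scale)))
```

Storing one float scale per tensor plus one byte per weight is what reduces storage roughly 4X relative to 32-bit floats.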

By employing these techniques, the resulting model takes up only 20MB of storage and performs inference on 60 input characters in under 22 ms on the Google Pixel 6 CPU.

Training the Model
In order to train the model, we needed training data in the form of <original, corrected> text pairs.

One possible approach to generating a small on-device model would be to use the same training data as a large cloud-based grammar model. While this data produces a reasonably high quality on-device model, we found that using a technique called hard distillation to generate training data that is better-matched to the on-device domain yields even better quality results.

Hard distillation works as follows: We first collected hundreds of millions of English sentences from across the public web. We then used the large cloud-based grammar model to generate grammar corrections for those sentences. This training dataset of <original, corrected> sentence pairs is then used to train a smaller on-device model that can correct full sentences. We found that the on-device model built from this training dataset produces significantly higher quality suggestions than a similar-sized on-device model built on the original data used to train the cloud-based model.

Before training the model on this data, however, there is another issue to address. To enable the model to correct grammar as the user types (an important capability on mobile devices), it needs to be able to handle sentence prefixes. Beyond enabling grammar correction when the user has typed only part of a sentence, this capability is particularly useful in messaging apps, where the user often omits the final period and presses the send button as soon as they finish typing. If grammar correction were triggered only on complete sentences, it would miss many errors.

This raises the question of how to decide whether a given sentence prefix is grammatically correct. We used a heuristic to solve this — if a given sentence prefix can be completed to form a grammatically correct sentence, we then consider it grammatically correct. If not, it is assumed to be incorrect.
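The heuristic can be sketched as follows. The function names and the lookup-table "models" here are hypothetical stand-ins for the neural completion model and full-sentence grammar model; this is an illustration of the decision rule, not the production implementation.

```python
def is_prefix_correct(prefix, complete, find_errors):
    """A prefix counts as grammatically correct if at least one
    completion of it yields an error-free full sentence."""
    return any(not find_errors(sentence) for sentence in complete(prefix))

# Toy stand-ins for the completion and grammar models:
completions = {
    "She puts a lot": ["She puts a lot of effort."],
    "She puts a lot of effort yesterday":
        ["She puts a lot of effort yesterday."],
}
errors = {"She puts a lot of effort yesterday.": [("puts", "put in")]}

complete = lambda p: completions.get(p, [])
find_errors = lambda s: errors.get(s, [])

# "She puts a lot" can still be completed correctly, so it is not
# flagged; adding "yesterday" rules out all correct completions.
assert is_prefix_correct("She puts a lot", complete, find_errors)
assert not is_prefix_correct("She puts a lot of effort yesterday",
                             complete, find_errors)
```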

What the user has typed so far       Suggested grammar correction
She puts a lot
She puts a lot of
She puts a lot of effort
She puts a lot of effort yesterday   Replace “puts” with “put in”.
GEC on incomplete sentences. There is no correction for valid sentence prefixes.

We created a second dataset suitable for training a large cloud-based model, but this time focusing on sentence prefixes. We generated the data using the aforementioned heuristic by taking the <original, corrected> sentence pairs from the cloud-based model’s training dataset and randomly sampling aligned prefixes from them.

For example, given the <original, corrected> sentence pair:

Original sentence: She puts a lot of effort yesterday afternoon.
Corrected sentence: She put in a lot of effort yesterday afternoon.

We might sample the following prefix pairs:

Original prefix: She puts
Corrected prefix: She put in

Original prefix: She puts a lot of effort yesterday
Corrected prefix: She put in a lot of effort yesterday
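Sampling an aligned prefix pair can be sketched as below. The chunked alignment representation, as (original words, corrected words) pairs, is an assumption for illustration; the post does not specify how the alignment is stored.

```python
import random

def sample_prefix_pair(alignment, rng=random):
    """Cut a word-aligned sentence pair at a random chunk boundary and
    return the corresponding (original prefix, corrected prefix)."""
    cut = rng.randrange(1, len(alignment) + 1)
    orig = " ".join(w for chunk in alignment[:cut] for w in chunk[0])
    corr = " ".join(w for chunk in alignment[:cut] for w in chunk[1])
    return orig, corr

# Alignment for the example pair from the post:
alignment = [
    (["She"], ["She"]),
    (["puts"], ["put", "in"]),
    (["a", "lot", "of", "effort"], ["a", "lot", "of", "effort"]),
    (["yesterday", "afternoon."], ["yesterday", "afternoon."]),
]

print(sample_prefix_pair(alignment))
```

Cutting after the second chunk, for instance, yields the pair <"She puts", "She put in"> from the example above.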

We then autocompleted each original prefix to a full sentence using a neural language model (similar in spirit to that used by SmartCompose). If a full-sentence grammar model finds no errors in the full sentence, then that means there is at least one possible way to complete this original prefix without making any grammatical errors, so we consider the original prefix to be correct and output <original prefix, original prefix> as a training example. Otherwise, we output <original prefix, corrected prefix>. We used this training data to train a large cloud-based model that can correct sentence prefixes, then used that model for hard distillation, generating new <original, corrected> sentence prefix pairs that are better-matched to the on-device domain.

Finally, we constructed the final training data for the on-device model by combining these new sentence prefix pairs with the full sentence pairs. The on-device model trained on this combined data is then capable of correcting both full sentences as well as sentence prefixes.

Training data for the on-device model is generated from cloud-based models.

Grammar Correction On-Device
Gboard sends a request to the on-device grammar model whenever the user has typed more than three words, whether the sentence is completed or not. To provide a quality user experience, we underline the grammar mistakes and provide replacement suggestions when the user interacts with them. However, the model outputs only corrected sentences, so those need to be transformed into replacement suggestions. To do this, we align the original sentence and the corrected sentence by minimizing the Levenshtein distance (i.e., the number of edits that are needed to transform the original sentence to the corrected sentence).

Extracting edits by aligning the corrected sentence to the original sentence.

Finally, we transform the insertion edits and deletion edits to be replacement edits. In the above example, we transform the suggested insertion of “in” to be an edit that suggests replacing “puts” with “put in”. And we similarly suggest replacing “effort on” with “effort”.
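The alignment-and-merge step can be sketched with a word-level Levenshtein alignment. This is a minimal illustration: it merges maximal runs of changed words into replacement pairs, whereas the production system additionally widens pure insertions or deletions to include a neighboring word (as in the "effort on" → "effort" example) and presumably uses its own tokenization.

```python
def extract_edits(original, corrected):
    """Align two sentences word-by-word with minimal edit distance,
    then merge runs of changed words into replacement suggestions."""
    a, b = original.split(), corrected.split()
    # Standard Levenshtein dynamic-programming table over words.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # delete a word
                           dp[i][j - 1] + 1,          # insert a word
                           dp[i - 1][j - 1] + cost)   # keep or substitute
    # Backtrace from the end, grouping maximal runs of changed words.
    edits, run_a, run_b = [], [], []
    def flush():
        if run_a or run_b:
            edits.append((" ".join(run_a), " ".join(run_b)))
            run_a.clear()
            run_b.clear()
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            flush()
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            run_a.insert(0, a[i - 1])
            run_b.insert(0, b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            run_a.insert(0, a[i - 1])
            i -= 1
        else:
            run_b.insert(0, b[j - 1])
            j -= 1
    flush()
    return list(reversed(edits))

print(extract_edits("She puts a lot of effort yesterday",
                    "She put in a lot of effort yesterday"))
# → [('puts', 'put in')]
```

On the running example, the substitution of "put" for "puts" and the insertion of "in" fall into one changed run, which becomes the single replacement suggestion "puts" → "put in".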

Conclusion
We have built a small high-quality grammar correction model by designing a compact model architecture and leveraging a cloud-based grammar system during training via hard distillation. This compact model enables users to correct their text entirely on their own device without ever needing to send their keystrokes to a remote server.

Acknowledgements
We gratefully acknowledge the key contributions of the other team members, including Abhanshu Sharma, Akshay Kannan, Bharath Mankalale, Chenxi Ni, Felix Stahlberg, Florian Hartmann, Jacek Jurewicz, Jayakumar Hoskere, Jenny Chin, Kohsuke Yatoh, Lukas Zilka, Martin Sundermeyer, Matt Sharifi, Max Gubin, Nick Pezzotti, Nithi Gupta, Olivia Graham, Qi Wang, Sam Jaffee, Sebastian Millius, Shankar Kumar, Sina Hassani, Vishal Kumawat, Yuanbo Zhang, Yunpeng Li, and Yuxin Dai. We would also like to thank Xu Liu and David Petrou for their support.


1 The feature will eventually be available in all apps with Gboard, but is currently unavailable for apps in WebView.