Categories
Misc

Startup Green Lights AI Analytics to Improve Traffic, Pedestrian Safety

For all the attention devoted to self-driving cars, there’s another, often-overlooked, aspect to transportation efficiency and safety: smarter roads. Derq, a startup operating out of Detroit and Dubai, has developed an AI system that can be installed on intersections and highways. Its AI edge appliance uses NVIDIA GPUs to process video and other data from …

The post Startup Green Lights AI Analytics to Improve Traffic, Pedestrian Safety appeared first on The Official NVIDIA Blog.

Categories
Misc

GFN Thursday Brings More Support for GOG Version of ‘The Witcher’ Series

Fun fact: Thursday is named for Thor, the hammer-wielding Norse god associated with lightning and thunder. We like to think he’d endorse GFN Thursday as the best day of the week, too. This is a special Thursday, as we’re happy to share that our cloud-streaming service has added more support for GOG. As part of …

The post GFN Thursday Brings More Support for GOG Version of ‘The Witcher’ Series appeared first on The Official NVIDIA Blog.

Categories
Misc

Omniverse Open Beta Now Available for Linux

The NVIDIA Omniverse open beta expands to Linux with the release of a Linux-based launcher and applications.

The new launcher provides the latest Omniverse news and updates, as well as the exchange where users can install and update applications and components like Omniverse Create, Kit, Cache, Drive and the Autodesk Maya Connector.

The launcher also provides a quick way of installing the Nucleus servers through the collaboration tab. These serve as the hub for collaboration and maintain the live sync between compatible applications.

The following are available today for Linux:

  • Omniverse Nucleus: At the core of Omniverse is a set of fundamental services that allow a variety of Omniverse-enabled client applications (Apps, Connectors, and others) to share and modify authoritative representations of virtual worlds.
  • Omniverse Cache: A simple service that can be used both on users’ workstations as well as within infrastructure to optimize data transfers between Nucleus and its clients.
  • Omniverse Kit: A toolkit for building native Omniverse applications and microservices. It is built on a base framework that provides a wide variety of functionality through a set of light-weight plugins. 
  • Omniverse Create: An Omniverse app that allows users to assemble, light, simulate and render large scale scenes. It is built using NVIDIA Omniverse™ Kit. The scene description and in-memory model are based on Pixar’s USD. Omniverse Create takes advantage of the advanced workflows of USD like Layers, Variants, Instancing and more.
  • Autodesk Maya Connector: This feature offers a robust toolkit for Maya users to send and live sync their model data to an Omniverse Nucleus. Maya users get a first-class renderer through Omniverse View or Omniverse Kit, as well as the ability to open, edit and sync with any application supporting Omniverse Connect. 

Download the Omniverse launcher for Linux today.

Categories
Misc

Guide To TensorLy: A Python Library For Tensor Learning

Categories
Offsites

A New Lens on Understanding Generalization in Deep Learning

Understanding generalization is one of the fundamental unsolved problems in deep learning. Why does optimizing a model on a finite set of training data lead to good performance on a held-out test set? This problem has been studied extensively in machine learning, with a rich history going back more than 50 years. There are now many mathematical tools that help researchers understand generalization in certain models. Unfortunately, most of these existing theories fail when applied to modern deep networks — they are both vacuous and non-predictive in realistic settings. This gap between theory and practice is largest for overparameterized models, which in theory have the capacity to overfit their train sets, but often do not in practice.

In “The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers”, accepted at ICLR 2021, we present a new framework for approaching this problem by connecting generalization to the field of online optimization. In a typical setting, a model trains on a finite set of samples, which are reused for multiple epochs. But in online optimization, the model has access to an infinite stream of samples, and can be iteratively updated while processing this stream. In this work, we find that models that train quickly on infinite data are the same models that generalize well if they are instead trained on finite data. This connection brings new perspectives on design choices in practice, and lays a roadmap for understanding generalization from a theoretical perspective.

The Deep Bootstrap Framework
The main idea of the Deep Bootstrap framework is to compare the real world, where there is finite training data, to an “ideal world”, where there is infinite data. We define these as:

  • Real World (N, T): Train a model on N train samples from a distribution, for T minibatch stochastic gradient descent (SGD) steps, re-using the same N samples in multiple epochs, as usual. This corresponds to running SGD on the empirical loss (loss on training data), and is the standard training procedure in supervised learning.
  • Ideal World (T): Train the same model for T steps, but use fresh samples from the distribution in each SGD step. That is, we run the exact same training code (same optimizer, learning-rates, batch-size, etc.), but sample a fresh train set in each epoch instead of reusing samples. In this ideal world setting, with an effectively infinite “train set”, there is no difference between train error and test error.

Test soft-error for ideal world and real world during SGD iterations for ResNet-18 architecture. We see that the two errors are similar.

A priori, one might expect the real and ideal worlds may have nothing to do with each other, since in the real world the model sees a finite number of examples from the distribution while in the ideal world the model sees the whole distribution. But in practice, we found that the real and ideal models actually have similar test error.
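To make the two settings concrete, here is a minimal sketch (not the paper’s code) of the two training procedures, using PyTorch, a toy synthetic distribution, and a small MLP purely for illustration; the real experiments use image datasets and architectures such as ResNet-18.

```python
import torch
import torch.nn as nn

def sample_from_distribution(n, dim=20):
    """Draw n labelled samples from a toy data distribution."""
    x = torch.randn(n, dim)
    y = (x.sum(dim=1) > 0).long()  # simple synthetic labelling rule
    return x, y

def make_model(dim=20):
    return nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2))

def train_real_world(n_samples=1000, steps=2000, batch_size=50):
    """Real World (N, T): T SGD steps on a fixed train set of N samples."""
    x, y = sample_from_distribution(n_samples)
    model = make_model()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        # Minibatch drawn from the SAME N samples (with replacement here for
        # simplicity; the paper reuses the data in shuffled epochs).
        idx = torch.randint(0, n_samples, (batch_size,))
        loss = loss_fn(model(x[idx]), y[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

def train_ideal_world(steps=2000, batch_size=50):
    """Ideal World (T): identical training code, but every step sees fresh samples."""
    model = make_model()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x, y = sample_from_distribution(batch_size)  # fresh batch each SGD step
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Compare test error of the two models on held-out samples.
x_test, y_test = sample_from_distribution(10000)
for name, model in [("real", train_real_world()), ("ideal", train_ideal_world())]:
    with torch.no_grad():
        err = (model(x_test).argmax(dim=1) != y_test).float().mean().item()
    print(f"{name} world test error: {err:.3f}")
```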

In order to quantify this observation, we simulated an ideal world setting by creating a new dataset, which we call CIFAR-5m. We trained a generative model on CIFAR-10, which we then used to generate ~6 million images. The scale of the dataset was chosen to ensure that it is “virtually infinite” from the model’s perspective, so that the model never resamples the same data. That is, in the ideal world, the model sees an entirely fresh set of samples.

Samples from CIFAR-5m

The figure below presents the test error of several models, comparing their performance when trained on CIFAR-5m data in the real world setting (i.e., re-used data) and the ideal world (“fresh” data). The solid blue line shows a ResNet model in the real world, trained on 50K samples for 100 epochs with standard CIFAR-10 hyperparameters. The dashed blue line shows the corresponding model in the ideal world, trained on 5 million samples in a single pass. Surprisingly, these worlds have very similar test error — the model in some sense “doesn’t care” whether it sees re-used samples or fresh ones.

The real world model is trained on 50K samples for 100 epochs, and the ideal world model is trained on 5M samples for a single epoch. The lines show the test error vs. the number of SGD steps.

This also holds for other architectures, e.g., a Multi-Layer-Perceptron (red), a Vision Transformer (green), and across many other settings of architecture, optimizer, data distribution, and sample size. These experiments suggest a new perspective on generalization: models that optimize quickly (on infinite data), generalize well (on finite data). For example, the ResNet model generalizes better than the MLP model on finite data, but this is “because” it optimizes faster even on infinite data.

Understanding Generalization from Optimization Behavior
The key observation is that real world and ideal world models remain close, in test error, for all timesteps, until the real world converges (< 1% train error). Thus, one can study models in the real world by studying their corresponding behavior in the ideal world.

This means that the generalization of the model can be understood in terms of its optimization performance under two frameworks:

  1. Online Optimization: How fast the ideal world test error decreases
  2. Offline Optimization: How fast the real world train error converges

Thus, to study generalization, we can equivalently study the two terms above, which can be conceptually simpler, since they only involve optimization concerns. Based on this observation, good models and training procedures are those that (1) optimize quickly in the ideal world and (2) do not optimize too quickly in the real world.

All design choices in deep learning can be viewed through their effect on these two terms. For example, some advances like convolutions, skip-connections, and pretraining help primarily by accelerating ideal world optimization, while other advances like regularization and data-augmentation help primarily by decelerating real world optimization.

Applying the Deep Bootstrap Framework
Researchers can use the Deep Bootstrap framework to study and guide design choices in deep learning. The principle is: whenever one makes a change that affects generalization in the real world (the architecture, learning-rate, etc.), one should consider its effect on (1) the ideal world optimization of test error (faster is better) and (2) the real world optimization of train error (slower is better).

For example, pre-training is often used in practice to help generalization of models in small-data regimes. However, the reason that pre-training helps remains poorly understood. One can study this using the Deep Bootstrap framework by looking at the effect of pre-training on terms (1) and (2) above. We find that the primary effect of pre-training is to improve the ideal world optimization (1) — pre-training turns the network into a “fast learner” for online optimization. The improved generalization of pretrained models is thus almost exactly captured by their improved optimization in the ideal world. The figure below shows this for Vision-Transformers (ViT) trained on CIFAR-10, comparing training from scratch vs. pre-training on ImageNet.

Effect of pre-training — pre-trained ViTs optimize faster in the ideal world.

One can also study data-augmentation using this framework. Data-augmentation in the ideal world corresponds to augmenting each fresh sample once, as opposed to augmenting the same sample multiple times. This framework implies that good data-augmentations are those that (1) do not significantly harm ideal world optimization (i.e., augmented samples don’t look too “out of distribution”) and (2) inhibit real world optimization speed (so the real world takes longer to fit its train set).

The main benefit of data-augmentation is through the second term, prolonging the real world optimization time. As for the first term, some aggressive data augmentations (mixup/cutout) can actually harm the ideal world, but this effect is dwarfed by the second term.

Concluding Thoughts
The Deep Bootstrap framework provides a new lens on generalization and empirical phenomena in deep learning. We are excited to see it applied to understand other aspects of deep learning in the future. It is especially interesting that generalization can be characterized via purely optimization considerations, which is in contrast to many prevailing approaches in theory. Crucially, we consider both online and offline optimization, which are individually insufficient, but that together determine generalization.

The Deep Bootstrap framework can also shed light on why deep learning is fairly robust to many design choices: many kinds of architectures, loss functions, optimizers, normalizations, and activation functions can generalize well. This framework suggests a unifying principle: that essentially any choice that works well in the online optimization setting will also generalize well in the offline setting.

Finally, modern neural networks can be either overparameterized (e.g., large networks trained on small data tasks) or underparameterized (e.g., OpenAI’s GPT-3, Google’s T5, or Facebook’s ResNeXt WSL). The Deep Bootstrap framework implies that online optimization is a crucial factor to success in both regimes.

Acknowledgements
We are thankful to our co-author, Behnam Neyshabur, for his great contributions to the paper and valuable feedback on the blog. We thank Boaz Barak, Chenyang Yuan, and Chiyuan Zhang for helpful comments on the blog and paper.

Categories
Misc

Is AI Important to Financial Services’ Future? New Survey Says You Can Bank on It

Financial services companies are challenged with defining and executing their AI strategy. AI solutions contribute to both the top and bottom line for firms by powering nearly every function, including customer service, cybersecurity, new account acquisition and regulatory compliance. Everyone from executives to data scientists is involved with determining how much to invest, the most …

The post Is AI Important to Financial Services’ Future? New Survey Says You Can Bank on It appeared first on The Official NVIDIA Blog.

Categories
Offsites

Accelerating Neural Networks on Mobile and Web with Sparse Inference

On-device inference of neural networks enables a variety of real-time applications, like pose estimation and background blur, in a low-latency and privacy-conscious way. Using ML inference frameworks like TensorFlow Lite with the XNNPACK ML acceleration library, engineers optimize their models to run on a variety of devices by finding a sweet spot between model size, inference speed and the quality of the predictions.

One way to optimize a model is through use of sparse neural networks [1, 2, 3], which have a significant fraction of their weights set to zero. In general, this is a desirable quality as it not only reduces the model size via compression, but also makes it possible to skip a significant fraction of multiply-add operations, thereby speeding up inference. Further, it is possible to increase the number of parameters in a model and then sparsify it to match the quality of the original model, while still benefiting from the accelerated inference. However, the use of this technique remains limited in production largely due to a lack of tools to sparsify popular convolutional architectures as well as insufficient support for running these operations on-device.

Today we announce the release of a set of new features for the XNNPACK acceleration library and TensorFlow Lite that enable efficient inference of sparse networks, along with guidelines on how to sparsify neural networks, with the goal of helping researchers develop their own sparse on-device models. Developed in collaboration with DeepMind, these tools power a new generation of live perception experiences, including hand tracking in MediaPipe and background features in Google Meet, accelerating inference speed from 1.2 to 2.4 times, while reducing the model size by half. In this post, we provide a technical overview of sparse neural networks — from inducing sparsity during training to on-device deployment — and offer some ideas on how researchers might create their own sparse models.

Comparison of the processing time for the dense (left) and sparse (right) models of the same quality for Google Meet background features. For readability, the processing time shown is the moving average across 100 frames.

Sparsifying a Neural Network
Many modern deep learning architectures, like MobileNet and EfficientNetLite, are primarily composed of depthwise convolutions with a small spatial kernel and 1×1 convolutions that linearly combine features from the input image. While such architectures have a number of potential targets for sparsification, including the full 2D convolutions that frequently occur at the beginning of many networks or the depthwise convolutions, it is the 1×1 convolutions that are the most expensive operators as measured by inference time. Because they account for over 65% of the total compute, they are an optimal target for sparsification.

Architecture        Share of inference time in 1×1 convolutions
MobileNet           85%
MobileNetV2         71%
MobileNetV3         71%
EfficientNet-Lite   66%

Share of inference time dedicated to 1×1 convolutions (in %) for modern mobile architectures.

In modern on-device inference engines, like XNNPACK, the implementation of 1×1 convolutions, as well as other operations in deep learning models, relies on the HWC tensor layout, in which the tensor dimensions correspond to the height, width, and channel (e.g., red, green or blue) of the input image. This tensor configuration allows the inference engine to process the channels corresponding to each spatial location (i.e., each pixel of an image) in parallel. However, this ordering of the tensor is not a good fit for sparse inference because it sets the channel as the innermost dimension of the tensor and makes it more computationally expensive to access.

Our updates to XNNPACK enable it to detect if a model is sparse. If so, it switches from its standard dense inference mode to sparse inference mode, in which it employs a CHW (channel, height, width) tensor layout. This reordering of the tensor allows for an accelerated implementation of the sparse 1×1 convolution kernel for two reasons: 1) entire spatial slices of the tensor can be skipped when the corresponding channel weight is zero following a single condition check, instead of a per-pixel test; and 2) when the channel weight is non-zero, the computation can be made more efficient by loading neighbouring pixels into the same memory unit. This enables us to process multiple pixels simultaneously, while also performing each operation in parallel across several threads. Together these changes result in a speed-up of 1.8x to 2.3x when at least 80% of the weights are zero.
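To illustrate why the CHW layout helps, here is a hedged NumPy sketch, not XNNPACK’s actual kernel, of a sparse 1×1 convolution: whenever a weight is zero, the entire height×width spatial slice of the corresponding input channel is skipped after a single check, and non-zero weights stream through contiguous memory.

```python
import numpy as np

def sparse_1x1_conv_chw(x_chw, weights):
    """Sparse 1x1 convolution on a CHW tensor.

    x_chw:   input activations of shape (C_in, H, W)
    weights: 1x1 convolution weights of shape (C_out, C_in),
             with a large fraction of entries equal to zero.
    """
    c_in, h, w = x_chw.shape
    c_out = weights.shape[0]
    out = np.zeros((c_out, h, w), dtype=x_chw.dtype)
    for co in range(c_out):
        for ci in range(c_in):
            w_val = weights[co, ci]
            if w_val == 0.0:
                continue  # one check skips the entire HxW spatial slice
            # Non-zero weight: the HxW plane is contiguous in CHW layout,
            # so this multiply-add streams through memory efficiently.
            out[co] += w_val * x_chw[ci]
    return out

# Example: ~80% sparse weights on a small feature map.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 56, 56)).astype(np.float32)
w = rng.standard_normal((64, 64)).astype(np.float32)
w[rng.random((64, 64)) < 0.8] = 0.0  # zero out roughly 80% of the weights
y = sparse_1x1_conv_chw(x, w)
print(y.shape)  # (64, 56, 56)
```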

In order to avoid converting back and forth between the CHW tensor layout that is optimal for sparse inference and the standard HWC tensor layout after each operation, XNNPACK provides efficient implementations of several CNN operators in CHW layout.

Guidelines for Training Sparse Neural Networks
To create a sparse neural network, the guidelines included in this release suggest one start with a dense version and then gradually set a fraction of its weights to zero during training. This process is called pruning. Of the many available techniques for pruning, we recommend using magnitude pruning (available in the TF Model Optimization Toolkit) or the recently introduced RigL method. With a modest increase in training time, both of these can successfully sparsify deep learning models without degrading their quality. The resulting sparse models can be stored efficiently in a compressed format that reduces the size by a factor of two compared to their dense equivalent.
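As a hedged illustration of magnitude pruning with the TF Model Optimization Toolkit, the sketch below wraps a toy Keras model with prune_low_magnitude and a polynomial sparsity schedule; the model architecture, sparsity targets and step counts are placeholders for illustration, not the settings used for the production models described here.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy dense model standing in for a real mobile architecture.
base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(96, 96, 3)),
    tf.keras.layers.Conv2D(64, 1, activation="relu"),  # 1x1 convolution: main sparsification target
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

# Gradually raise sparsity from 0% to 80% over a window of training steps.
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.8,
    begin_step=1000, end_step=5000)

pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    base_model, pruning_schedule=pruning_schedule)

pruned_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])

# The UpdatePruningStep callback applies the sparsity schedule during training:
# pruned_model.fit(train_ds, epochs=..., callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```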

The quality of sparse networks is influenced by several hyperparameters, including training time, learning rate and schedules for pruning. The TF Pruning API provides an excellent example of how to select these, as well as some tips for training such models. We recommend running hyperparameter searches to find the sweet spot for your application.

Applications
We demonstrate that it is possible to sparsify classification tasks, dense segmentation (e.g., Meet background blur) and regression problems (MediaPipe Hands), which provides tangible benefits to users. For example, in the case of Google Meet, sparsification lowered the inference time of the model by 30%, which provided access to higher quality models for more users.

Model size comparisons for the dense and sparse models in MB. The models have been stored in 16- and 32-bit floating-point formats.

The approach to sparsity described here works best with architectures based on inverted residual blocks, such as MobileNetV2, MobileNetV3 and EfficientNetLite. The degree of sparsity in a network influences both inference speed and quality. Starting from a dense network of a fixed capacity, we found modest performance gains even at 30% sparsity. With increased sparsity, the quality of the model remains relatively close to the dense baseline until reaching 70% sparsity, beyond which there is a more pronounced drop in accuracy. However, one can compensate for the reduced accuracy at 70% sparsity by increasing the size of the base network by 20%, which results in faster inference times without degrading the quality of the model. No further changes are required to run the sparsified models, because XNNPACK can recognize and automatically enable sparse inference.
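Continuing the pruning sketch above, the snippet below shows one hedged way to export a pruned Keras model for on-device use: strip the pruning wrappers, then convert to TensorFlow Lite with the experimental sparsity optimization so the sparse weights are stored in a compressed format. The flag name is taken from the public TensorFlow API, `pruned_model` refers to the model from the earlier sketch, and the output file name is arbitrary.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# `pruned_model` is the Keras model produced by the pruning sketch above.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

converter = tf.lite.TFLiteConverter.from_keras_model(final_model)
# Ask the converter to preserve the sparse weight structure in a compressed format.
converter.optimizations = [tf.lite.Optimize.EXPERIMENTAL_SPARSITY]
tflite_model = converter.convert()

with open("sparse_model.tflite", "wb") as f:
    f.write(tflite_model)
```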

Ablation studies of different sparsity levels with respect to inference time (the smaller the better) and the quality measured by the Intersection over Union (IoU) for predicted segmentation mask.

Sparsity as Automatic Alternative to Distillation
Background blur in Google Meet uses a segmentation model based on a modified MobileNetV3 backbone with attention blocks. We were able to speed up the model by 30% by applying a 70% sparsification, while preserving the quality of the foreground mask. We examined the predictions of the sparse and dense models on images from 17 geographic subregions, finding no significant difference, and released the details in the associated model card.

Similarly, MediaPipe Hands predicts hand landmarks in real time on mobile and the web using a model based on the EfficientNetLite backbone. This backbone model was manually distilled from the large dense model, which is a computationally expensive, iterative process. Using the sparse version of the dense model instead of the distilled one, we were able to maintain the same inference speed without the labor-intensive process of distilling from a dense model. Compared with the dense model, the sparse model improved inference speed by a factor of two, achieving landmark quality identical to that of the distilled model. In a sense, sparsification can be thought of as an automatic approach to unstructured model distillation, which can improve model performance without extensive manual effort. We evaluated the sparse model on the geodiverse dataset and made the model card publicly available.

Comparison of execution time for the dense (left), distilled (middle) and sparse (right) models of the same quality. Processing time of the dense model is 2x larger than that of the sparse or distilled models. The distilled model is taken from the official MediaPipe solution. The dense and sparse web demos are publicly available.

Future work
We find sparsification to be a simple yet powerful technique for improving CPU inference of neural networks. Sparse inference allows engineers to run larger models without incurring a significant performance or size overhead and offers a promising new direction for research. We are continuing to extend XNNPACK with wider support for operations in CHW layout and are exploring how it might be combined with other optimization techniques like quantization. We are excited to see what you might build with this technology!

Acknowledgments
Special thanks to all who worked on this project: Karthik Raveendran, Erich Elsen, Tingbo Hou‎, Trevor Gale, Siargey Pisarchyk, Yury Kartynnik, Yunlu Li, Utku Evci, Matsvei Zhdanovich, Sebastian Jansson, Stéphane Hulaud, Michael Hays, Juhyun Lee, Fan Zhang, Chuo-Ling Chang, Gregory Karpiak, Tyler Mullen, Jiuqiang Tang, Ming Guang Yong, Igor Kibalchich, and Matthias Grundmann.

Categories
Misc

A few experiments with neural networks and the sin(x) function: how NNs fail at extrapolating

Categories
Misc

NVIDIA AI Enterprise – Optimized, Certified and Supported on VMware vSphere

NVIDIA AI Enterprise is a suite of AI software, certified to run on VMware vSphere 7 Update 2 with NVIDIA-Certified volume servers. It includes key enabling technologies and software from NVIDIA for rapid deployment, management and scaling of AI workloads in the virtualized data center running on VMware vSphere. The NVIDIA AI Enterprise suite also enables IT Administrators, Data Scientists, and AI Researchers to quickly run NVIDIA AI applications and libraries optimized for GPU acceleration by reducing deployment time and ensuring reliable performance.    

The NVIDIA AI Enterprise suite is licensed and supported by NVIDIA. Since the joint announcement at VMworld in September 2020, NVIDIA and VMware have continued working to improve the integration between their joint offerings, and both companies remain committed to tightly coupling VMware vSphere with the NVIDIA AI Enterprise suite. This article discusses the new features introduced with the VMware vSphere 7 Update 2 release and the new NVIDIA AI Enterprise software suite.

The introduction of NVIDIA RDMA capabilities into vSphere for NVIDIA virtual GPU (vGPU) allows deep learning training to scale out to multiple nodes with near bare-metal performance, even for the largest training workloads.

RDMA technology is featured in NVIDIA ConnectX SmartNICs and BlueField DPUs and improves the bandwidth and latency when moving data directly between a network interface card (NIC) and GPU memory.   

IT administrators can use the tools they are familiar with, like VMware vCenter, to provision multiple nodes as VMs. These VMs can be configured to use NVIDIA networking and vGPU resources for RDMA.   

VMware’s integration with RDMA over Converged Ethernet (RoCE) accelerates AI and ML workloads more than ever before. vSphere 7 Update 2 with NVIDIA AI Enterprise software supports RDMA with Address Translation Services (ATS) on Intel CPUs, further optimizing GPUDirect bandwidth between the NIC and GPU so that throughput is not limited by PCIe bus speeds. This means that a data scientist can iterate on new data and retrain many more times in a day, dramatically increasing their productivity.

Now let’s look at the new VMware features that further enable deep learning inferencing workloads. vSphere 7 Update 2 supports the latest NVIDIA Ampere architecture GPUs, such as the NVIDIA A100. This GPU can be configured to use Multi-Instance GPU (MIG). This type of GPU partitioning can be particularly beneficial for inferencing workloads that do not fully saturate the GPU’s compute capacity and for use cases that require low-latency response and error isolation. The graph below illustrates the performance of natural language inference using virtualized GPUs enabled with MIG compared to virtualized CPUs as well as bare metal.

Let’s look at a use case example of how a single NVIDIA A100 configured with MIG mode enabled can serve multiple inferencing workloads with VMware vSphere. NVIDIA Triton Inference Server is an AI application framework included in the NVIDIA AI Enterprise suite. Available as a Docker container, it integrates with Kubernetes for orchestration and auto-scaling. This solution allows front-end client applications to submit inference requests to the AI inference cluster, which services models from the AI model repository.

Looking further at this use case, suppose multiple end users or departments submit inference requests to perform object detection on satellite imagery. Within the AI model repository, there are pre-trained object detection models that detect the presence of multiple objects in the satellite imagery, such as buildings, trees, fire hydrants or well pads. A single NVIDIA A100 GPU can service the multiple inference requests by leveraging MIG spatial partitioning, thereby optimizing the utilization of a valuable and powerful GPU resource within the enterprise. The graph below illustrates the performance of ResNet-50 object detection inference using virtualized GPUs with MIG enabled compared to virtualized CPU-only as well as bare metal.
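As a hedged sketch of the client side of this workflow, the snippet below uses the tritonclient Python package to send one image to a Triton Inference Server over HTTP; the model name, tensor names, shapes, and server address are placeholders that would need to match the config.pbtxt of the deployed object detection model.

```python
import numpy as np
import tritonclient.http as httpclient

# Placeholder endpoint and model details; adjust to your deployment.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy 3x1024x1024 "satellite" image batch; a real client would load and
# preprocess imagery to the shape expected by the deployed model.
image = np.random.rand(1, 3, 1024, 1024).astype(np.float32)

inputs = [httpclient.InferInput("input", list(image.shape), "FP32")]
inputs[0].set_data_from_numpy(image)
outputs = [httpclient.InferRequestedOutput("detections")]

response = client.infer(model_name="satellite_object_detection",
                        inputs=inputs, outputs=outputs)
detections = response.as_numpy("detections")
print(detections.shape)
```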

Using Triton Inference Server with the added MIG support in vSphere 7.0 U2, the NVIDIA A100 40GB GPU can be partitioned into up to seven GPU instances; each instance has its own dedicated compute resources that run in parallel with predictable throughput and latency. IT administrators use vCenter to assign a single MIG partition to a VM. Read VMware’s technical blog post for additional details, “Multiple Machine Learning Workloads Using GPUs: New Features in vSphere 7 Update 2”.

As enterprises move toward AI and cloud computing, a new data center architecture is needed to enable both existing and modern workloads. Accelerated servers can be added to the core enterprise data center and managed with standard tools like VMware vCenter. As a result of NVIDIA’s close partnership with VMware, vSphere 7.0 U2 brings new features that deliver low-latency response for ML/AI applications backed by vGPU in the enterprise.

Categories
Misc

NVIDIA CEO Jensen Huang to Host AI Pioneers Yoshua Bengio, Geoffrey Hinton and Yann LeCun, and Others, at GTC21

Online Conference to Feature Jensen Huang Keynote and 1,300 Talks from Leaders in Data Center, Networking, Graphics and Autonomous Vehicles

SANTA CLARA, Calif., March 09, 2021 (GLOBE NEWSWIRE) …