Categories
Misc

Is TensorFlow compilable to a binary?

If I install all of TensorFlow's external dependencies, such as Bazel and make, on my Linux machine and build TensorFlow from source, would I end up with a binary I could copy to some server machine and just run?

I am also asking whether the Python API calls a bunch of dynamically linked libraries, and whether there is something like a model.build() or model.compile() command that produces a binary in a directory on my Linux machine.

submitted by /u/GingerGengar123

Categories
Misc

tf.distribute.MultiWorkerMirroredStrategy and data sharding

I understand the simple tf.distribute.MirroredStrategy(); it’s actually pretty simple. I’m hoping to scale a large problem across multiple computers, so I’m trying to learn how to use MultiWorkerMirroredStrategy(), but I have not found a good example yet.

My understanding is that I would write one Python script and distribute it across the machines. Then I define the role of each machine via the environment variable TF_CONFIG. I create the strategy and then do something like:

mystrategy = tf.distribute.MultiWorkerMirroredStrategy()
with mystrategy.scope():
    model = buildModel()
    model.compile()
model.fit(x_train, y_train)

That’s all very straightforward. My question is about the data. This code is executed on all nodes. Is each node supposed to parse TF_CONFIG and load its own subset of data? Does just the chief load all the data and then the scope block parses out shards? Or does each node load all the data?
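
For reference, here is a minimal sketch of how TF_CONFIG and dataset sharding typically fit together (the hostnames, ports, batch size, and stand-in data below are illustrative assumptions, not a definitive answer): every worker runs the same script and builds the same input pipeline, and tf.data's auto-sharding gives each worker a disjoint slice, so the script does not need to parse TF_CONFIG by hand to shard the data.

import json
import os
import numpy as np
import tensorflow as tf

# Illustrative two-worker cluster; hostnames, ports, and the task index are
# placeholders (index would be 0 on the first machine, 1 on the second).
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["10.0.0.1:12345", "10.0.0.2:12345"]},
    "task": {"type": "worker", "index": 0},
})

# TF_CONFIG must be set before the strategy is created.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

# Stand-in data so the sketch is self-contained.
x_train = np.random.rand(1024, 32).astype("float32")
y_train = np.random.randint(0, 10, size=(1024,))

# Every worker builds the identical dataset; with DATA auto-sharding,
# tf.distribute hands each worker a disjoint slice of it automatically.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = (
    tf.data.experimental.AutoShardPolicy.DATA)
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.batch(64).with_options(options)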

submitted by /u/Simusid

Categories
Offsites

3D Scene Understanding with TensorFlow 3D

The growing ubiquity of 3D sensors (e.g., Lidar, depth-sensing cameras, and radar) over the last few years has created a need for scene understanding technology that can process the data these devices capture. Such technology can enable machine learning (ML) systems that use these sensors, like autonomous cars and robots, to navigate and operate in the real world, and can create an improved augmented reality experience on mobile devices. The field of computer vision has recently begun making good progress in 3D scene understanding, including models for mobile 3D object detection, transparent object detection, and more, but entry to the field can be challenging due to the limited availability of tools and resources that can be applied to 3D data.

In order to further improve 3D scene understanding and reduce barriers to entry for interested researchers, we are releasing TensorFlow 3D (TF 3D), a highly modular and efficient library that is designed to bring 3D deep learning capabilities into TensorFlow. TF 3D provides a set of popular operations, loss functions, data processing tools, models and metrics that enables the broader research community to develop, train and deploy state-of-the-art 3D scene understanding models.

TF 3D contains training and evaluation pipelines for state-of-the-art 3D semantic segmentation, 3D object detection, and 3D instance segmentation, with support for distributed training. It also enables other potential applications, such as 3D object shape prediction, point cloud registration, and point cloud densification. In addition, it offers a unified dataset specification and configuration for training and evaluation on standard 3D scene understanding datasets. It currently supports the Waymo Open, ScanNet, and Rio datasets. However, users can freely convert other popular datasets, such as NuScenes and KITTI, into a similar format and use them in the pre-existing or custom-created pipelines. They can also leverage TF 3D for a wide variety of 3D deep learning research and applications, from quickly prototyping and trying new ideas to deploying a real-time inference system.

An example output of the 3D object detection model in TF 3D on a frame from the Waymo Open Dataset is shown on the left. An example output of the 3D instance segmentation model on a scene from the ScanNet dataset is shown on the right.

Here, we will present the efficient and configurable sparse convolutional backbone that is provided in TF 3D, which is the key to achieving state-of-the-art results on various 3D scene understanding tasks. Furthermore, we will go over each of the three pipelines that TF 3D currently supports: 3D semantic segmentation, 3D object detection and 3D instance segmentation.

3D Sparse Convolutional Network
The 3D data captured by sensors often consists of a scene that contains a set of objects of interest (e.g., cars, pedestrians, etc.) surrounded mostly by open space, which is of limited (or no) interest. As such, 3D data is inherently sparse. In such an environment, a standard implementation of convolutions would be computationally intensive and consume a large amount of memory. So, in TF 3D we use submanifold sparse convolution and pooling operations, which are designed to process 3D sparse data more efficiently. Sparse convolutional models are core to the state-of-the-art methods applied in most outdoor self-driving (e.g., Waymo, NuScenes) and indoor benchmarks (e.g., ScanNet).
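
To make the submanifold idea concrete, here is a toy NumPy sketch (an illustrative simplification, not the TF 3D implementation, which runs as fused CUDA kernels): outputs are computed only at voxels that are already occupied, and contributions come only from occupied neighbors, so empty space costs nothing and the active set does not grow from layer to layer.

import numpy as np

def submanifold_conv3d(active, features, weights, bias):
    """Toy 3x3x3 submanifold sparse convolution.

    active:   dict mapping (x, y, z) voxel coordinates -> row index in `features`
    features: (N, C_in) array of features for the N occupied voxels
    weights:  (3, 3, 3, C_in, C_out) filter bank
    bias:     (C_out,) bias vector
    Returns an (N, C_out) array defined on the same set of occupied voxels.
    """
    out = np.tile(bias, (len(active), 1)).astype(float)
    for (x, y, z), row in active.items():
        # Only occupied neighbors contribute; empty space is skipped entirely.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    nbr = active.get((x + dx, y + dy, z + dz))
                    if nbr is not None:
                        out[row] += features[nbr] @ weights[dx + 1, dy + 1, dz + 1]
    return out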

We also use various CUDA techniques to speed up the computation (e.g., hashing, partitioning / caching the filter in shared memory, and using bit operations). Experiments on the Waymo Open dataset show that this implementation is around 20x faster than a well-designed implementation built from pre-existing TensorFlow operations.

TF 3D then uses the 3D submanifold sparse U-Net architecture to extract a feature vector for each voxel. The U-Net architecture has proven effective by letting the network extract both coarse and fine features and combining them to make its predictions. The U-Net network consists of three modules: an encoder, a bottleneck, and a decoder, each of which consists of a number of sparse convolution blocks with possible pooling or un-pooling operations.

A 3D sparse voxel U-Net architecture. Note that a horizontal arrow takes in the voxel features and applies a submanifold sparse convolution to them. An arrow that moves down performs a submanifold sparse pooling. An arrow that moves up gathers back the pooled features, concatenates them with the features coming from the horizontal arrow, and performs a submanifold sparse convolution on the concatenated features.

The sparse convolutional network described above is the backbone for the 3D scene understanding pipelines offered in TF 3D. Each of the models described below uses this backbone network to extract features for the sparse voxels and then adds one or more prediction heads to infer the task of interest. The user can configure the U-Net network by changing the number of encoder / decoder layers and the number of convolutions in each layer, and by modifying the convolution filter sizes, which enables a wide range of speed / accuracy tradeoffs to be explored through the different backbone configurations.

3D Semantic Segmentation
The 3D semantic segmentation model has only one output head for predicting the per-voxel semantic scores, which are mapped back to points to predict a semantic label per point.
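
As a rough sketch of that final step (the variable names and shapes here are assumptions): each point simply inherits the predicted class of the voxel it was assigned to during voxelization.

import numpy as np

def point_labels_from_voxels(voxel_scores, point_to_voxel):
    """voxel_scores:   (num_voxels, num_classes) semantic scores from the head.
    point_to_voxel: (num_points,) index of the voxel each point falls in.
    Returns a (num_points,) array of per-point semantic labels."""
    return np.argmax(voxel_scores[point_to_voxel], axis=1)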

3D semantic segmentation of an indoor scene from the ScanNet dataset.

3D Instance Segmentation
In 3D instance segmentation, in addition to predicting semantics, the goal is to group the voxels that belong to the same object together. The 3D instance segmentation algorithm used in TF 3D is based on our previous work on 2D image segmentation using deep metric learning. The model predicts a per-voxel instance embedding vector as well as a semantic score for each voxel. The instance embedding vectors map the voxels to an embedding space where voxels that correspond to the same object instance are close together, while those that correspond to different objects are far apart. In this case, the input is a point cloud instead of an image, and it uses a 3D sparse network instead of a 2D image network. At inference time, a greedy algorithm picks one instance seed at a time, and uses the distance between the voxel embeddings to group them into segments.
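
A simplified sketch of that greedy grouping (the per-voxel seed score, the Euclidean metric, and the fixed threshold are assumptions; the actual TF 3D algorithm may differ in its details):

import numpy as np

def greedy_instance_grouping(embeddings, seed_scores, threshold=0.5):
    """embeddings:  (N, D) per-voxel instance embedding vectors.
    seed_scores: (N,) assumed per-voxel confidence for being a good seed.
    Returns an (N,) array of instance ids."""
    n = embeddings.shape[0]
    instance_ids = np.full(n, -1)
    next_id = 0
    # Visit voxels from most to least confident seed.
    for seed in np.argsort(-seed_scores):
        if instance_ids[seed] != -1:
            continue  # already claimed by an earlier instance
        dists = np.linalg.norm(embeddings - embeddings[seed], axis=1)
        members = (dists < threshold) & (instance_ids == -1)
        instance_ids[members] = next_id
        next_id += 1
    return instance_ids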

3D Object Detection
The 3D object detection model predicts per-voxel size, center, and rotation matrices, as well as object semantic scores. At inference time, a box proposal mechanism is used to reduce the hundreds of thousands of per-voxel box predictions into a few accurate box proposals; at training time, box prediction and classification losses are applied to the per-voxel predictions. We apply a Huber loss on the distance between the predicted and ground-truth box corners. Since the function that estimates the box corners from the size, center, and rotation matrix is differentiable, the loss automatically propagates back to those predicted object properties. We use a dynamic box classification loss that classifies a box that strongly overlaps with the ground truth as positive and classifies non-overlapping boxes as negative.
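
That corner-based loss can be sketched as follows (a NumPy illustration under assumed conventions; the real pipeline computes this with differentiable TensorFlow ops so gradients flow back to size, center, and rotation):

import numpy as np

# Offsets of the 8 corners of a unit cube centered at the origin, shape (8, 3).
_CORNER_SIGNS = np.array([[sx, sy, sz]
                          for sx in (-0.5, 0.5)
                          for sy in (-0.5, 0.5)
                          for sz in (-0.5, 0.5)])

def box_corners(center, size, rotation):
    """Scale the unit cube by `size` (3,), rotate by the 3x3 `rotation`
    matrix, and translate by `center` (3,). Returns (8, 3) corners."""
    return (_CORNER_SIGNS * size) @ rotation.T + center

def corner_huber_loss(pred_box, gt_box, delta=1.0):
    """Huber loss on per-corner distances; each box is (center, size, rotation)."""
    d = np.linalg.norm(box_corners(*pred_box) - box_corners(*gt_box), axis=1)
    quad = np.minimum(d, delta)
    return np.sum(0.5 * quad ** 2 + delta * (d - quad))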

Our 3D object detection results on the ScanNet dataset.

In our recent paper, “DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes”, we describe in detail the single-stage weakly supervised learning algorithm used for object detection in TF 3D. In addition, in a follow-up work, we extended the 3D object detection model to leverage temporal information by proposing a sparse LSTM-based multi-frame model. We go on to show that this temporal model outperforms the frame-by-frame approach by 7.5% on the Waymo Open dataset.

The 3D object detection and shape prediction model introduced in the DOPS paper. A 3D sparse U-Net is used to extract a feature vector for each voxel. The object detection module uses these features to propose 3D boxes and semantic scores. At the same time, the other branch of the network predicts a shape embedding that is used to output a mesh for each object.

Ready to Get Started?
We’ve certainly found this codebase to be useful for our 3D computer vision projects, and we hope that you will as well. Contributions to the codebase are welcome, and please stay tuned for further updates to the framework. To get started, please visit our GitHub repository.

Acknowledgements
The release of the TensorFlow 3D codebase and model has been the result of widespread collaboration among Google researchers with feedback and testing from product groups. In particular we want to highlight the core contributions by Alireza Fathi and Rui Huang (work performed while at Google), with special additional thanks to Guangda Lai, Abhijit Kundu, Pei Sun, Thomas Funkhouser, David Ross, Caroline Pantofaru, Johanna Wald, Angela Dai and Matthias Niessner.

Categories
Misc

Tutorial: Accelerating IO in the Modern Data Center: Computing and IO Management

This is the third post in the Explaining Magnum IO series, which has the goal of describing the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern data center.

The first post in this series introduced the Magnum IO architecture; positioned it in the broader context of CUDA, CUDA-X, and vertical application domains; and listed the four major components of the architecture. The second post delved deep into the Network IO components of Magnum IO. This third post covers two shorter areas: computing that occurs in the network adapter or switch and IO management. Whether your interests are in InfiniBand or Ethernet, NVIDIA Mellanox solutions have you covered.

Read more >

Categories
Misc

Tutorial: Building Image Segmentation Faster Using Jupyter Notebooks from NGC

The AI containers and models on the NGC Catalog are tuned, tested, and optimized to extract maximum performance from your existing GPU infrastructure.

Image segmentation is the process of partitioning a digital image into multiple segments by changing the representation of the image into something that is more meaningful and easier to analyze. Image segmentation can be used in a variety of domains: in manufacturing to identify defective parts, in medical imaging to detect the early onset of disease, in autonomous driving to detect pedestrians, and more.

However, building, training, and optimizing these models can be complex and quite time-consuming. To achieve a state-of-the-art model, you need to set up the right environment, train with the correct hyperparameters, and optimize it to achieve the desired accuracy. Data scientists and developers usually end up spending a considerable amount of time looking for the right tools and setting up the environments for their models, which is why we built the NGC Catalog.

The NGC Catalog is a hub for cloud-native, GPU-optimized AI and HPC applications and tools. It provides faster access to performance-optimized containers, shortens time-to-solution with pretrained models, and offers industry-specific software development kits for building end-to-end AI solutions. The catalog hosts a diverse set of assets that can be used for a variety of applications and use cases, ranging from computer vision and speech recognition to recommendation systems.

Read more >

Categories
Misc

Webinar: Limitless Extended Reality with NVIDIA CloudXR 2.0

Learn how NVIDIA CloudXR can be used to deliver limitless virtual and augmented reality over networks (including 5G) to low-cost, low-powered headsets and devices, while maintaining the high-quality experience traditionally reserved for high-end headsets that are plugged into high-performance computers.

Many people believed delivering extended reality (XR) experiences from cloud computing systems was impossible, until now. Join our webinar to learn how NVIDIA CloudXR can be used to deliver limitless virtual and augmented reality over networks (including 5G) to low-cost, low-powered headsets and devices, while maintaining the high-quality experience traditionally reserved for high-end headsets that are plugged into high-performance computers. CloudXR lifts the limits on developers, enabling them to focus their imagination on content rather than spending huge amounts of time optimizing their applications for low-cost, low-powered headsets.

Date: Thursday, February 18, 2021
Time: 8:00am – 9:00am PT | 4:00pm – 5:00pm GMT
Duration: 1 hour

By joining this webinar, you will:

  • Learn more about delivering CloudXR
  • Get a deeper understanding of the new experiences enabled by CloudXR 2.0
  • See how CloudXR frees developers to focus on building better experiences
  • Get your questions answered in a live Q&A session

Register >

Categories
Misc

Tutorial: Reducing Temporal Noise on Images with NVIDIA VPI on NVIDIA Jetson Embedded Computers

The NVIDIA Vision Programming Interface (VPI) is a software library that provides a set of computer-vision and image-processing algorithms.

In this post, we show you how to run the Temporal Noise Reduction (TNR) sample application on the Jetson product family. For more information, see the VPI – Vision Programming Interface documentation.

Read more >

Categories
Misc

NVIDIA DLSS Plugin and Reflex Now Available for Unreal Engine 4.26

Unreal Engine 4 (UE4) developers can now access DLSS as a plugin for Unreal Engine 4.26. Additionally, NVIDIA Reflex is now available as a feature in mainline Unreal Engine 4.26. The NVIDIA RTX UE4 4.25 and 4.26 branches have also received updates.

Leveling up your games with the cutting-edge technologies found in the biggest blockbusters just got a lot simpler. As of today, Unreal Engine 4 (UE4) developers can access DLSS as a plugin for Unreal Engine 4.26. Additionally, NVIDIA Reflex is now available as a feature in mainline Unreal Engine 4.26. The NVIDIA RTX UE4 4.25 and 4.26 branches have also received updates.

NVIDIA DLSS Plugin for UE4

DLSS is a deep learning super resolution network that boosts frame rates by rendering fewer pixels and then using AI to construct sharp, higher-resolution images. Dedicated computational units on NVIDIA RTX GPUs called Tensor Cores accelerate the AI calculations, allowing the algorithm to run in real time. DLSS pairs perfectly with computationally intensive rendering algorithms such as real-time ray tracing. The technology has been used to increase performance in a broad range of games, including Fortnite, Cyberpunk 2077, Minecraft, Call of Duty: Black Ops Cold War, and Death Stranding.

DLSS is now available for the first time for mainline UE4 as a plugin, compatible with UE 4.26. Enjoy great scaling across all GeForce RTX GPUs and resolutions, including the new ultra performance mode for 8K gaming.

Access the NVIDIA DLSS plugin for UE4 here.

NVIDIA Reflex

NVIDIA Reflex is a toolkit to measure, debug, and improve CPU+GPU latency in competitive multiplayer games. It is now available to all developers through UE4 mainline.

In addition to UE4 titles such as Fortnite, developers such as Activision Blizzard, Ubisoft, Riot, Bungie and Respawn are using Reflex now. Pull the latest change list from UE4 mainline today to improve system latency in your game.

Updates to NVIDIA RTX UE 4.25 and 4.26.1

The new NVIDIA UE 4.25 and 4.26.1 branches offer all of the benefits of mainline UE 4.25 and UE 4.26.1 while providing some additional features:

  • Fixes for instanced static mesh culling (including foliage)
  • Option to mark ray tracing data as high-priority to the memory manager to avoid poor placement of data in memory overflow conditions
  • A threshold for recomputing the ray tracing representation of landscape tiles
  • Compatibility fixes for building certain targets
  • Inexact occlusion test support for shadows and ambient occlusion

Download the latest branch here to benefit from all of NVIDIA’s cutting-edge features.

Pushing PC Gaming to New Levels

Through our work on ray tracing, DLSS, and Reflex, NVIDIA is revolutionizing what is possible with a GPU.

  • Ray tracing and DLSS are supported in the biggest PC game launch ever – Cyberpunk 2077.
  • Ray tracing and DLSS are supported in the most popular game – Minecraft.
  • Reflex has been adopted by 7 of the top 10 competitive shooters – Call of Duty: Warzone, Call of Duty: Black Ops Cold War, Valorant, Fortnite, Apex Legends, Overwatch, and Rainbow Six: Siege.

Now all Unreal Engine developers can easily use these same technologies in their games, thanks to UE4 integration.

Learn more about other NVIDIA developer tools and SDKs available for game developers here.

Categories
Offsites

Uncovering Unknown Unknowns in Machine Learning

The performance of machine learning (ML) models depends on both the learning algorithms and the data used for training and evaluation. The role of the algorithms is well studied and is the focus of a multitude of challenges, such as SQuAD, GLUE, ImageNet, and many others. In addition, there have been efforts to also improve the data, including a series of workshops addressing issues for ML evaluation. In contrast, research and challenges that focus on the data used for evaluation of ML models are not commonplace. Furthermore, many evaluation datasets contain items that are easy to evaluate, e.g., photos with a subject that is easy to identify, and thus they miss the natural ambiguity of real-world context. The absence of ambiguous real-world examples in evaluation undermines the ability to reliably test machine learning performance, which makes ML models prone to developing “weak spots”, i.e., classes of examples that are difficult or impossible for a model to accurately evaluate, because that class of examples is missing from the evaluation set.

To address the problem of identifying these weaknesses in ML models, we recently launched the Crowdsourcing Adverse Test Sets for Machine Learning (CATS4ML) Data Challenge at HCOMP 2020 (open until 30 April, 2021 to researchers and developers worldwide). The goal of the challenge is to raise the bar for ML evaluation sets and to find as many examples as possible that are confusing or otherwise problematic for algorithms to process. CATS4ML relies on people’s abilities and intuition to spot new data examples about which ML models are confident but which they actually misclassify.

What are ML “Weak Spots”?
There are two categories of weak spots: known unknowns and unknown unknowns. Known unknowns are examples for which a model is unsure about the correct classification. The research community continues to study this in a field known as active learning, and has found the solution to be, in very general terms, to interactively solicit new labels from people on uncertain examples. For example, if a model is not certain whether or not the subject of a photo is a cat, a person is asked to verify; but if the system is certain, a person is not asked. While there is room for improvement in this area, what is comforting is that the confidence of the model is correlated with its performance, i.e., one can see what the model doesn’t know.
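
As a minimal illustration of that loop (a toy sketch, not any particular production system): the examples whose top predicted probability is lowest are the ones routed to human raters.

import numpy as np

def select_for_labeling(class_probs, budget):
    """class_probs: (N, num_classes) predicted probabilities per example.
    Returns the indices of the `budget` least-confident examples, i.e., the
    known unknowns that are worth sending to human raters."""
    confidence = class_probs.max(axis=1)
    return np.argsort(confidence)[:budget]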

Unknown unknowns, on the other hand, are examples where a model is confident about its answer, but is actually wrong. Efforts to proactively discover unknown unknowns (e.g., Attenberg 2015 and Crawford 2019) have helped uncover a multitude of unintended machine behaviours. In contrast to such approaches for the discovery of unknown unknowns, generative adversarial networks (GANs) generate unknown unknowns for image recognition models in the form of optical illusions for computers that cause deep learning models to make mistakes beyond human perception. While GANs uncover model exploits in the event of an intentional manipulation, real-world examples can better highlight a model’s failures in its day-to-day performance. These real-world examples are the unknown unknowns of interest to CATS4ML — the challenge aims to gather unmanipulated examples that humans can reliably interpret but on which many ML models would confidently disagree.

Example illustrating how optical illusions for computers caused by adversarial noise help discover machine-manipulated unknown unknowns for ML models (based on Brown 2018).

First Edition of CATS4ML Data Challenge: Open Images Dataset
The CATS4ML Data Challenge focuses on visual recognition, using images and labels from the Open Images Dataset. The target images for the challenge are selected from the Open Images Dataset, along with a set of 24 target labels from the same dataset. Challenge participants are invited to invent new and creative ways to explore this existing, publicly available dataset and, focusing on the list of pre-selected target labels, discover examples of unknown unknowns for ML models.

Examples from the Open Images Dataset as possible unknown unknowns for ML models.

CATS4ML is a complementary effort to FAIR’s recently introduced DynaBench research platform for dynamic data collection. Where DynaBench tackles issues with static benchmarks using ML models with humans in the loop, CATS4ML focuses on improving evaluation datasets for ML by encouraging the exploration of existing ML benchmarks for adverse examples that can be unknown unknowns. The results will help detect and avoid future errors, and also will give insights to model explainability.

In this way, CATS4ML aims to raise greater awareness of the problem by providing dataset resources that developers can use to uncover the weak spots of their algorithms. This will also inform researchers on how to create benchmark datasets for machine learning that are more balanced, diverse and socially aware.

Get Involved
We invite the global community of ML researchers and practitioners to join us in the effort of discovering interesting, difficult examples from the Open Images Dataset. Register on the challenge website, download the target images and labeled data, contribute the images you discover, and join the competition!

To score points in this competition, participants should submit a set of image-label pairs that will be confirmed by human-in-the-loop raters, whose votes should be in disagreement with the average machine score for the label over a number of machine learning models.
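
A toy sketch of that scoring rule (the simple averaging and the 0.5 decision threshold are assumptions made for illustration; see the challenge website for the official rules):

def pair_scores_point(human_says_present, model_scores, threshold=0.5):
    """human_says_present: verdict of the human-in-the-loop raters (bool).
    model_scores: per-model confidences for the label, each in [0, 1].
    The pair scores a point when the human verdict disagrees with the
    averaged machine score for that label."""
    machine_says_present = sum(model_scores) / len(model_scores) >= threshold
    return human_says_present != machine_says_present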

An example of how a submitted image can score points. The same image can score as a false positive (left) and as a false negative (right) with two different labels. In both cases, the human verification is in disagreement with the machine score. Participants score on submitted image-label pairs, which means that the same image can be an example of an ML unknown unknown for different labels.

The challenge is open until 30 April, 2021 to researchers and developers worldwide. To learn more about CATS4ML and how to join, please review these slides and visit the challenge website.

Acknowledgements
The release of the CATS4ML Data Challenge has been possible thanks to the hard work of a lot of people including, but not limited to, the following (in alphabetical order of last name): Osman Aka, Ken Burke, Tulsee Doshi, Mig Gerard, Victor Gomes, Shahab Kamali, Igor Karpov, Devi Krishna, Daphne Luong, Carey Radebaugh, Jamie Taylor, Nithum Thain, Kenny Wibowo, Ka Wong, and Tong Zhou.

Categories
Misc

Fetching AI Data: Researchers Get Leg Up on Teaching Dogs New Tricks with NVIDIA Jetson

AI is going to the dogs. Literally. Colorado State University researchers Jason Stock and Tom Cavey have published a paper on an AI system to recognize and reward dogs for responding to commands. The graduate students in computer science trained image classification networks to determine whether a dog is sitting, standing, or lying. Read article >
