Categories
Misc

Using TF timeseries_dataset_from_array with more samples

Using TF timeseries_dataset_from_array with more samples

I have to handle a huge number of samples, where each sample contains a unique time series. The goal is to feed this data into a TensorFlow LSTM model and predict some features. I have created a tf timeseries_dataset_from_array generator function to feed the data to the TF model, but I haven’t figured out how to create a generator function when I have multiple samples. If I use the usual pipeline, timeseries_dataset_from_array overlaps the time series of two individual samples.

Does anyone have an idea how to effectively pass a time series of multiple samples to the TF model?

E.g. the Human Activity Recognition dataset is one such dataset: each person has a separate, long time series, and each user’s time series can be further split with the sliding/rolling-window-like timeseries_dataset_from_array function.

Here is a simpler example. I want to use timeseries_dataset_from_array to generate samples for the TF model: sample 1 is the block where column 0 is 0, and sample 2 is the block where column 0 is 100.

https://preview.redd.it/0hzmsm9tt6481.png?width=786&format=png&auto=webp&s=d82f1fdbcb9dff8337de274bf6325ebac52c7e81

I want to get 3D data (samples, timesteps, features) without overlap between the two samples, for example with shape (6, 2, 7), like this:

https://preview.redd.it/t9cfot0vt6481.png?width=865&format=png&auto=webp&s=0524e0776a3f1938c6178ee5b16dda80885ed44d

Here is the sample code:

from tensorflow.keras.preprocessing import timeseries_dataset_from_array
import numpy as np

x = np.array([[0,  1,  2,  3,  4,  5,  6],
              [0, 11, 12, 13, 14, 15, 16],
              [0, 21, 22, 23, 24, 25, 26],
              [0, 31, 32, 33, 34, 35, 36],
              [0, 41, 42, 43, 44, 45, 46]])
xx = np.concatenate((x, x + 100), axis=0)  # .reshape(2,5,6)

sequence_length = 2
stride = 1
rate = 1

# Windowing the concatenated array produces windows that span the boundary
# between the two samples.
input_dataset = timeseries_dataset_from_array(
    xx, None, sequence_length, sequence_stride=stride, sampling_rate=rate)
x_test = np.concatenate([window for window in input_dataset], axis=0)
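One possible workaround (a sketch, not from the original post; the helper name per_sample_windows is made up): window each sample separately so no window can cross a sample boundary, then join the per-sample datasets with tf.data.Dataset.concatenate.

from tensorflow.keras.preprocessing import timeseries_dataset_from_array
import numpy as np

def per_sample_windows(samples, sequence_length=2, stride=1, batch_size=128):
    # samples: iterable of 2D arrays with shape (timesteps, features).
    dataset = None
    for sample in samples:
        # Window each sample on its own, so windows never span two samples.
        ds = timeseries_dataset_from_array(
            sample, None, sequence_length,
            sequence_stride=stride, batch_size=batch_size)
        dataset = ds if dataset is None else dataset.concatenate(ds)
    return dataset

# Reusing x from the snippet above: the two samples are x and x + 100.
input_dataset = per_sample_windows([x, x + 100])
x_test = np.concatenate([window for window in input_dataset], axis=0)

Each element of the resulting dataset is still a (batch, sequence_length, features) tensor, but none of the windows mix rows from two different samples.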

submitted by /u/korosig
[visit reddit] [comments]

Categories
Misc

Silicon Express Lanes: AI, GPUs Pave Fast Routes for Chip Designers

AI can design chips no human could, said Bill Dally in a virtual keynote today at the Design Automation Conference (DAC), one of the world’s largest gatherings of semiconductor engineers. The chief scientist of NVIDIA discussed research in accelerated computing and machine learning that’s making chips smaller, faster and better. “Our work shows you can Read article >

The post Silicon Express Lanes: AI, GPUs Pave Fast Routes for Chip Designers appeared first on The Official NVIDIA Blog.

Categories
Misc

GPU Operator 1.9 Adds Support for DGX A100 with DGX OS

GPU Operator 1.9 includes support for NVIDIA DGX A100 systems with DGX OS and streamlined installation processes.

NVIDIA GPU Operator allows organizations to easily scale NVIDIA GPUs on Kubernetes. 

By simplifying the deployment and management of GPUs with Kubernetes, the GPU Operator enables infrastructure teams to scale GPU applications error-free, within minutes, automatically. 

GPU Operator 1.9 is now available and includes several key features, among other updates, that allow users to get started faster and maintain uninterrupted service. 

GPU Operator 1.9 includes:

  • Support for NVIDIA DGX A100 systems with DGX OS
  • Streamlined installation process

Support for DGX A100 with DGX OS

With 1.9, the GPU Operator automatically deploys the software required for initializing the fabric on NVIDIA NVSwitch systems, including the DGX A100 when used with DGX OS. Once initialized, all GPUs can communicate with one another at full NVLink bandwidth to create an end-to-end scalable computing platform. 

The DGX A100 features the world’s most advanced accelerator, enabling enterprises to consolidate training, inference, and analytics into a unified, easy-to-deploy AI infrastructure. And now, with GPU Operator support, organizations can take their applications from training to scale with the world’s most advanced systems.  

Streamlined installation process

With previous versions of GPU Operator, organizations using GPU Operator with OpenShift needed to apply additional entitlements from Red Hat in order to successfully use the GPU Operator. As entitlement keys expired, users would need to re-apply them to ensure that their workflow was not interrupted. 

GPU Operator 1.9 now supports entitlement-free driver containers for OpenShift. This is done by leveraging the Driver Toolkit images provided by Red Hat, which come with the kernel packages needed to build the NVIDIA kernel modules preinstalled. Users no longer need to ensure that valid certificates from an RHEL subscription are applied to run the GPU Operator. More importantly for disconnected clusters, it eliminates dependencies on private package repositories.

Version 1.9 also includes support for preinstalled drivers with the MIG Manager, support for preinstalled MOFED to use GPUDirect RDMA, automatic detection of the container runtime, and automatic disabling of Nouveau, all designed to make it easier for users to get started with and keep running GPU-accelerated Kubernetes. 

Additionally, GPU Operator 1.9 automatically detects the container runtime installed on the worker node. There is no need to specify the container runtime at install time.

GPU Operator 1.9:

helm install --wait --generate-name nvidia/gpu-operator 

GPU Operator 1.8 and earlier:

helm install --wait --generate-name nvidia/gpu-operator --set operator.defaultRuntime=containerd

GPU Operator requires Nouveau to be disabled. With previous GPU Operator versions, the K8s admin had to disable Nouveau as documented here. GPU Operator 1.9 automatically detects if Nouveau is enabled and disables it for you.

GPU Operator Resources

The following resources are available for using NVIDIA GPU Operator: 

The NVIDIA GPU Operator is a key component of many edge computing solutions. Learn more about NVIDIA solutions for edge computing.

Categories
Misc

Majority Report: 2022 Predictions on How AI Will Impact Global Industries

There’s an old axiom that the best businesses thrive during periods of uncertainty. No doubt, that will be tested to the limits as 2022 portends upheaval on a grand scale. Pandemic-related supply chain disruptions are affecting everything from production of cars and electronics to toys and toilet paper. At the same time, global food prices Read article >

The post Majority Report: 2022 Predictions on How AI Will Impact Global Industries appeared first on The Official NVIDIA Blog.

Categories
Offsites

Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize

Transformer models consistently obtain state-of-the-art results in computer vision tasks, including object detection and video classification. In contrast to standard convolutional approaches that process images pixel-by-pixel, the Vision Transformer (ViT) treats an image as a sequence of patch tokens (i.e., a smaller part, or “patch”, of an image made up of multiple pixels). This means that at every layer, a ViT model recombines and processes patch tokens based on relations between each pair of tokens, using multi-head self-attention. In doing so, ViT models have the capability to construct a global representation of the entire image.

At the input-level, the tokens are formed by uniformly splitting the image into multiple segments, e.g., splitting an image that is 512 by 512 pixels into patches that are 16 by 16 pixels. At the intermediate levels, the outputs from the previous layer become the tokens for the next layer. In the case of videos, video ‘tubelets’ such as 16x16x2 video segments (16×16 images over 2 frames) become tokens. The quality and quantity of the visual tokens decide the overall quality of the Vision Transformer.

The main challenge in many Vision Transformer architectures is that they often require too many tokens to obtain reasonable results. Even with 16×16 patch tokenization, for instance, a single 512×512 image corresponds to 1024 tokens. For videos with multiple frames, that results in tens of thousands of tokens needing to be processed at every layer. Considering that the Transformer computation increases quadratically with the number of tokens, this can often make Transformers intractable for larger images and longer videos. This leads to the question: is it really necessary to process that many tokens at every layer?

In “TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?”, an earlier version of which is presented at NeurIPS 2021, we show that adaptively generating a smaller number of tokens, rather than always relying on tokens formed by uniform splitting, enables Vision Transformers to run much faster and perform better. TokenLearner is a learnable module that takes an image-like tensor (i.e., input) and generates a small set of tokens. This module could be placed at various different locations within the model of interest, significantly reducing the number of tokens to be handled in all subsequent layers. The experiments demonstrate that having TokenLearner saves memory and computation by half or more without damaging classification performance, and because of its ability to adapt to inputs, it even increases the accuracy.

The TokenLearner
We implement TokenLearner using a straightforward spatial attention approach. In order to generate each learned token, we compute a spatial attention map highlighting regions-of-importance (using convolutional layers or MLPs). Such a spatial attention map is then applied to the input to weight each region differently (and discard unnecessary regions), and the result is spatially pooled to generate the final learned tokens. This is repeated multiple times in parallel, resulting in a few (~10) tokens out of the original input. This can also be viewed as performing a soft-selection of the pixels based on the weight values, followed by global average pooling. Note that the functions to compute the attention maps are governed by different sets of learnable parameters, and are trained in an end-to-end fashion. This allows the attention functions to be optimized in capturing different spatial information in the input. The figure below illustrates the process.

The TokenLearner module learns to generate a spatial attention map for each output token, and uses it to abstract the input to tokenize. In practice, multiple spatial attention functions are learned, are applied to the input, and generate different token vectors in parallel.
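To make the idea concrete, here is a minimal sketch of a TokenLearner-style layer (an illustration of the idea only, not the authors’ released implementation; the single-Dense attention head and softmax normalization are simplifications):

import tensorflow as tf

class TokenLearner(tf.keras.layers.Layer):
    """Learn `num_tokens` spatial attention maps and use them to pool a long
    sequence of patch features into a few learned tokens."""

    def __init__(self, num_tokens=8, **kwargs):
        super().__init__(**kwargs)
        # Predicts one attention logit per output token at every spatial position.
        self.attention_mlp = tf.keras.layers.Dense(num_tokens)

    def call(self, inputs):
        # inputs: (batch, positions, channels), e.g. flattened patch features.
        logits = self.attention_mlp(inputs)             # (batch, positions, tokens)
        weights = tf.nn.softmax(logits, axis=1)         # normalize over positions
        # Each learned token is a weighted spatial pooling of the input features.
        return tf.einsum('bps,bpc->bsc', weights, inputs)  # (batch, tokens, channels)

# Example: reduce 196 patch tokens (a 14x14 grid) to 8 learned tokens.
patches = tf.random.normal([2, 196, 768])
print(TokenLearner(num_tokens=8)(patches).shape)        # (2, 8, 768)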

As a result, instead of processing fixed, uniformly tokenized inputs, TokenLearner enables models to process a smaller number of tokens that are relevant to the specific recognition task. That is, (1) we enable adaptive tokenization so that the tokens can be dynamically selected conditioned on the input, and (2) this effectively reduces the total number of tokens, greatly reducing the computation performed by the network. These dynamically and adaptively generated tokens can be used in standard transformer architectures such as ViT for images and ViViT for videos.

Where to Place TokenLearner
After building the TokenLearner module, we had to determine where to place it. We first tried placing it at different locations within the standard ViT architecture with 224×224 images. The number of tokens TokenLearner generated was 8 or 16, far fewer than the 196 or 576 tokens the standard ViTs use. The below figure shows ImageNet few-shot classification accuracies and FLOPS of the models with TokenLearner inserted at various relative locations within ViT B/16, which is the base model with 12 attention layers operating on 16×16 patch tokens.

Top: ImageNet 5-shot transfer accuracy with JFT 300M pre-training, with respect to the relative TokenLearner locations within ViT B/16. Location 0 means TokenLearner is placed before any Transformer layer. Base is the original ViT B/16. Bottom: Computation, measured in terms of billions of floating point operations (GFLOPS), per relative TokenLearner location.

We found that inserting TokenLearner after the initial quarter of the network (at 1/4) achieves almost identical accuracies as the baseline, while reducing the computation to less than a third of the baseline. In addition, placing TokenLearner at the later layer (after 3/4 of the network) achieves even better performance compared to not using TokenLearner while performing faster, thanks to its adaptiveness. Due to the large difference between the number of tokens before and after TokenLearner (e.g., 196 before and 8 after), the relative computation of the transformers after the TokenLearner module becomes almost negligible.
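As an illustration of this placement experiment, a functional sketch might look like the following (it reuses the TokenLearner sketch above; transformer_block is a simplified stand-in for a ViT encoder block, and the sizes are illustrative, not ViT B/16 exactly):

import tensorflow as tf

def transformer_block(x, num_heads=12, key_dim=64, mlp_dim=3072):
    # Simplified pre-norm ViT encoder block.
    h = tf.keras.layers.LayerNormalization()(x)
    h = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(h, h)
    x = x + h
    h = tf.keras.layers.LayerNormalization()(x)
    h = tf.keras.layers.Dense(mlp_dim, activation='gelu')(h)
    h = tf.keras.layers.Dense(x.shape[-1])(h)
    return x + h

def encoder_with_tokenlearner(x, num_blocks=12, insert_after=3, num_tokens=8):
    # Insert TokenLearner (sketched above) after `insert_after` blocks, so the
    # remaining blocks operate on only `num_tokens` tokens instead of 196.
    for i in range(num_blocks):
        if i == insert_after:
            x = TokenLearner(num_tokens)(x)
        x = transformer_block(x)
    return x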

Comparing Against ViTs
We compared the standard ViT models with TokenLearner against those without it while following the same setting on ImageNet few-shot transfer. TokenLearner was placed in the middle of each ViT model at various locations such as at 1/2 and at 3/4. The below figure shows the performance/computation trade-off of the models with and without TokenLearner.

Performance of various versions of ViT models with and without TokenLearner, on ImageNet classification. The models were pre-trained with JFT 300M. The closer a model is to the top-left of each graph the better, meaning that it runs faster and performs better. Observe how TokenLearner models perform better than ViT in terms of both accuracy and computation.

We also inserted TokenLearner within larger ViT models, and compared them against the giant ViT G/14 model. Here, we applied TokenLearner to ViT L/10 and L/8, which are the ViT models with 24 attention layers taking 10×10 (or 8×8) patches as initial tokens. The below figure shows that despite using many fewer parameters and less computation, TokenLearner performs comparably to the giant G/14 model with 48 layers.

Left: Classification accuracy of large-scale TokenLearner models compared to ViT G/14 on ImageNet datasets. Right: Comparison of the number of parameters and FLOPS.

High-Performing Video Models
Video understanding is one of the key challenges in computer vision, so we evaluated TokenLearner on multiple video classification datasets. This was done by adding TokenLearner into Video Vision Transformers (ViViT), which can be thought of as a spatio-temporal version of ViT. TokenLearner learned 8 (or 16) tokens per timestep.

When combined with ViViT, TokenLearner obtains state-of-the-art (SOTA) performance on multiple popular video benchmarks, including Kinetics-400, Kinetics-600, Charades, and AViD, outperforming the previous Transformer models on Kinetics-400 and Kinetics-600 as well as previous CNN models on Charades and AViD.

Models with TokenLearner outperform state-of-the-art on popular video benchmarks (captured from Nov. 2021). Left: popular video classification tasks. Right: comparison to ViViT models.
Visualization of the spatial attention maps in TokenLearner, over time. As the person is moving in the scene, TokenLearner pays attention to different spatial locations to tokenize.

Conclusion
While Vision Transformers serve as powerful models for computer vision, a large number of tokens and their associated computation have been a bottleneck for their application to larger images and longer videos. In this project, we illustrate that retaining such a large number of tokens and fully processing them over the entire set of layers is not necessary. Further, we demonstrate that learning a module that extracts tokens adaptively based on the input image attains even better performance while saving compute. The proposed TokenLearner was particularly effective in video representation learning tasks, which we confirmed with multiple public datasets. A preprint of our work as well as code are publicly available.

Acknowledgement
We thank our co-authors: AJ Piergiovanni, Mostafa Dehghani, and Anelia Angelova. We also thank the Robotics at Google team members for the motivating discussions.

Categories
Misc

Multi-worker training with Keras

When I’m using the TensorFlow multi-worker strategy, does every node have to have every file, or do I give certain files to each node?
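Not part of the original post, but one common pattern is a sketch like the following, assuming TF_CONFIG is set on each node and the training files are visible to every worker (e.g. on shared storage or copied to every node): each worker lists the same files, and tf.data auto-sharding assigns a disjoint subset of the files to each worker.

import tensorflow as tf

strategy = tf.distribute.MultiWorkerMirroredStrategy()

def make_dataset(file_pattern, batch_size):
    # Every worker lists the same files; FILE auto-sharding then gives each
    # worker a disjoint subset of them to read.
    files = tf.data.Dataset.list_files(file_pattern, shuffle=False)
    dataset = files.interleave(tf.data.TFRecordDataset,
                               num_parallel_calls=tf.data.AUTOTUNE)
    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = (
        tf.data.experimental.AutoShardPolicy.FILE)
    return dataset.batch(batch_size).with_options(options)

# Build and fit the Keras model inside strategy.scope() as usual.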

submitted by /u/Mayfieldmobster
[visit reddit] [comments]

Categories
Misc

How to write binary classification with Tensorflow 1.x?

I’m at my wits’ end. Every single tutorial for classification I find for TensorFlow 1.x is a goddamn MNIST tutorial; they always skip the basics, and I need the basics.

I have a test set with 6 numerical features and a label that is binary 1 or 0.

With Tensorflow 2.0 I can easily just use something like this

model = tf.keras.Sequential([
    tf.keras.layers.Dense(21, activation="tanh"),
    tf.keras.layers.Dense(10, activation="sigmoid"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(
    loss=tf.keras.losses.binary_crossentropy,
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy']
)
history = model.fit(X_train, y_train, epochs=25)

For the life of me I don’t know how to go about doing this in Tensorflow 1.x.
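Not part of the original post, but a minimal sketch of one way to express roughly the same model with the TF 1.x low-level API (X_train and y_train are assumed to be NumPy arrays of shape (n, 6) and (n,); batching is omitted for brevity):

import tensorflow.compat.v1 as tf  # or plain `import tensorflow as tf` on a real 1.x install
tf.disable_v2_behavior()

X = tf.placeholder(tf.float32, [None, 6])   # 6 numerical features
y = tf.placeholder(tf.float32, [None, 1])   # binary label, 0 or 1

h1 = tf.layers.dense(X, 21, activation=tf.nn.tanh)
h2 = tf.layers.dense(h1, 10, activation=tf.nn.sigmoid)
logits = tf.layers.dense(h2, 1)             # no activation: raw logits

loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

predictions = tf.cast(tf.sigmoid(logits) > 0.5, tf.float32)
accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, y), tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(25):
        _, l, acc = sess.run([train_op, loss, accuracy],
                             feed_dict={X: X_train, y: y_train.reshape(-1, 1)})
        print(epoch, l, acc)

The key differences from the Keras version: the final layer outputs raw logits and the sigmoid is folded into tf.nn.sigmoid_cross_entropy_with_logits, and the training loop is written explicitly with a Session and feed_dict.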

submitted by /u/Practicing1s
[visit reddit] [comments]

Categories
Misc

NVIDIA Research Presenting 20 Papers at NeurIPS 2021

At the forefront of AI innovation, NVIDIA continues to push the boundaries of technology in machine learning, self-driving cars, robotics, graphics, and more. NVIDIA researchers will present 20 papers at the thirty-fifth annual conference on Neural Information Processing Systems (NeurIPS) from December 6 to December 14, 2021. 

Here are some of the featured papers:

Alias-Free Generative Adversarial Networks (StyleGAN3)
Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila | Paper  | GitHub | Blog

StyleGAN3, a model developed by NVIDIA Research that will be presented on Tuesday, December 7 from 12:40 AM – 12:55 AM PST, advances the state of the art in generative adversarial networks used to synthesize realistic images. The breakthrough brings graphics principles from signal processing and image processing to GANs to avoid aliasing: a kind of image corruption often visible when images are rotated, scaled, or translated.

Video 1. Results from the StyleGAN3 model

EditGAN: High-Precision Semantic Image Editing
Huan Ling*, Karsten Kreis*, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler | Paper | GitHub

EditGAN is a novel method for high-quality, high-precision semantic image editing that allows users to edit images by modifying their highly detailed part segmentation masks, e.g., drawing a new mask for the headlight of a car. EditGAN builds on a GAN framework that jointly models images and their semantic segmentations and requires only a handful of labeled examples, making it a scalable tool for editing. The poster session will be held on Thursday, December 9 from 8:30 AM – 10:00 AM PST.

Video 2. The video showcases EditGAN in an interactive demo tool.

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo | Paper | GitHub

SegFormer is a simple, efficient, yet powerful semantic segmentation framework that unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder that outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of positional codes, which leads to decreased performance when the testing resolution differs from the training resolution. 2) SegFormer avoids complex decoders. The poster will be presented on Tuesday, December 7 from 8:30 AM – 10:00 AM PST.



Video 3. The video shows the excellent zero-shot robustness of SegFormer on the Cityscapes-C dataset.

DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer
Wenzheng Chen, Joey Litalien, Jun Gao, Zian Wang, Clement Fuji Tsang, Sameh Khamis, Or Litany, Sanja Fidler | Paper

DIB-R++ is a deferred, image-based renderer that supports photorealistic lighting and material effects by combining rasterization and ray tracing, taking advantage of their respective strengths: speed and realism. The poster session is on Thursday, December 9 from 4:30 PM – 6:00 PM PST.

Image 1. DIB-R++ is a hybrid renderer that combines rasterization and ray tracing together. Given a 3D mesh M, we employ (a) a rasterization-based renderer to obtain diffuse albedo, surface normals and mask maps. In the shading pass (b), we then use these buffers to compute the incident radiance by sampling or by representing lighting and the specular BRDF using a spherical Gaussian basis. Depending on the representation used in (c), we can render with advanced lighting and material effect (d).

In addition to the papers at NeurIPS 2021, researchers and developers can accelerate 3D deep learning research with new Kaolin features:

Kaolin is launching new features to accelerate 3D deep learning research. Updates to the NVIDIA Omniverse Kaolin app will bring robust visualization of massive point clouds. Updates to the Kaolin library will include support for tetrahedral meshes, rays management functionality, and a strong speedup to DIB-R. To learn more about Kaolin, watch the recent GTC session.

Image 2. Results from NVIDIA Kaolin

To view the complete list of NVIDIA Research accepted papers, workshop and tutorials, demos, and to explore job opportunities at NVIDIA, visit the NVIDIA at NeurIPS 2021 website.

Categories
Misc

Inference time on cpu is high

Hi, I tried to load the model on the CPU with tf.device. While running inference on 500 images, the CPU usage reaches 100% and the inference time is 0.6 sec. How do I minimize the inference time and also the CPU utilization?
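A couple of knobs that often help in this situation (a sketch with illustrative values, not a guaranteed fix): cap TensorFlow’s CPU thread pools so inference doesn’t saturate every core, and batch many images per call instead of predicting them one at a time.

import tensorflow as tf

# Set these before any TensorFlow ops run; the thread counts are illustrative.
tf.config.threading.set_intra_op_parallelism_threads(4)
tf.config.threading.set_inter_op_parallelism_threads(2)

# Batching usually lowers per-image latency compared with 500 separate calls;
# `model` and `images` are assumed to already exist in your script.
# preds = model.predict(images, batch_size=32)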

submitted by /u/nanitiru18
[visit reddit] [comments]

Categories
Misc

What are the nvidia NGC container optimizations for mixed precision?

I am trying to increase the training speed of my model by using mixed precision and the NVIDIA GPU Tensor Cores. For this, I just use Keras mixed precision, but the speed increase is only about 10%. Then I found the NVIDIA NGC container, which is optimized for their GPUs; with mixed precision in it I can increase the training speed by about 60%, although with float32 the speed is lower than running natively. I would like to get at least the NGC container’s mixed-precision speedup natively. What do I need to do?
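This isn’t a description of what the NGC container actually does internally, but one thing worth trying natively alongside Keras mixed precision is XLA compilation (a sketch; whether and how much it helps depends on the model and GPU):

import tensorflow as tf
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy('mixed_float16')  # compute in fp16, keep variables in fp32
tf.config.optimizer.set_jit(True)                   # enable XLA where supported

# Keep the final layer's outputs in float32 for numerical stability, e.g.:
# tf.keras.layers.Dense(1, activation='sigmoid', dtype='float32')
# Layer sizes that are multiples of 8 also tend to map better onto Tensor Cores.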

submitted by /u/diepala
[visit reddit] [comments]