Categories
Misc

TextBoxGan: First GAN generating text boxes! GitHub repo (implemented with TF2): https://github.com/NoAchache/TextBoxGan, with all the technical and theoretical documentation, as well as a pre-trained model.

submitted by /u/Noe_Achache
Categories
Misc

TensorFlow 2 code for the Attention Mechanisms chapter of the Dive into Deep Learning (D2L) book.

submitted by /u/biswajitsahoo1111
Categories
Misc

LSTM Prediction, Coca Cola stock closing price $51.35 3/19/21

TL;DR: Teaching myself about investing and TensorFlow. My model predicts Coca Cola will close at $51.35 on 3/19/2021. I have no idea what I’m doing. Do not use this to make financial decisions.

So last week I predicted Coca Cola would close at $42.56. It actually closed at $50.36. I would have been better off using the lazy predictor and just assuming last week’s close of $50.79 would not move. See my previous post:

https://www.reddit.com/r/tensorflow/comments/lzv277/silly_prediction_coca_cola_stock_closing_price/

So the first thing I did was scale my numbers. Having prices in the tens of dollars and volume in the millions blew up my model pretty badly. My previous unscaled version had errors against my training data in the tens of dollars per week.

I scaled volume by shares outstanding and converted share price to % change, and this model looks like it performs better: its errors are much tighter. I’ve attached a histogram of my model’s error over the last 100 weeks of data. Next Friday will be the first real test, though.
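A minimal sketch of the two scaling steps described above, assuming the weekly history is available as plain arrays of closing prices, traded volume, and shares outstanding (oldest week first). The helper name and array layout are hypothetical, not taken from the poster's code.

```python
import numpy as np

def scale_weekly_features(close, volume, shares_outstanding):
    """Hypothetical helper: turn raw weekly data into scale-free features.

    Returns an array of shape (n_weeks - 1, 2) with columns
    [% change in close vs. previous week, volume / shares outstanding].
    """
    close = np.asarray(close, dtype=float)
    volume = np.asarray(volume, dtype=float)
    shares = np.asarray(shares_outstanding, dtype=float)

    # Week-over-week percent change; the first week has no previous close, so it is dropped.
    pct_change = 100.0 * (close[1:] - close[:-1]) / close[:-1]

    # Volume as a fraction of shares outstanding, aligned with the weeks kept above.
    turnover = volume[1:] / shares[1:]

    return np.column_stack([pct_change, turnover])
```

Both features are now of order 1 regardless of the share price or company size, and a predicted % change can be turned back into a price with `last_close * (1 + pct / 100)`.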

Now the bad news. I define success as my model performing better than a “Lazy” predictor, which assumes that next week’s closing price will equal this week’s closing price. My model’s error histogram pretty much sits exactly on top of the Lazy predictor histogram.

Comparing root mean square error, my LSTM predictor is actually slightly worse…
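The success criterion above can be written down directly. This is a minimal sketch, assuming weekly closes in a plain list and that `model_predictions[i]` is the model's forecast for the close one week after `closes[i]`; the function names are hypothetical.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

def lazy_baseline_rmse(closes):
    """RMSE of the 'lazy' predictor: next week's close = this week's close."""
    closes = np.asarray(closes, dtype=float)
    return rmse(closes[:-1], closes[1:])

def beats_lazy(model_predictions, closes):
    """True only if the model's RMSE is strictly lower than the lazy predictor's."""
    closes = np.asarray(closes, dtype=float)
    return rmse(model_predictions, closes[1:]) < lazy_baseline_rmse(closes)
```

A model that simply echoes the last close scores exactly the baseline RMSE, so `beats_lazy` returns False for it; an LSTM whose error histogram overlaps the lazy one will land in the same place.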

There are a few known issues that I need to work out. I haven’t included dividend data, nor do I have a method to handle splits. I’ll need to be able to address splits if I want to include other companies.
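Splits matter for the % change feature in particular: an unadjusted 2-for-1 split shows up as a spurious drop of roughly 50% in one week. A common fix is to back-adjust the history; the sketch below assumes ISO date strings and a hand-maintained list of `(split_date, ratio)` pairs (the helper and the dates in the example are hypothetical). It does not handle dividends.

```python
import numpy as np

def split_adjust(dates, close, volume, splits):
    """Back-adjust a price/volume history for stock splits (hypothetical helper).

    dates:  ISO date strings 'YYYY-MM-DD', one per row (string order = date order)
    close, volume: sequences aligned with dates
    splits: list of (split_date, ratio) pairs, e.g. ratio=2.0 for a 2-for-1 split
    Rows dated before a split get prices divided and volumes multiplied by the ratio,
    so the whole history is expressed in post-split shares.
    """
    close = np.array(close, dtype=float)    # np.array copies, so the inputs are untouched
    volume = np.array(volume, dtype=float)
    for split_date, ratio in splits:
        before = np.array([d < split_date for d in dates])
        close[before] /= ratio
        volume[before] *= ratio
    return close, volume

# Hypothetical example: a 2-for-1 split taking effect on 2020-01-17.
adj_close, adj_volume = split_adjust(
    ["2020-01-03", "2020-01-10", "2020-01-17"],
    [100.0, 102.0, 51.5],
    [1e6, 1e6, 2e6],
    [("2020-01-17", 2.0)],
)
```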

Data is starting to get a bit messy to handle. I’m currently stuck doing some manual manipulation and data entry, as Alpha Vantage is missing historical shares outstanding. Ideally I would include the public float as well.

Looks like I might need to set up a SQL server locally to host all this data.

I’m starting to understand why many of the tutorials spend 80% of their time on data manipulation and only 20% on TensorFlow.

submitted by /u/squirrelaway4all

Categories
Misc

MIT Researchers Use Deep Learning to Develop Real-Time 3D Holograms

Computer-generated holograms powered by deep learning could make real-time 3D holography feasible on laptops and smartphones, an advancement with potential applications in fields including virtual reality, microscopy, and 3D printing.

Published this week in Nature, an MIT study outlines a novel approach called tensor holography, where researchers trained and optimized a convolutional neural network to create holograms from images with depth information. The compact network requires under 1 MB of memory, and crafts holograms within milliseconds. 

“People previously thought that with existing consumer-grade hardware, it was impossible to do real-time 3D holography computations,” said lead author Liang Shi, Ph.D. student in MIT’s Department of Electrical Engineering and Computer Science. “It’s often been said that commercially available holographic displays will be around in 10 years, yet this statement has been around for decades.”

Old-school holograms use laser beams to depict a static scene with both colors and a sense of depth. Traditionally, computer-generated holography has relied on supercomputers to simulate this optical setup, making it possible to create digital holograms that can also capture motion and be easily reproduced and shared.

Simulating the underlying physics of a hologram, however, is computationally intensive, taking up to minutes to render a single holographic image on a clustered supercomputer.  

“Because each point in the scene has a different depth, you can’t apply the same operations for all of them,” Shi said. “That increases the complexity significantly.”
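To illustrate why per-point depth is expensive, here is a naive point-source hologram, a textbook computer-generated holography method and not the MIT team's pipeline: every scene point emits a spherical wave whose phase at each hologram pixel depends on that point's own distance, so the work grows with (number of points) × (number of pixels). The wavelength, pixel pitch, and resolution are arbitrary assumptions, and occlusion is ignored.

```python
import numpy as np

def point_source_hologram(points, amplitudes, n_pixels=64, pitch=8e-6, wavelength=520e-9):
    """Complex field on an n_pixels x n_pixels hologram plane at z = 0.

    points: (N, 3) array of scene point positions (x, y, z) in meters, z > 0.
    amplitudes: (N,) array of point amplitudes.
    Each point contributes a spherical wave A * exp(i k r) / r, where r is its
    distance to every hologram pixel; since z differs per point, r must be
    recomputed for each point (no shared operation across depths).
    """
    k = 2.0 * np.pi / wavelength                       # wavenumber
    coords = (np.arange(n_pixels) - n_pixels / 2) * pitch
    xx, yy = np.meshgrid(coords, coords)               # pixel positions on the plane
    field = np.zeros((n_pixels, n_pixels), dtype=complex)
    for (x, y, z), a in zip(points, amplitudes):
        r = np.sqrt((xx - x) ** 2 + (yy - y) ** 2 + z ** 2)
        field += a * np.exp(1j * k * r) / r
    return field
```

At realistic sizes (millions of scene points and hologram pixels) this double loop is what makes brute-force simulation slow, which is the cost a learned model like tensor holography sidesteps.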

To speed things up, and increase the photorealistic precision of the holograms, the researchers turned to deep learning. 

The team created custom high-quality training data — the first such database for 3D holograms — made up of 4,000 images with depth information, and a corresponding 3D hologram for each image. Training on NVIDIA Tensor Core GPUs, the CNN learned to generate accurate holograms from images with depth information, which can be acquired by standard modern smartphones with multi-camera setups or LiDAR sensors. 

Using an NVIDIA TITAN RTX GPU and the NVIDIA TensorRT SDK for inference, the optimized neural network runs in real time, achieving a speedup of more than two orders of magnitude compared to physical simulation. The final model uses just 617 kilobytes of memory, allowing it to run interactively on low-power AI chips on mobile and edge devices. 

The resulting method not only accelerated the process, but also produced holograms with accurate occlusion and per-pixel focal control, improving the images’ realism. 

Read the news release from MIT and visit the researchers’ project page. The full paper can be found in Nature.

Categories
Offsites

LEAF: A Learnable Frontend for Audio Classification

Developing machine learning (ML) models for audio understanding has seen tremendous progress over the past several years. Leveraging the ability to learn parameters from data, the field has progressively shifted from composite, handcrafted systems to today’s deep neural classifiers that are used to recognize speech, understand music, or classify animal vocalizations such as bird calls. However, unlike computer vision models, which can learn from raw pixels, deep neural networks for audio classification are rarely trained from raw audio waveforms. Instead, they rely on pre-processed data in the form of mel filterbanks — handcrafted mel-scaled spectrograms that have been designed to replicate some aspects of the human auditory response.

Although modeling mel filterbanks for ML tasks has been historically successful, it is limited by the inherent biases of fixed features: even though using a fixed mel-scale and a logarithmic compression works well in general, we have no guarantee that they provide the best representations for the task at hand. In particular, even though matching human perception provides good inductive biases for some application domains, e.g., speech recognition or music understanding, these biases may be detrimental to domains for which imitating the human ear is not important, such as recognizing whale calls. So, in order to achieve optimal performance, the mel filterbanks should be tailored to the task of interest, a tedious process that requires an iterative effort informed by expert domain knowledge. As a consequence, standard mel filterbanks are used for most audio classification tasks in practice, even though they are suboptimal. In addition, while researchers have proposed ML systems to address these problems, such as Time-Domain Filterbanks, SincNet and Wavegram, they have yet to match the performance of traditional mel filterbanks.

In “LEAF: A Learnable Frontend for Audio Classification”, accepted at ICLR 2021, we present an alternative method for crafting learnable spectrograms for audio understanding tasks. LEarnable Audio Frontend (LEAF) is a neural network that can be initialized to approximate mel filterbanks, and then be trained jointly with any audio classifier to adapt to the task at hand, while only adding a handful of parameters to the full model. We show that over a wide range of audio signals and classification tasks, including speech, music and bird songs, LEAF spectrograms improve classification performance over fixed mel filterbanks and over previously proposed learnable systems. We have implemented the code in TensorFlow 2 and released it to the community through our GitHub repository.

Mel Filterbanks: Mimicking Human Perception of Sound
The first step in the traditional approach to creating a mel filterbank is to capture the sound’s time-variability by windowing, i.e., cutting the signal into short segments of fixed duration. Next comes filtering: the windowed segments are passed through a bank of fixed frequency filters that replicate the human logarithmic sensitivity to pitch. Because we are more sensitive to variations in low frequencies than in high frequencies, mel filterbanks give more importance to the low-frequency range of sounds. Finally, the audio signal is compressed to mimic the ear’s logarithmic sensitivity to loudness: a sound must double its power for its level to rise by just 3 decibels.
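The three fixed steps above (windowing, mel-scale filtering, logarithmic compression) can be sketched in a few lines of NumPy. This is a generic textbook implementation with assumed settings (16 kHz audio, 25 ms Hann windows with a 10 ms hop, 40 triangular filters, the common 2595·log10(1 + f/700) mel formula), not the exact frontend used in the LEAF experiments.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_mels=40):
    """Log mel filterbank features of shape (n_frames, n_mels)."""
    signal = np.asarray(signal, dtype=float)
    # 1) Windowing: overlapping fixed-length frames, each tapered by a Hann window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2           # (n_frames, n_fft // 2 + 1)

    # 2) Filtering: triangular filters equally spaced on the mel scale, hence
    #    narrow at low frequencies and wide at high frequencies.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fbank = np.zeros((n_mels, bin_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[m] = np.maximum(0.0, np.minimum(rising, falling))
    mel_energies = power @ fbank.T                                # (n_frames, n_mels)

    # 3) Compression: a fixed logarithm (small offset avoids log(0)).
    return np.log(mel_energies + 1e-6)
```

Every constant here (window shape, filter placement, the log) is fixed by hand, which is exactly what LEAF replaces with trainable components.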

LEAF loosely follows this traditional approach to mel filterbank generation, but replaces each of the fixed operations (i.e., the filtering layer, windowing layer, and compression function) with a learned counterpart. The output of LEAF is a time-frequency representation (a spectrogram) similar to mel filterbanks, but fully learnable. So, for example, while a mel filterbank uses a fixed scale for pitch, LEAF learns the scale that is best suited to the task of interest. Any model that can be trained using mel filterbanks as input features can also be trained on LEAF spectrograms.
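For the compression step, the LEAF paper builds on per-channel energy normalization (PCEN) rather than a fixed logarithm. The sketch below is plain PCEN in NumPy; the smoothing coefficient `s`, exponent `alpha`, offset `delta`, and root `r` are passed in as per-channel arrays standing in for what a learnable frontend would train, and the values used in the example are arbitrary assumptions.

```python
import numpy as np

def pcen(energy, s, alpha, delta, r, eps=1e-6):
    """Per-channel energy normalization of a (time, channels) energy array.

    s, alpha, delta, r: arrays of shape (channels,), one value per filter.
    In a learnable frontend these would be trainable; here they are plain inputs.
    """
    energy = np.asarray(energy, dtype=float)
    smoother = np.empty_like(energy)
    smoother[0] = energy[0]
    # First-order IIR smoothing along time, separately for each channel.
    for t in range(1, energy.shape[0]):
        smoother[t] = (1.0 - s) * smoother[t - 1] + s * energy[t]
    # Adaptive gain control (divide by smoothed energy), then root compression.
    return (energy / (eps + smoother) ** alpha + delta) ** r - delta ** r

rng = np.random.default_rng(0)
E = rng.uniform(1.0, 2.0, size=(50, 4))            # toy filterbank energies
params = dict(s=np.full(4, 0.05), alpha=np.ones(4), delta=np.full(4, 2.0), r=np.full(4, 0.5))
quiet, loud = pcen(E, **params), pcen(1000.0 * E, **params)
```

With `alpha = 1`, multiplying the input by a constant gain leaves the output essentially unchanged (up to `eps`), so `quiet` and `loud` match; a fixed log instead shifts every value by log(1000).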

Diagram of computation of mel filterbanks compared to LEAF spectrograms.

While LEAF can be initialized randomly, it can also be initialized in a way that approximates mel filterbanks, which have been shown to be a better starting point. Then, LEAF can be trained with any classifier to adapt to the task of interest.

Left: Mel filterbanks for a person saying “wow”. Right: LEAF’s output for the same example, after training on a dataset of speech commands.

A Parameter-Efficient Alternative to Fixed Features
A potential downside of replacing fixed features that involve no learnable parameters with a trainable system is that it can significantly increase the number of parameters to optimize. To avoid this issue, LEAF uses Gabor convolution layers that have only two parameters per filter, instead of the ~400 parameters typical of a standard convolution layer. As a result, even when paired with a small classifier such as EfficientNetB0, LEAF accounts for only 0.01% of the total parameters.
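To see where the "two parameters per filter" comes from, here is a NumPy sketch of a complex Gabor filterbank: each filter is a complex sinusoid at a center frequency under a Gaussian envelope, so a center frequency and a width fully define its 401 taps. The mel-like initialization is a simple heuristic assumed for illustration (centers equally spaced on the mel scale, widths from the Gaussian time/frequency relation σ_t = 1/(2π σ_f)); it is not LEAF's actual initialization procedure.

```python
import numpy as np

def gabor_filters(center_freqs_hz, sigmas_s, sample_rate=16000, length=401):
    """Complex Gabor impulse responses, shape (n_filters, length).

    Filter n = Gaussian envelope with std sigmas_s[n] (seconds), normalized to unit
    area, times a complex sinusoid at center_freqs_hz[n]: 2 numbers per filter.
    """
    t = (np.arange(length) - length // 2) / sample_rate     # symmetric time axis
    eta = np.asarray(center_freqs_hz, dtype=float)[:, None]
    sigma = np.asarray(sigmas_s, dtype=float)[:, None]
    envelope = np.exp(-t ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return envelope * np.exp(2j * np.pi * eta * t)

def mel_like_init(n_filters=40, low_hz=60.0, high_hz=7800.0):
    """Hypothetical mel-like initialization of (center frequency, width) pairs."""
    to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mels = np.linspace(to_mel(low_hz), to_mel(high_hz), n_filters + 2)
    hz = 700.0 * (10.0 ** (mels / 2595.0) - 1.0)
    centers = hz[1:-1]
    bandwidths = (hz[2:] - hz[:-2]) / 4.0        # frequency-domain std in Hz (assumed)
    sigmas = 1.0 / (2.0 * np.pi * bandwidths)    # Gaussian time/frequency duality
    return centers, sigmas

centers, sigmas = mel_like_init()
filters = gabor_filters(centers, sigmas)
print(filters.shape)                   # (40, 401)
print(centers.size + sigmas.size)      # 80 numbers define the whole bank
print(filters.shape[1])                # 401 free weights per filter in an unconstrained conv
```

Because the filters are generated from their parameters, gradients flow into only the 80 numbers, and the bank stays interpretable: each filter always has a single, well-defined center frequency and bandwidth.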

Top: Unconstrained convolutional filters after training for audio event classification. Bottom: LEAF filters at convergence after training for the same task.

Performance
We apply LEAF to diverse audio classification tasks, including recognizing speech commands, speaker identification, acoustic scene recognition, identifying musical instruments, and finding birdsongs. On average, LEAF outperforms both mel filterbanks and previous learnable frontends, such as Time-Domain Filterbanks, SincNet and Wavegram. In particular, LEAF achieves a 76.9% average accuracy across the different tasks, compared to 73.9% for mel filterbanks. Moreover, we show that LEAF can be trained in a multi-task setting, such that a single LEAF parametrization can work well across all these tasks. Finally, when combined with a large audio classifier, LEAF reaches state-of-the-art performance on the challenging AudioSet benchmark, with a 2.74 d-prime score.

D-prime score (the higher the better) of LEAF, mel filterbanks and previously proposed learnable spectrograms on the evaluation set of AudioSet.
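AudioSet results are conventionally reported as d-prime, which is derived from the ROC AUC as d′ = √2 · Φ⁻¹(AUC), with Φ⁻¹ the inverse standard normal CDF. A small standard-library sketch (the helper names are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def d_prime_from_auc(auc):
    """d' = sqrt(2) * inverse_normal_cdf(AUC); chance level (AUC = 0.5) gives 0."""
    return sqrt(2.0) * NormalDist().inv_cdf(auc)

def auc_from_d_prime(d_prime):
    """Inverse mapping: AUC = normal_cdf(d' / sqrt(2))."""
    return NormalDist().cdf(d_prime / sqrt(2.0))
```

For example, `auc_from_d_prime(2.74)` is about 0.97, so a d-prime of 2.74 corresponds to an average ROC AUC of roughly 0.97 across AudioSet classes.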

Conclusion
The scope of audio understanding tasks keeps growing, from diagnosing dementia from speech to detecting humpback whale calls from underwater microphones. Adapting mel filterbanks to every new task can require a significant amount of hand-tuning and experimentation. In this context, LEAF provides a drop-in replacement for these fixed features that can be trained to adapt to the task of interest, with minimal task-specific adjustments. Thus, we believe that LEAF can accelerate development of models for new audio understanding tasks.

Acknowledgements
We thank our co-authors, Olivier Teboul, Félix de Chaumont-Quitry and Marco Tagliasacchi. We also thank Dick Lyon, Vincent Lostanlen, Matt Harvey, and Alex Park for helpful discussions, and Julie Thomas for helping to design figures for this post.

Categories
Misc

Support for CUDA Unified Memory Now Available in Thrust

Thrust 1.12.0 and CUB 1.12.0 are distributed with the NVIDIA HPC SDK 21.3 and the CUDA Toolkit 11.4.

Thrust 1.12.0 is a major release providing bug fixes and performance enhancements. It includes a new thrust::universal_vector, which holds data that is accessible from both host and device. This enables the use of CUDA unified memory with Thrust. Also added are new asynchronous scan algorithms, thrust::async::exclusive_scan and thrust::async::inclusive_scan. The synchronous versions of these have been updated to use cub::DeviceScan directly. This release deprecates support for Clang

CUB 1.12.0 is a major release providing bug fixes and performance enhancements. It includes improved radix sort stability. Please see the CUB 1.12 Release Notes for more information.

Both packages are available today from GitHub.

About Thrust and CUB

Thrust is a modern C++ parallel algorithms library which provides a std::-like interface. Thrust abstractions are agnostic of any particular parallel programming model or hardware. With Thrust, you can write code once and run it in parallel on either your CPU or GPU. CUB is a C++ library of collective primitives and utilities for parallel algorithm authors. CUB is specific to CUDA C++ and its interfaces explicitly accommodate CUDA-specific features.

Thrust and CUB are complementary and are often used together. 

Categories
Misc

On Demand Webinar: Limitless Capabilities of NVIDIA CloudXR 2.0

Learn how NVIDIA CloudXR can be used to deliver limitless virtual and augmented reality over networks (including 5G) to low cost, low-powered headsets and devices

The recent webinar shares how NVIDIA CloudXR can be used to deliver limitless virtual and augmented reality over networks (including 5G) to low-cost, low-power headsets and devices, while maintaining the high-quality experience traditionally reserved for high-end headsets that are plugged into high-performance computers.

At the end of this webinar, we hosted a Q&A session with our guest speakers from The Grid Factory, co-founder and CTO Ben Jones and Applications Specialist Tom Murray. The Grid Factory is a UK-based immersive technology integrator.

Below are the answers to the top questions asked during the webinar.

Q: Does CloudXR stream the data at the native resolution and refresh rate of the headset? If so, how well does it handle high refresh rates?

The Grid Factory: Yes, CloudXR delivers data at the resolution of the HMD. CloudXR has been tested on the Valve Index at 144Hz, and as long as the environment is designed appropriately, the experience is as good as a local one.

Q: What is the size of the GPU cluster recommended for CloudXR to support multiple users?

The Grid Factory: The size of the cluster will depend on how many CloudXR sessions a single GPU can support. For virtual reality, if using the RTX 6000, RTX 8000 or NVIDIA A40, then up to two concurrent CloudXR sessions may be supported, depending on the application requirements.

If using augmented reality, then this number may increase due to the change in application requirements and client optics. To be able to support multiple users on the same GPU, the environment must be running NVIDIA virtual GPU software, which allows the virtualization of GPU resources.

Q: Are there any CAD packages supported (Creo, SOLIDWORKS, NX, Inventor etc)?

The Grid Factory: We have tested CloudXR with Autodesk, Siemens and SOLIDWORKS products with great success. CloudXR does work with these products and other OpenVR design applications.

Q: How would the CloudXR solution paired with Oculus Quest 2 compare to a tethered VR experience (such as a Valve Index) in terms of latency?

The Grid Factory: If the Wi-Fi 6 network is performing correctly, there should be no discernible difference beyond those arising from the manufacturers’ hardware specifications. Because the Quest 2 can now access the same hardware resources (such as the CPU and GPU) as the Valve Index, the performance differences should be minimal, and the Quest 2 has the benefit of better mobility than the Valve Index or any other tethered HMD.

Q: Do you need a 5G-capable device to use the VR?

The Grid Factory: No, currently there aren’t any 5G-capable XR headsets on general sale (except perhaps the NReal, but this capability is only available when connected to a 5G phone; that is a separate purchase from the headset, and is only available in South Korea and Japan).

CloudXR will work across Wi-Fi 5 networks, but there are performance considerations. The minimum bandwidth required for a CloudXR experience in a headset is 50 Mbps, but we would suggest more bandwidth than this, and this figure doesn’t take into account other devices that may be accessing the internet through the same Wi-Fi network.

Q: What computing power are we talking about on the AWS side?

The Grid Factory: The P instances (V100) and the G instances (T4) have been tested with CloudXR.

Q: Does the client need a GPU on the client side?

The Grid Factory: If you have a tethered headset such as a Vive, then yes, although it doesn’t need to be very powerful, as it only needs to decode the stream from the CloudXR server.

If you are using a mobile headset such as the Quest 2 or Pico Neo 2, then you do not need a GPU. The point of CloudXR is that all the GPU rendering and encoding is done on the server side rather than the client side. This means that the experience is consistent across a wide variety of devices. Because of this, the battery life of all-in-one or all-in-two headsets is also improved while users are in the application, as the grunt work is being done outside the headset.

To learn more, visit the CloudXR page, where there are plenty of videos, blog posts, webinars, and more to help you get started.

And don’t miss out on the latest AR and VR news at the GPU Technology Conference (GTC), which starts April 12, 2021. Register now for free access and check out the keynote by NVIDIA CEO and founder Jensen Huang, and other XR sessions available at GTC.

Categories
Misc

‘GDC Showcase’ Highlights Top NVIDIA Technologies

From March 15-19, GDC Showcase will introduce a wide range of new content for game developers to explore. NVIDIA will be there with a new talk, covering how to best harness the power of NVIDIA RTX GPUs with a suite of SDKs and tools custom built for the job.

Learn About New SDKs Built For Real-Time Ray Tracing and More

From March 15-19, GDC Showcase will introduce a wide range of new content for game developers to explore. This digital event is free to attend.

NVIDIA will be there with a new talk, Next-Generation Game Development on NVIDIA RTX GPUs (Monday, March 15, 1:00pm). Learn how to harness the power of NVIDIA RTX GPUs with a suite of SDKs and tools custom built for the job. John Spitzer, VP of Developer and Performance Technology, will provide insight into how to get the most out of real-time ray tracing through technologies like Deep Learning Super Sampling (DLSS), NVIDIA Real-time Denoisers (NRD), RTX Global Illumination (RTXGI), and RTX Direct Illumination (RTXDI).

Working on a competitive game? John’s overview of the NVIDIA Reflex SDK will explain how to reduce latency in your games. Struggling with debugging? The Nsight Graphics portion of the talk will help you out. The session will close with tips on how to future-proof your development pipeline with forward-looking technology like path tracing, Universal Scene Description (USD) files, and the NVIDIA Omniverse Platform.

Our aim is to help game developers learn how to use the right tools and SDKs to get the most out of NVIDIA’s RTX GPUs. Attendees will receive an overview of the technologies that drive the bleeding edge of PC game development, while learning how to future-proof their workflow.

This talk is intended for a general audience interested in learning more about next-generation PC game technologies. We will also have a full track for game developers at our upcoming GTC this April.

Categories
Misc

Innovators, Researchers, Industry Leaders: Meet the Women Headlining at GTC

Hundreds of women speakers will present research and insights across industries at the upcoming GPU Technology Conference.

Categories
Misc

NVIDIA CEO Jensen Huang to Host AI Pioneers Yoshua Bengio, Geoffrey Hinton and Yann LeCun, and Others, at GTC21

CEO and founder Jensen Huang will host renowned AI pioneers Yoshua Bengio, Geoffrey Hinton and Yann LeCun at the company’s upcoming technology conference.