Categories
Misc

Preparing Time Series Data for LSTMs

I have no formal education here, but my understanding is that RNNs take an input window and “unfold” it, basing each prediction in part on the earlier steps of that window. Say I have a batch size of 1: there shouldn’t be a relationship between the first batch and the second, correct? (If not, tell me; the rest is irrelevant.)

Does it follow from my understanding that it’s safe to

  • Have overlapping windows in my data? (so conceptually, batches 0, 1, 2 = data[0:4], data[1:5], data[2:6]; see the sketch after this list)
  • Split into fit/val sets derived from random choices of windows? (rather than just slicing twice)
  • Shuffle data after windowing?
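
For what it's worth, here is a minimal sketch of what the list above describes (overlapping windows, with shuffling applied after windowing), using tf.keras.utils.timeseries_dataset_from_array; the series values, window length, and batch size are just placeholders:

```python
import numpy as np
import tensorflow as tf

# Placeholder univariate series; window length 4 mirrors data[0:4], data[1:5], ...
series = np.arange(100, dtype="float32")
window = 4

# Windows start at every index (sequence_stride=1), so they overlap fully.
# The window starting at i covers series[i:i+window] and predicts series[i+window].
ds = tf.keras.utils.timeseries_dataset_from_array(
    data=series[:-window],     # values the input windows are drawn from
    targets=series[window:],   # targets[i] is the value right after window i
    sequence_length=window,
    sequence_stride=1,
    shuffle=True,              # shuffles whole windows, i.e., after windowing
    batch_size=1,
)

# Caveat: with overlapping windows, a purely random fit/val split lets val
# windows share timesteps with train windows, which can inflate val scores.
for x, y in ds.take(2):
    print(x.numpy(), y.numpy())
```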

submitted by /u/EX3000
[visit reddit] [comments]

Categories
Misc

How has the Keras syntax changed over time?

Hello,

I recently stumbled over a video tutorial with code examples. I tried to figure out what happens in the code and did some research in the Keras developer guides. I don’t have much experience with coding, but I think the syntax used in my code example is completely different from the syntax in the official Keras docs…

My question is whether the way Python code is written with Keras has changed significantly in recent years. And if the answer is yes, does it still make sense to work with the older syntax?
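
For context, older tutorials often import the standalone keras package directly, while current TensorFlow documentation writes everything through the tf.keras namespace. A rough, hypothetical illustration of the same tiny model in both styles (the model itself is made up):

```python
import tensorflow as tf

# Older standalone-Keras style, seen in many pre-TF2 tutorials (commented out):
# from keras.models import Sequential
# from keras.layers import Dense
# model = Sequential()
# model.add(Dense(32, activation="relu", input_shape=(10,)))
# model.add(Dense(1))

# Current tf.keras style used in the official docs:
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```

The underlying concepts (layers, compile, fit) are the same in both styles; the most visible differences are the imports and occasionally renamed arguments (for example, lr became learning_rate in optimizers).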

submitted by /u/LeiseLeo
[visit reddit] [comments]

Categories
Offsites

TRILLsson: Small, Universal Speech Representations for Paralinguistic Tasks

In recent years, we have seen dramatic improvements on lexical tasks such as automatic speech recognition (ASR). However, machine systems still struggle to understand paralinguistic aspects — such as tone, emotion, whether a speaker is wearing a mask, etc. Understanding these aspects represents one of the remaining difficult problems in machine hearing. In addition, state-of-the-art results often come from ultra-large models trained on private data, making them impractical to run on mobile devices or to release publicly.

In “Universal Paralinguistic Speech Representations Using Self-Supervised Conformers”, to appear in ICASSP 2022, we introduce CAP12 — the 12th layer of a 600M parameter model trained on the YT-U training dataset using self-supervision. We demonstrate that the CAP12 model outperforms nearly all previous results in our paralinguistic benchmark, sometimes by large margins, even though previous results are often task-specific. In “TRILLsson: Distilled Universal Paralinguistic Speech Representations”, we introduce the small, performant, publicly available TRILLsson models and demonstrate how we reduced the size of the high-performing CAP12 model by 6x-100x while maintaining 90-96% of the performance. To create TRILLsson, we apply knowledge distillation on appropriately sized audio chunks and use different architecture types to train smaller, faster networks that are small enough to run on mobile devices.

1M-Hour Dataset to Train Ultra-Large Self-Supervised Models
We leverage the YT-U training dataset to train the ultra-large, self-supervised CAP12 model. The YT-U dataset is a highly varied, 900k+ hour dataset that contains audio of various topics, background conditions, and speaker acoustic properties.

Video categories by length (outer) and number (inner), demonstrating the variety in the YT-U dataset (figure from BigSSL)

We then modify a Wav2Vec 2.0 self-supervised training paradigm, which can solve tasks using raw data without labels, and combine it with ultra-large Conformer models. Because self-training doesn’t require labels, we can take full advantage of YT-U by scaling up our models to some of the largest model sizes ever trained, including 600M, 1B, and 8B parameters.

NOSS: A Benchmark for Paralinguistic Tasks
We demonstrate that an intermediate representation of one of the previous models contains a state-of-the-art representation for paralinguistic speech. We call the 600M parameter Conformer model without relative attention Conformer Applied to Paralinguistics (CAP). We exhaustively search through all intermediate representations of six ultra-large models and find that layer 12 (CAP12) outperforms previous representations by significant margins.

To measure the quality of the roughly 300 candidate paralinguistic speech representations, we evaluate on an expanded version of the NOn-Semantic Speech (NOSS) benchmark, which is a collection of well-studied paralinguistic speech tasks, such as speech emotion recognition, language identification, and speaker identification. These tasks focus on paralinguistic aspects of speech, which require evaluating speech features on the order of 1 second or longer, rather than lexical features, which operate on the order of 100 ms or shorter. We then add to the benchmark a mask-wearing task introduced at Interspeech 2020, a fake speech detection task (ASVSpoof 2019), a task to detect the level of dysarthria from Project Euphonia, and an additional speech emotion recognition task (IEMOCAP). By expanding the benchmark and increasing the diversity of the tasks, we empirically demonstrate that CAP12 is even more generally useful than previous representations.

Simple linear models on time-averaged CAP12 representations even outperform complex, task-specific models on five out of eight paralinguistic tasks. This is surprising because the comparable models sometimes use additional modalities (e.g., vision and speech, or text and speech) as well. Furthermore, CAP12 is exceptionally good at emotion recognition tasks. CAP12 embeddings also outperform all other embeddings on all other tasks, with a single exception: one embedding from a supervised network does better on the dysarthria detection task.

Model | Voxceleb∗ | Voxforge | Speech Commands | ASVSpoof2019∗∗ | Euphonia# | CREMA-D | IEMOCAP
Prev SoTA | — | 95.4 | 97.9 | 5.11 | 45.9 | 74.0 | 67.6+
TRILL | 12.6 | 84.5 | 77.6 | 74.6 | 48.1 | 65.7 | 54.3
ASR Embedding | 5.2 | 98.9 | 96.1 | 11.2 | 54.5 | 71.8 | 65.4
Wav2Vec2 layer 6†† | 17.9 | 98.5 | 95.0 | 6.7 | 48.2 | 77.4 | 65.8
CAP12 | 51.0 | 99.7 | 97.0 | 2.5 | 51.5 | 88.2 | 75.0
Test performance on the NOSS Benchmark and extended tasks. “Prev SoTA” indicates the previous best performing state-of-the-art model, which has arbitrary complexity; all other rows are linear models on time-averaged input. ∗ Filtered according to YouTube’s privacy guidelines. ∗∗ Uses equal error rate [20]. # The only non-public dataset; we exclude it from aggregate scores. Some previous state-of-the-art models used both audio and visual features. + The previous state-of-the-art model performed cross-validation; for our evaluation, we hold out two specific speakers as a test set. †† Wav2Vec 2.0 model from HuggingFace; the best overall layer was layer 6.
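
As a rough illustration of the linear-probe setup described above (not the paper's actual evaluation code), the sketch below uses random arrays as stand-ins for CAP12 frame embeddings and task labels, with scikit-learn as one convenient choice of linear model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim, n_clips = 1024, 200   # embedding size and clip count are placeholders

def fake_clip_embeddings():
    frames = int(rng.integers(50, 150))    # variable-length clip
    return rng.normal(size=(frames, dim))  # stand-in for per-frame CAP12 outputs

# Time-average each clip's frame embeddings into a single fixed-size vector.
X = np.stack([fake_clip_embeddings().mean(axis=0) for _ in range(n_clips)])
y = rng.integers(0, 4, size=n_clips)       # e.g., 4 emotion classes (placeholder)

# Simple linear model on the time-averaged representations.
clf = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
print("linear-probe accuracy:", clf.score(X[150:], y[150:]))
```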

TRILLsson: Small, High Quality, Publicly Available Models
Similar to FRILL, our next step was to make an on-device, publicly available version of CAP12. This involved using knowledge distillation to train smaller, faster, mobile-friendly architectures. We experimented with EfficientNet, Audio Spectrogram Transformer (AST), and ResNet. These model types are very different, and cover both fixed-length and arbitrary-length inputs. EfficientNet comes from a neural architecture search over vision models to find simultaneously performant and efficient model structures. AST models are transformers adapted to audio inputs. ResNet is a standard architecture that has shown good performance across many different models.

We trained models that performed on average 90-96% as well as CAP12, despite being 1%-15% of its size and trained on only 6% of the data. Interestingly, we found that different architecture types performed better at different sizes: ResNet models performed best at the low end, EfficientNet in the middle, and AST models at the larger end.

Aggregate embedding performance vs. model size for various student model architectures and sizes. ResNet architectures perform best at small sizes, EfficientNetV2 performs best in the midsize model range, and the larger AST models are best at the largest sizes tested.

We perform knowledge distillation with the goal of matching a student, with a fixed-size input, to the output of a teacher, with a variable-size input, for which there are two methods of generating student targets: global matching and local matching. Global matching produces distillation targets by generating CAP12 embeddings for an entire audio clip, and then requires that a student match the target from just a small segment of audio (e.g., 2 seconds). Local matching requires that the student network match the average CAP12 embedding just over the smaller portion of the audio that the student sees. In our work, we focused on local matching.

Two methods of generating distillation targets for sequences. Left: Global matching uses the average CAP12 embedding over the whole clip as the target for each local chunk. Right: Local matching uses CAP12 embeddings averaged over just the local clips as the distillation target.
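
To make the local-matching objective concrete, here is a minimal sketch with hypothetical teacher and student callables; the names, shapes, and the squared-error loss are assumptions rather than the exact training setup:

```python
import tensorflow as tf

def local_matching_loss(teacher, student, audio_chunk):
    """audio_chunk: [batch, chunk_samples] raw audio of fixed length (e.g., ~2 s)."""
    # Teacher embeddings computed only over the chunk the student sees,
    # then averaged over time to form the distillation target.
    teacher_frames = teacher(audio_chunk)            # [batch, frames, dim]
    target = tf.reduce_mean(teacher_frames, axis=1)  # [batch, dim]
    target = tf.stop_gradient(target)                # the teacher is frozen

    pred = student(audio_chunk)                      # [batch, dim]

    # Match the student embedding to the local teacher target.
    return tf.reduce_mean(tf.reduce_sum(tf.square(pred - target), axis=-1))

# Toy usage with stand-in callables (a real teacher/student would be networks).
teacher = lambda x: tf.random.normal([x.shape[0], 10, 8])  # [batch, frames, dim]
student = lambda x: tf.zeros([x.shape[0], 8])              # [batch, dim]
print(local_matching_loss(teacher, student, tf.random.normal([4, 32000])))
```

Global matching would differ only in how the target is computed: the teacher embedding would be averaged over the entire clip rather than over the chunk the student sees.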

Observation of Bimodality and Future Directions
Paralinguistic information shows an unexpected bimodal distribution. For the CAP model that operates on 500 ms input segments, and two of the full-input Conformer models, intermediate representations gradually increase in paralinguistic information, then decrease, then increase again, and finally lose this information towards the output layer. Surprisingly, this pattern is also seen when exploring the intermediate representations of networks trained on retinal images.

500 ms inputs to CAP show a relatively pronounced bimodal distribution of paralinguistic information across layers.
Two of the conformer models with full inputs show a bimodal distribution of paralinguistic information across layers.

We hope that smaller, faster models for paralinguistic speech unlock new applications in speech recognition, text-to-speech generation, and understanding user intent. We also expect that smaller models will be more easily interpretable, which will allow researchers to understand what aspects of speech are important for paralinguistics. Finally, we hope that our open-sourced speech representations are used by the community to improve paralinguistic speech tasks and user understanding in private or small datasets.

Acknowledgements
I’d like to thank my co-authors Aren Jansen, Wei Han, Daniel Park, Yu Zhang, and Subhashini Venugopalan for their hard work and creativity on this project. I’d also like to thank the members of the large collaboration for the BigSSL work, without which these projects would not be possible. The team includes James Qin, Anmol Gulati, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, and Yonghui Wu.

Categories
Misc

Annoying behavior of developers not maintaining backward compatibility

Hi, I’m a new student trying to learn TensorFlow. I have already found some really good books and tutorials which help me learn fast. But when I started trying out the examples given, I soon realized that all the good tutorials I have discuss TensorFlow 1.15, and amazingly that code will not work with TensorFlow 2.0+.

I really find this “cool and amazing” behavior from developers who have zero concern for backward compatibility. I can go to Google and fix the old code line by line, replacing each call with its TensorFlow 2 equivalent.
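
For what it's worth, most of the line-by-line changes follow a handful of patterns. A hedged sketch of the most common one, graph-plus-session code versus eager TF2 code, using a toy computation:

```python
import tensorflow as tf

# TF 1.x style (graph + session); in TF 2.x this only runs via the compat shim:
# tf.compat.v1.disable_eager_execution()
# x = tf.compat.v1.placeholder(tf.float32, shape=(None, 3))
# y = tf.reduce_sum(x, axis=1)
# with tf.compat.v1.Session() as sess:
#     print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))

# TF 2.x style: eager execution, plain functions (optionally tf.function):
@tf.function
def row_sum(x):
    return tf.reduce_sum(x, axis=1)

print(row_sum(tf.constant([[1.0, 2.0, 3.0]])))
```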

But since I’m a beginner, this is a nightmare for me. Can anyone explain to me in simple terms why these douchebags do not maintain backward compatibility when these airheads update these libraries?

I really want to find the developers who did this and dip their face in boiling oil.

P.S.: When is TensorFlow 3 coming out? I’m now trying to learn TensorFlow 2, and I assume that TensorFlow 3 will be completely different from TensorFlow 2 and we would have to re-learn everything from scratch for that too.

submitted by /u/Dgreenfox
[visit reddit] [comments]

Categories
Misc

Latest Releases and Resources: March 3-9

Register for the Game Developer Conference; join DRIVE Developer Days; get DLI training at GTC; learn how Metropolis can grow your vision AI business; meet the Shell.AI Hackathon winners.

Our weekly roundup covers the most recent software updates, learning resources, events, and notable news. 



Events

NVIDIA at GDC: Advancing Innovations for Virtual Worlds in Game Development

At the Game Developer Conference, attendees will experience how the latest NVIDIA-powered solutions are enabling developers to create more realistic, immersive virtual worlds for players.

Register online: NVIDIA at GDC


Learning resources

Accelerate Autonomous Vehicle Development with DRIVE Developer Days at GTC

NVIDIA DRIVE Developer Days are March 22-23 and feature deep-dive sessions on safe and robust autonomous vehicle development. This special event showcases the latest innovations in autonomous driving and software-defined vehicle architectures. Special sessions led by the NVIDIA engineering team highlight the newest DRIVE solutions. Attendees will learn how to apply these technologies to their own autonomous vehicle development and have the opportunity to chat with engineers.

This virtual content is available to all GTC attendees and will be available on demand after the event.

Register online: DRIVE Developer Days

Hands-On DLI Training Labs Available at GTC 

Choose from 24 training labs taught by technical experts covering HPC, networking, deep learning, data science, conversational AI, computer vision, and more.

Register online: DLI Training Labs at GTC

Learn How Metropolis Can Boost Your Go-to-Market Efforts​

Tune in to this meetup replay and find out how the Metropolis program can grow your vision AI business and enhance go-to-market efforts​.

Learn how:

  • Metropolis Validation Labs optimize your applications and accelerate deployments.
  • NVIDIA Fleet Command simplifies provisioning and management of edge deployments, accelerating the time to scale from POC to production.
  • NVIDIA LaunchPad provides easy access to GPU instances for faster POCs and customer trials.

Get started: Learn How Metropolis Can Boost Your Go-to-Market Efforts​ 


News

Meet the Winners of the Shell.AI Hackathon for Renewable and Sustainable Energy

Tackling climate change is an urgent challenge worldwide. Developing and delivering renewable energy sources is critical. To explore new opportunities for low-carbon energy, Shell collaborated with NVIDIA, OpenACC, and Microsoft, with support from SINE and NIRMAAN to launch the AI Solar Power Prediction Challenge. 

In this multi-stage hackathon, participants used historical sky camera images and weather data to predict cloud coverage over solar farms in Stage 1 and then to predict Global Horizontal Irradiance for any two-hour interval of the day. More than 2,000 participants across 50+ countries rose to the challenge, and over 6,200 submissions were received. The selected hackathon winners provided inspired solutions that were accurate, innovative, and scalable, putting their passion to purpose to advance renewable and sustainable energy.

Meet the winners: Shell.AI Hackathon for Renewable and Sustainable Energy

Categories
Misc

GFN Thursday Marches Forward With 21 Games Coming to GeForce NOW This Month

A new month means a whole new set of games coming to GeForce NOW. Members can look forward to 21 titles joining the GeForce NOW library in March, including day-and-date releases like Shadow Warrior 3 with support for NVIDIA DLSS.

The post GFN Thursday Marches Forward With 21 Games Coming to GeForce NOW This Month appeared first on NVIDIA Blog.

Categories
Misc

Am I doing this wrong? I need to know so I don’t waste 1+ hours.


submitted by /u/SpencyDotRed
[visit reddit] [comments]

Categories
Misc

Reason for receiving KeyError: ‘[…] not in index’ while using tensorflow.compat.v1 to train and build a standard DNN model?

submitted by /u/professorDissociate
[visit reddit] [comments]

Categories
Misc

Tensorflow Lite on RPI

I am trying to run a TensorFlow Lite model on my RPi (3B+) and am following this blog post that I found on this sub:

https://blog.paperspace.com/tensorflow-lite-raspberry-pi/

The article is fairly old, so I have newer versions of Python and tflite-runtime than the ones mentioned in the blog.

The blog used MobileNetV1, while I used my own model that makes use of MobileNetV2.

I copied the code for now and changed the directories to match my local setup.

The code runs without errors or warnings, but I always get the first item in my text file and an accuracy of 0.0%.

What could be the issue here?
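
For reference, a minimal tflite-runtime inference loop looks roughly like the sketch below; the model path, the 224x224 float input, and the [-1, 1] scaling are assumptions for a MobileNetV2-style classifier. Checking the interpreter's reported input shape and dtype against your preprocessing is a common first step when every prediction comes out identical:

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")  # path is a placeholder
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print("expected input:", inp["shape"], inp["dtype"])  # verify shape/dtype first

# Keras MobileNetV2 models typically expect float inputs scaled to [-1, 1];
# a random image stands in here for real preprocessing.
image = (np.random.rand(1, 224, 224, 3).astype(np.float32) * 2.0) - 1.0

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])[0]
print("top class:", int(np.argmax(scores)), "score:", float(np.max(scores)))
```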

submitted by /u/clareeenceee
[visit reddit] [comments]

Categories
Misc

Obtaining information from text

Hello, I am a newbie and I have done some research before asking my question, which has left me confused about what to use for my case. A simple definition of my problem: I have an input which consists of 5-6 sentences at most. In these sentences I have to find the values of some terms. For example, sometimes the input is:

-” … Pg = 250 kN …”

-” …. dead load is 250 kN …”

-” …dead load on the system is given as 250 kN.” .

And this is not the only term to obtain; I have some more terms, each of which can be expressed in more than one way. I have read about named entity recognition, rule-based matchers, bidirectional LSTMs, etc., but now I am lost and don’t know which method to use. I need someone to point me in the right direction, and after that I can work on that topic and solve this problem. Any help is much appreciated.
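
As one concrete starting point down the rule-based route mentioned above, a plain regex sketch for the “dead load” examples could look like the following; the synonym list and the kN unit are assumptions, and a real system would need many more patterns (or an NER model trained on labeled sentences):

```python
import re

# Alternative surface forms for the same quantity ("Pg" / "dead load ...").
DEAD_LOAD = r"(?:Pg|dead load(?: on the system)?(?: is(?: given as)?)?)"
PATTERN = re.compile(DEAD_LOAD + r"\s*(?:=|is|as)?\s*(\d+(?:\.\d+)?)\s*kN", re.IGNORECASE)

examples = [
    "... Pg = 250 kN ...",
    ".... dead load is 250 kN ...",
    "...dead load on the system is given as 250 kN.",
]
for text in examples:
    m = PATTERN.search(text)
    print(m.group(1) if m else None)  # prints "250" for each example
```

spaCy’s rule-based Matcher plays the same role with token-level patterns instead of raw regexes, which tends to scale better once there are many terms and synonyms.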

submitted by /u/freeman0694
[visit reddit] [comments]