Categories
Misc

L1 and L2 norms for 4-D Conv layer tensor

(TensorFlow 2.4.1 and np 1.19.2) – For a defined convolutional layer as follows:

import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Conv2D

# Define a convolutional layer-
conv = Conv2D(
    filters = 3, kernel_size = (3, 3), activation = 'relu',
    kernel_initializer = tf.initializers.GlorotNormal(),
    bias_initializer = tf.ones_initializer,
    strides = (1, 1), padding = 'same', data_format = 'channels_last'
)

# And a sample input data-
x = tf.random.normal(shape = (1, 5, 5, 3), mean = 1.0, stddev = 0.5)
x.shape
# TensorShape([1, 5, 5, 3])

# Get output from the conv layer-
out = conv(x)
out.shape
# TensorShape([1, 5, 5, 3])

out = tf.squeeze(out)
out.shape
# TensorShape([5, 5, 3])

Here, the three filters can be accessed as: conv.weights[0][:, :, :, 0], conv.weights[0][:, :, :, 1] and conv.weights[0][:, :, :, 2] respectively.

If I want to compute the L2 norms for all of the three filters/kernels, I am using the code:

# Compute L2 norms-

# Using numpy-
np.linalg.norm(conv.weights[0][:, :, :, 0], ord = None)
# 0.85089666

# Using tensorflow-
tf.norm(conv.weights[0][:, :, :, 0], ord = 'euclidean').numpy()
# 0.85089666

# Using numpy-
np.linalg.norm(conv.weights[0][:, :, :, 1], ord = None)
# 1.0733316

# Using tensorflow-
tf.norm(conv.weights[0][:, :, :, 1], ord = 'euclidean').numpy()
# 1.0733316

# Using numpy-
np.linalg.norm(conv.weights[0][:, :, :, 2], ord = None)
# 1.0259292

# Using tensorflow-
tf.norm(conv.weights[0][:, :, :, 2], ord = 'euclidean').numpy()
# 1.0259292

How can I compute the L2 norms for all of the given conv layer's kernels at once (using 'conv.weights')?

Also, what is the correct way to compute the L1 norms for the same conv layer's kernels?
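For reference, one possible vectorized sketch (an assumption about the intended computation, not an authoritative answer): flatten each kernel and take a vector norm per filter.

import tensorflow as tf

w = conv.weights[0]                        # shape (3, 3, 3, 3): (kh, kw, in_channels, filters)
flat = tf.reshape(w, (-1, w.shape[-1]))    # flatten each filter's weights into one column

l2_per_filter = tf.norm(flat, ord = 2, axis = 0).numpy()   # one L2 norm per filter
l1_per_filter = tf.norm(flat, ord = 1, axis = 0).numpy()   # one L1 norm (sum of absolute values) per filter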

submitted by /u/grid_world
[visit reddit] [comments]

Categories
Misc

Does TensorFlow-Lite Micro support LSTM layers?

I was looking through the documentation and it's currently not clear. Some sources say that subgraphs are not supported, but the TensorFlow page says unidirectional LSTMs are supported, so I am confused. Can anyone point me to a TFLM implementation of an LSTM?

submitted by /u/LatePenguins
[visit reddit] [comments]

Categories
Misc

AI in the Sky: NVIDIA GPUs Help Researchers Remove Clouds from Satellite Images

Satellite images can be a fantastic civil engineering tool — at least when clouds don’t get in the way.  Now researchers at Osaka University have shown how to use GPU-accelerated deep learning to remove these clouds.  The scientists from the university’s Division of Sustainable Energy and Environmental Engineering used a “generative adversarial network” or GAN.  Read article >

The post AI in the Sky: NVIDIA GPUs Help Researchers Remove Clouds from Satellite Images appeared first on The Official NVIDIA Blog.

Categories
Offsites

High-Quality, Robust and Responsible Direct Speech-to-Speech Translation

Speech-to-speech translation (S2ST) is key to breaking down language barriers between people all over the world. Automatic S2ST systems are typically composed of a cascade of speech recognition, machine translation, and speech synthesis subsystems. However, such cascade systems may suffer from longer latency, loss of information (especially paralinguistic and non-linguistic information), and compounding errors between subsystems.

In 2019, we introduced Translatotron, the first ever model that was able to directly translate speech between two languages. This direct S2ST model was able to be efficiently trained end-to-end and also had the unique capability of retaining the source speaker’s voice (which is non-linguistic information) in the translated speech. However, despite its ability to produce natural sounding translated speech in high fidelity, it still underperformed compared to a strong baseline cascade S2ST system (e.g., composed of a direct speech-to-text translation model [1, 2] followed by a Tacotron 2 TTS model).

In “Translatotron 2: Robust direct speech-to-speech translation”, we describe an improved version of Translatotron that significantly improves performance while also applying a new method for transferring the source speakers’ voices to the translated speech. The revised approach to voice transference is successful even when the input speech contains multiple speakers speaking in turns while also reducing the potential for misuse and better aligning with our AI Principles. Experiments on three different corpora consistently showed that Translatotron 2 outperforms the original Translatotron by a large margin on translation quality, speech naturalness, and speech robustness.

Translatotron 2
Translatotron 2 is composed of four major components: a speech encoder, a target phoneme decoder, a target speech synthesizer, and an attention module that connects them together. The combination of the encoder, the attention module, and the decoder is similar to a typical direct speech-to-text translation (ST) model. The synthesizer is conditioned on the output from both the decoder and the attention.

Model architecture of Translatotron 2 (for translating Spanish speech into English speech).

There are three novel changes between Translatotron and Translatotron 2 that are key factors in improving the performance:

  1. While the output from the target phoneme decoder is used only as an auxiliary loss in the original Translatotron, it is one of the inputs to the spectrogram synthesizer in Translatotron 2. This strong conditioning makes Translatotron 2 easier to train and yields better performance.
  2. The spectrogram synthesizer in the original Translatotron is attention-based, similar to the Tacotron 2 TTS model, and as a consequence, it also suffers from the robustness issues exhibited by Tacotron 2. In contrast, the spectrogram synthesizer employed in Translatotron 2 is duration-based, similar to that used by Non-Attentive Tacotron, which drastically improves the robustness of the synthesized speech.
  3. Both Translatotron and Translatotron 2 use an attention-based connection to the encoded source speech. However, in Translatotron 2, this attention is driven by the phoneme decoder instead of the spectrogram synthesizer. This ensures the acoustic information that the spectrogram synthesizer sees is aligned with the translated content that it’s synthesizing, which helps retain each speaker’s voice across speaker turns.

More Powerful and Responsible Voice Retention
The original Translatotron was able to retain the source speaker’s voice in the translated speech, by conditioning its decoder on a speaker embedding generated from a separately trained speaker encoder. However, this approach also enabled it to generate the translated speech in a different speaker’s voice if a clip of the target speaker’s recording were used as the reference audio to the speaker encoder, or if the embedding of the target speaker were directly available. While this capability was powerful, it had the potential to be misused to spoof audio with arbitrary content, which posed a concern for production deployment.

To address this, we designed Translatotron 2 to use only a single speech encoder, which is responsible for both linguistic understanding and voice capture. In this way, the trained models cannot be directed to reproduce non-source voices. This approach can also be applied to the original Translatotron.

To retain speakers’ voices across translation, researchers generally prefer to train S2ST models on parallel utterances with the same speaker’s voice on both sides. Such a dataset with human recordings on both sides is extremely difficult to collect, because it requires a large number of fluent bilingual speakers. To avoid this difficulty, we use a modified version of PnG NAT, a TTS model that is capable of cross-lingual voice transferring to synthesize such training targets. Our modified PnG NAT model incorporates a separately trained speaker encoder in the same way as in our previous TTS work — the same strategy used for the original Translatotron — so that it is capable of zero-shot voice transference.

Following are examples of direct speech-to-speech translation from Translatotron 2 in which the source speaker’s voice is retained:

Input (Spanish): 
TTS-synthesized reference (English): 
Translatotron 2 prediction (English): 
Translatotron prediction (English): 

To enable S2ST models to retain each speaker’s voice in the translated speech when the input speech contains multiple speakers speaking in turns, we propose a simple concatenation-based data augmentation technique, called ConcatAug. This method augments the training data on the fly by randomly sampling pairs of training examples and concatenating the source speech, the target speech, and the target phoneme sequences into new training examples. The resulting samples contain two speakers’ voices in both the source and the target speech, which enables the model to learn on examples with speaker turns. Following are audio samples from Translatotron 2 with speaker turns:

Input (Spanish): 
TTS-synthesized reference (English): 
Translatotron 2 (with ConcatAug) prediction (English): 
Translatotron 2 (without ConcatAug) prediction (English): 

More audio samples are available here.
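Returning to the ConcatAug idea described above, here is a toy sketch of the core mechanism (array-level concatenation only; the field names and data layout are assumptions, not the actual implementation):

import numpy as np

def concat_aug(batch, rng=None):
    # Toy version of ConcatAug: randomly pair two training examples and concatenate
    # their source speech, target speech, and target phoneme sequences.
    # Each example is assumed to be a dict of 1-D numpy arrays (a simplification).
    rng = rng or np.random.default_rng()
    a, b = rng.choice(len(batch), size=2, replace=False)
    return {
        key: np.concatenate([batch[a][key], batch[b][key]])
        for key in ("source_speech", "target_speech", "target_phonemes")
    }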

Performance
Translatotron 2 outperforms the original Translatotron by large margins in every aspect we measured: higher translation quality (measured by BLEU, where higher is better), speech naturalness (measured by MOS, higher is better), and speech robustness (measured by UDR, lower is better). It particularly excelled on the more difficult Fisher corpus. The performance of Translatotron 2 on translation quality and speech quality approaches that of a strong baseline cascade system, and is better than the cascade baseline on speech robustness.

Translation quality (measured by BLEU, where higher is better) evaluated on two Spanish-English corpora.
Speech naturalness (measured by MOS, where higher is better) evaluated on two Spanish-English corpora.
Speech robustness (measured by UDR, where lower is better) evaluated on two Spanish-English corpora.

Multilingual Speech-to-Speech Translation
Besides Spanish-to-English S2ST, we also evaluated the performance of Translatotron 2 on a multilingual set-up in which the model took speech input from four different languages and translated them into English. The language of the input speech was not provided, which forced the model to detect the language by itself.

Source Language  fr de es ca
Translatotron 2  27.0 18.8 27.7 22.5
Translatotron  18.9 10.8 18.8 13.9
ST (Wang et al. 2020)  27.0 18.9 28.0 23.9
Training Target  82.1 86.0 85.1 89.3
Performance of multilingual X=>En S2ST on the CoVoST 2 corpus.

On this task, Translatotron 2 again outperformed the original Translatotron by a large margin. Although the results are not directly comparable between S2ST and ST, the close numbers suggest that the translation quality from Translatotron 2 is comparable to that of a baseline speech-to-text translation model. These results indicate that Translatotron 2 is also highly effective on multilingual S2ST.

Acknowledgments
The direct contributors to this work include Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, and Roi Pomerantz. We also thank Chung-Cheng Chiu, Quan Wang, Heiga Zen, Ron J. Weiss, Wolfgang Macherey, Yu Zhang, Yonghui Wu, Hadar Shemtov, Ruoming Pang, Nadav Bar, Hen Fitoussi, Benny Schlesinger, and Michael Hassid for helpful discussions and support.

Categories
Misc

Competition and Community Insights from NVIDIA’s Kaggle Grandmasters


In this post, we summarize questions and answers from GTC sessions with NVIDIA's Kaggle Grandmaster team. Additionally, we answer audience questions we did not get a chance to address during these sessions.

Q: How do you decide which competitions to join?

Ahmet: I read the competition description and evaluation metric. Then I give myself several days to think about whether I have any novel ideas that I can try. If I do not have any interesting ideas, then I do not join. But sometimes I just join to learn and improve my skills.

Q: Is mathematics mandatory for winning a competition?

Kazuki: Not mandatory, but you may want to understand the competition metric and how machine learning models work. For example, linear models and tree models are totally different, so they tend to produce good results when ensembled together.

Q: How do you approach a competition?

Bojan: On the first day, I always submit a sample so that I am on the leaderboard. Traditionally, I have not been very big on data analysis or EDA, which is one of my weaknesses. But recently, I started doing more and changing my approach.

One thing I always do is see how easy it is to ensemble different models in a competition. This dictates my strategy in the long run. If ensembling slightly different models can give a nice boost, it means that building many diverse models is important. However, if ensembling does not give you a big boost, then feature engineering or coming up with creative features is more important in the long run.

One of the strategies is to try to improve a single model as much as you can, and only ensemble once you are satisfied with it.
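As a minimal illustration of this diagnostic, here is a sketch of blending two models' validation predictions (the predictions and weights are hypothetical placeholders):

import numpy as np

pred_a = np.random.rand(1000)   # hypothetical validation predictions from model A
pred_b = np.random.rand(1000)   # hypothetical validation predictions from model B

# Simple weighted average; compare its validation score against each single model
# to gauge how much an ensemble is likely to help in this competition.
blend = 0.5 * pred_a + 0.5 * pred_b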

Jean-Francois: It is a good idea to read what people share in the forum in every competition. This means to read what the host writes, including comments. And to read top solutions in similar recent competitions. Surprisingly, some competitions are won by models that are publicly shared from previous competitions and adapted to the new one. People do not read enough. You can also try to find papers on the topic, especially for science competitions where there are often relevant papers.

Giba: Download the data and run some EDA. Get insights about the feature and target distributions in order to find the best validation strategy. Random KFold is usually good for most problems, but sometimes it is necessary to use GroupKFold or a time-based split. Once you find the best validation strategy, run a simple model using it and submit to check the leaderboard score. This is usually the most important thing in a competition: if validation is robust and done correctly, all metric improvements made locally should translate to the Kaggle leaderboard. After that you must work on feature engineering and build a diverse set of models with different datasets and training algorithms. Usually, an ensemble of a neural network and a GBDT model is good enough to rank high on the leaderboard. Searching for target leakage is, unfortunately, also part of Kaggle competitions.
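As an illustration of the grouped splitting Giba mentions, here is a minimal scikit-learn sketch (the data and group column are hypothetical):

import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical data: each row belongs to a group (e.g., a patient or user ID)
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
groups = np.random.randint(0, 20, size=100)

# GroupKFold keeps all rows of a group in the same fold,
# so the validation score is not inflated by group leakage.
gkf = GroupKFold(n_splits=5)
for fold, (train_idx, valid_idx) in enumerate(gkf.split(X, y, groups)):
    X_tr, X_va = X[train_idx], X[valid_idx]
    y_tr, y_va = y[train_idx], y[valid_idx]
    # train a model on (X_tr, y_tr) and evaluate on (X_va, y_va)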

Q: Which deep learning framework would you recommend starting with?

Jean-Francois: I think the best one is Keras because it is very abstract. You can build rather complex models and train them in a few lines of code. Then you may want to move to PyTorch or TensorFlow for two reasons: to have better control of your models and customize your layers as well as the ability to reuse pretrained models. For that, I have the impression that PyTorch is taking the lead. What we do on Kaggle is mostly model prototyping. Maybe today, TensorFlow is better at model deployment, but it is not relevant at Kaggle.

Jiwei: I would add that PyTorch Lightning is a user-friendly package based on PyTorch, especially for new users. It abstracts the details of training and provides convenient APIs for advanced features such as Multi-GPU, TPU, and mixed precision.

An audience poll showed that 66% preferred PyTorch and 31% preferred TensorFlow.

Q: How do you prevent overfitting when using pseudo-labeling? Is it okay to use that strategy with an ensemble?

Bo: In the recent RANZCR competition, our team won using both pseudo-labeling and ensemble. It’s ok to use both, but you should be very careful doing so in order to prevent overfitting.

  • First, you want to split the original data into five folds and split external data into five folds. In both stages, there will be five models.
  • In Stage 1, train the model on the original data, and do inference on the external data to have external data prediction. Do this five times.
  • In Stage 2, combine the original data (with original labels) and external data (with Stage 1 predictions as pseudo labels) and train the models again.

The important thing is, when we make pseudo labels, we want to make five copies of the pseudo labels. For Stage 2’s fold0 model (trained on combined fold1,2,3,4 and validated on combined fold0), we want to make sure it never had fold0’s information, so the pseudo labels used for this model need to come from Stage 1’s fold0 model (train on original fold1,2,3,4). This way you will never have any leakage.

It is ok to use ensemble together with pseudo-labeling. In the RANZCR competition, we used ensembles in both stages.
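A simplified sketch of the fold-matched pseudo-labeling scheme Bo describes (toy data and a simple classifier stand in for the actual models; only the fold bookkeeping is the point):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Hypothetical toy data standing in for the original labeled set and the external set
X, y = np.random.rand(1000, 10), np.random.randint(0, 2, 1000)
X_ext = np.random.rand(500, 10)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
folds = list(kf.split(X))

# Stage 1: five models trained on the original folds; each predicts pseudo labels for the external data
pseudo = []
for tr, va in folds:
    m = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    pseudo.append(m.predict(X_ext))

# Stage 2: fold i is validated on original fold i, so its pseudo labels must come from
# the Stage 1 model that also never saw fold i (no leakage across folds).
for i, (tr, va) in enumerate(folds):
    X_comb = np.concatenate([X[tr], X_ext])
    y_comb = np.concatenate([y[tr], pseudo[i]])
    m2 = LogisticRegression(max_iter=1000).fit(X_comb, y_comb)
    score = m2.score(X[va], y[va])  # out-of-fold validation on the original data only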

Chris: pseudo-labeling is one of the things that I specifically learned at Kaggle because none of the books I had read talked about it. Kaggle is a great place to learn practical tricks like pseudo-labeling.

Q: What are commonly used post-processing techniques?  How can I improve my score on multi-label classification problems?

Chris: I'll take a first stab at this. Recently a Kaggler called me the Post-processing Grandmaster because I just earned my fifth gold medal specifically by using post-processing. It was a solo gold medal. [The criteria for competition grandmaster are five gold medals, including at least one solo gold medal.] I will share a few secrets.

The first thing is to study the competition metric. Some metrics are ranking metrics (like AUC). For these metrics, the absolute predicted values do not matter; only the relative order matters. For a multi-label classification problem, the first thing to ask is whether the predictions are ranked per label or across all labels. In the recent Rainforest competition, where we predicted animal sounds in rainforests, all the predictions were ranked across labels, so it was important that the model knew which animals are common and which are rare.

Other types of metrics are based on mean values, like Mean Squared Error. If the test data have a different mean than the training data, shifting the predictions can improve the metric.
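For instance, a minimal sketch of mean-shifting predictions for an MSE-style metric (the predictions and the assumed test-target mean are hypothetical):

import numpy as np

preds = np.random.rand(1000)     # hypothetical model predictions on the test set
expected_test_mean = 0.52        # assumed/estimated mean of the test target

# Shift predictions so their mean matches the expected test mean
shifted = preds + (expected_test_mean - preds.mean())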

For metrics like recall and precision, you should know their meanings. Always know your metrics. Each metric requires you to do different things and apply different post-processing. Personally, I really enjoy doing this. I come from a mathematical background. Metrics are mathematical equations and I like to think about what is important to optimize.

Bo: I’d like to add one thing. If the metric is log loss, sometimes it helps to clip extreme values. Models can make confident predictions with values close to 0 or 1, but if there are label errors, the penalty by log loss can be huge. So, it may be a good idea to clip the predictions at 0.01/0.99 or 0.02/0.98. But always find the optimal clip thresholds in local validation.
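A minimal sketch of the clipping Bo describes (the thresholds are examples only; tune them in local validation):

import numpy as np

preds = np.random.rand(1000)            # hypothetical predicted probabilities
clipped = np.clip(preds, 0.02, 0.98)    # cap confident predictions to limit log-loss penalties on mislabeled examples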

Q: How can explainability be used when working with deep learning ensembles?

Christof: I would say that strongly depends on the ensemble method. I often use a simple average of single models, so if the single models are explainable, the ensemble, as a simple combination of them, probably is too. But on the other hand, I agree that ensembles introduce another aspect to explain, such as why specific models contribute more to the ensemble than others despite mediocre individual performance.

Q: How do you do the hyperparameter optimization, feature engineering and feature selection cycle in practice?

Chris: Personally, I do not spend too much time optimizing hyperparameters. I will explore the important parameters when building XGB or NN models. (For example, I will adjust max_depth, subsample, and colsample_bytree with XGB. And loss, learning rate and scheduler with NN). But when trying to improve models, I will spend more time exploring feature engineering with XGB and data augmentation, architecture design, and/or TTA with NN.
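A small sketch of the kind of XGBoost parameters Chris mentions exploring (the values are placeholders, not recommendations):

from xgboost import XGBClassifier

model = XGBClassifier(
    max_depth = 6,            # tree depth
    subsample = 0.8,          # row sampling per tree
    colsample_bytree = 0.8,   # column sampling per tree
    learning_rate = 0.05,
    n_estimators = 1000,
)
# model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])  # hypothetical training/validation arrays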

Q: How can you get the best performance out of a Neural Network?

Jean-Francois: Work on data pre-processing (including augmentations) and post-processing. Newcomers often focus too much on hyperparameter tuning or choice of optimizer.  I almost always stick to Adam with a cosine learning schedule.
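A minimal Keras sketch of Adam with a cosine learning-rate schedule, in the spirit of what Jean-Francois describes (the initial rate and step count are placeholders):

import tensorflow as tf

schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate = 1e-3,   # placeholder value
    decay_steps = 10000,            # placeholder value
)
optimizer = tf.keras.optimizers.Adam(learning_rate = schedule)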

Jiwei: Multi-head multi-loss function is another common trick to improve the performance of NN. It works as a way of regularization.

Bo: I want to point to this great post by Andrej Karpathy where he shared many NN tricks: http://karpathy.github.io/2019/04/25/recipe/

Q: What is the best way to learn from Kaggle as a beginner?

Bojan: Check out the notebooks that people post, read the topics that are being discussed, and try running the models that are shared, improving them using the ideas that are discussed. These few steps can get you pretty far in your machine learning skills, if not your Kaggle performance.

Bo: A good way for beginners to get started is to team up. Of course, this depends on personality. Some people prefer working alone, like bestfitting. But for many people I think teaming up is a good way to learn because different people have different skill sets. They can often complement each other. You can always pick up a thing or two from each teammate. Of course you need to do some work before requesting a team merge. Do not ask people on top of the leaderboard to team up with you without doing much. Try to ask people who are close to your leaderboard position.

Chris: I concur. I did many solo competitions, but recently I have been doing more teaming up. In every single team-up, I learned stuff. Even if it is as simple as watching how people organize their code, or what computer language they are using. It can just be learning how they approach the problem, or how they set up their experiments. There is just so much to learn when working with someone else that can help you become a better data scientist.

Jean-Francois: Do not be shy. Just jump into the water; you will learn how to swim. There is one last thing I recommend. When you join Kaggle, you are asked to create a user name. You can use an alias if you are afraid your friends or colleagues will see you struggle when you begin. That is what I did; I only disclosed my real name once I became comfortable. So just sign up, choose a pseudonym, learn, and try. After a competition, do not just move on to the next one. Read what people share. Try to think about what you could have done better, and how you could have come up with that cool idea you just read about. The few days after a competition ends are when you will learn the most.

Q: Do you recommend building your own machine or buying a pre-built system for deep learning?

Jean-Francois: Building is often cheaper, but it requires more skills and time.  If you have both then build your gear.  There are shops that will build custom PCs to your configuration for you. I personally did not have the time and skills and bought a custom-made PC with a GTX 1080 Ti and was very happy with it.  Nowadays, you can find PCs, including laptops, with good GPU from major PC makers.

Jiwei: Another option is an external GPU box. I used to train deep learning models on a laptop, which connected to an external GPU box with a desktop level GPU card.

Q: What do you enjoy the most about Kaggle?

Chris: The community. Kaggle is a unique place to meet great data scientists you could not meet anywhere else.

Jean-Francois: Kaggle is definitely the place to go if you want to know the state-of-the-art in modeling actual problems using machine learning.  And it is addictive.

Jiwei: To learn new algorithms, new modeling techniques in practice. I find myself more motivated and focused when I can apply new models from papers to solve real-world problems.

Bo: Reading the top solutions posted by Kagglers after each competition. Every time, I learn some new tricks.

Bojan: It is an amazing platform for learning. I do not think there is any other platform where you can learn as much and as quickly as on Kaggle.

Giba: The ability to work on the most diverse problems, and at the same time to learn and apply the state-of-the-art algorithms to solve them.

Kazuki: I enjoy gaining knowledge about the topics I'm interested in.

Christof: To solve very complex problems and come up with innovative solutions.

Ahmet: I enjoy Kaggle’s problem diversity, and I enjoy climbing the leaderboard.

Categories
Misc

Upcoming Webinar: Introduction to Building Conversational AI Applications

Join the Emerging Chapters Educational Series webinar on conversational AI concepts and building efficient pipelines.

NVIDIA is hosting a webinar with live Q&A at 10 am PDT on Oct. 14 as a part of the Emerging Chapters Educational Series. 

This technical session is open to anyone looking to learn beginner and intermediate level concepts and applications that show how to build efficient conversational AI pipelines.  

Highlights Include:

  1. Building conversational AI apps and services using NVIDIA TAO and open-source toolkits such as NeMo 
  2. Deploying apps using NVIDIA RIVA 
  3. Machine translation demos 

Register now >>

Categories
Misc

Doing the Math: Michigan Team Cracks the Code for Subatomic Insights

In record time, Vikram Gavini’s lab crossed a big milestone in viewing tiny things. The three-person team at the University of Michigan crafted a program that uses complex math to peer deep into the world of the atom. It could advance many fields of science, as well as the design for everything from lighter cars Read article >

The post Doing the Math: Michigan Team Cracks the Code for Subatomic Insights appeared first on The Official NVIDIA Blog.

Categories
Misc

Teamwork Makes the Dream Work: GFN Thursday Celebrates Team17 Titles Streaming From the Cloud

GFN Thursday is all about bringing games powered by GeForce NOW’s GPUs in the cloud to gamers. Today, that spotlight shines on Team17, the prolific publisher behind many games in the GeForce NOW library. The party gets started with their newest release, a day-and-date launch of Sheltered 2, streaming on the cloud alongside the 12 Read article >

The post Teamwork Makes the Dream Work: GFN Thursday Celebrates Team17 Titles Streaming From the Cloud appeared first on The Official NVIDIA Blog.

Categories
Misc

Are there any unofficial Slacks or chat rooms where people post questions about using TensorFlow, especially TensorFlow Federated?

So, like the title says, I'm just wondering if there is anything like this. I know that Stack Overflow has stuff like that, but I would rather ask on Slack or something similar.

submitted by /u/Throooaway10
[visit reddit] [comments]

Categories
Misc

Removing TensorFlow filters

With Python 3.8 and TensorFlow 2.5, my objective is to remove the filters/kernels having the lowest L2 norms. Sample code for this is:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D

# Generate random 1 image/data point sample-
x = tf.random.normal(shape = (1, 5, 5, 3), mean = 1.0, stddev = 0.5)
x.shape
# TensorShape([1, 5, 5, 3])

# Create conv layer-
conv = Conv2D(
    filters = 3, kernel_size = (3, 3), activation = 'relu',
    kernel_initializer = tf.initializers.GlorotNormal(),
    bias_initializer = tf.ones_initializer,
    strides = (1, 1), padding = 'same',
)

# Pass input through conv layer-
out = conv(x)
out.shape
# TensorShape([1, 5, 5, 3])

out = tf.squeeze(out)
out.shape
# TensorShape([5, 5, 3])

According to my understanding, the output consists of three (5, 5) matrices stacked together. However, printing ‘out’ shows five (5, 3) matrices stacked together:

out.numpy()
'''
array([[[1.45877   , 0.        , 1.9293344 ],
        [0.9910869 , 0.01100129, 1.7364411 ],
        [1.8199034 , 0.        , 1.3457474 ],
        [1.219409  , 0.22021294, 0.62214017],
        [0.5572515 , 0.7246016 , 0.6772853 ]],

       [[1.161148  , 0.        , 2.0277915 ],
        [0.38071448, 0.        , 2.2438798 ],
        [2.2897398 , 0.1658966 , 2.3147004 ],
        [1.2516301 , 0.14660472, 1.6381929 ],
        [1.1554463 , 0.72516847, 1.6170584 ]],

       [[0.        , 0.        , 1.2525308 ],
        [0.4337383 , 0.        , 0.91200435],
        [0.71451795, 0.        , 2.093022  ],
        [2.265062  , 0.        , 2.7562256 ],
        [0.82517993, 0.        , 1.8439718 ]],

       [[0.7089497 , 0.        , 1.041831  ],
        [0.        , 0.        , 1.2754116 ],
        [0.41919613, 0.        , 0.88135654],
        [0.        , 0.        , 0.71492153],
        [0.18725157, 0.27108306, 0.11248505]],

       [[0.86042166, 0.45840383, 1.084069  ],
        [0.53202367, 0.42414713, 1.2529668 ],
        [1.2257886 , 0.31592917, 1.3377004 ],
        [0.36588144, 0.        , 0.6085663 ],
        [0.3065148 , 0.574654  , 1.0214479 ]]], dtype=float32)
'''

So, if I use out[:, :, 0], out[:, :, 1], and out[:, :, 2], do they refer to the first, second, and third filters?

And if yes, is computing L2-norm using:

tf.norm(out, ord = 'euclidean', axis = (0, 1)).numpy()
# array([5.275869 , 1.4290226, 7.545658 ], dtype=float32)

the correct way?
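For reference, a minimal sanity check (a sketch, not an authoritative answer) that the per-channel norms above match per-slice norms; note that with 'channels_last' data, out[:, :, c] is the output feature map produced by filter c, which is different from the filter weights themselves:

import numpy as np

tf_norms = tf.norm(out, ord = 'euclidean', axis = (0, 1)).numpy()
np_norms = np.array([np.linalg.norm(out[:, :, c].numpy()) for c in range(out.shape[-1])])
np.allclose(tf_norms, np_norms)   # expected: True

# These are norms of the output feature maps; norms of the filter weights
# would instead use conv.weights[0][..., c] rather than out[:, :, c].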

submitted by /u/grid_world
[visit reddit] [comments]