Categories
Offsites

Introducing CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus

Automatic translation of speech from one language to speech in another language, called speech-to-speech translation (S2ST), is important for breaking down the communication barriers between people speaking different languages. Conventionally, automatic S2ST systems are built with a cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems, so that the system overall is text-centric. Recently, work on S2ST that doesn’t rely on intermediate text representation is emerging, such as end-to-end direct S2ST (e.g., Translatotron) and cascade S2ST based on learned discrete representations of speech (e.g., Tjandra et al.). While early versions of such direct S2ST systems obtained lower translation quality compared to cascade S2ST models, they are gaining traction as they have the potential both to reduce translation latency and compounding errors, and to better preserve paralinguistic and non-linguistic information from the original speech, such as voice, emotion, tone, etc. However, such models usually have to be trained on datasets with paired S2ST data, but the public availability of such corpora is extremely limited.

To foster research on such a new generation of S2ST, we introduce a Common Voice-based Speech-to-Speech translation corpus, or CVSS, which includes sentence-level speech-to-speech translation pairs from 21 languages into English. Unlike existing public corpora, CVSS can be directly used for training such direct S2ST models without any extra processing. In “CVSS Corpus and Massively Multilingual Speech-to-Speech Translation”, we describe the dataset design and development, and demonstrate the effectiveness of the corpus through training of baseline direct and cascade S2ST models and showing performance of a direct S2ST model that approaches that of a cascade S2ST model.

Building CVSS
CVSS is directly derived from the CoVoST 2 speech-to-text (ST) translation corpus, which is further derived from the Common Voice speech corpus. Common Voice is a massively multilingual transcribed speech corpus designed for ASR in which the speech is collected by contributors reading text content from Wikipedia and other text corpora. CoVoST 2 further provides professional text translation for the original transcript from 21 languages into English and from English into 15 languages. CVSS builds on these efforts by providing sentence-level parallel speech-to-speech translation pairs from 21 languages into English (shown in the table below).

To facilitate research with different focuses, CVSS provides two versions of the English translation speech, both synthesized using state-of-the-art TTS systems. Each version provides unique value that doesn’t exist in other public S2ST corpora:

  • CVSS-C: All the translation speech is in a single canonical speaker’s voice. Despite being synthetic, the speech is highly natural, clean, and consistent in speaking style. These properties ease the modeling of the target speech and enable trained models to produce high quality translation speech suitable for general user-facing applications where speech quality is of higher importance than accurately reproducing the speakers’ voices.
  • CVSS-T: The translation speech captures the voice from the corresponding source speech. Each S2ST pair has a similar voice on the two sides, despite being in different languages. Because of this, the dataset is suitable for building models where accurate voice preservation is desired, such as for movie dubbing.

Together with the source speech, the two S2ST datasets contain 1,872 and 1,937 hours of speech, respectively.

Source language  Code  Source speech (X)  CVSS-C target speech (En)  CVSS-T target speech (En)
French           fr    309.3              200.3                      222.3
German           de    226.5              137.0                      151.2
Catalan          ca    174.8              112.1                      120.9
Spanish          es    157.6              94.3                       100.2
Italian          it    73.9               46.5                       49.2
Persian          fa    58.8               29.9                       34.5
Russian          ru    38.7               26.9                       27.4
Chinese          zh    26.5               20.5                       22.1
Portuguese       pt    20.0               10.4                       11.8
Dutch            nl    11.2               7.3                        7.7
Estonian         et    9.0                7.3                        7.1
Mongolian        mn    8.4                5.1                        5.7
Turkish          tr    7.9                5.4                        5.7
Arabic           ar    5.8                2.7                        3.1
Latvian          lv    4.9                2.6                        3.1
Swedish          sv    4.3                2.3                        2.8
Welsh            cy    3.6                1.9                        2.0
Tamil            ta    3.1                1.7                        2.0
Indonesian       id    3.0                1.6                        1.7
Japanese         ja    3.0                1.7                        1.8
Slovenian        sl    2.9                1.6                        1.9
Total                  1,153.2            719.1                      784.2
Amount of source and target speech of each X-En pair in CVSS (hours).
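
As a quick arithmetic check, the per-dataset totals quoted earlier (1,872 and 1,937 hours) can be reproduced from the "Total" row of the table. The variable names below are illustrative only and are not part of the CVSS release.

```python
# Totals in hours, copied from the "Total" row of the table above.
source_hours = 1153.2
cvss_c_target_hours = 719.1
cvss_t_target_hours = 784.2

# Each dataset pairs the same source speech with its own target speech.
cvss_c_total = source_hours + cvss_c_target_hours
cvss_t_total = source_hours + cvss_t_target_hours

print(f"CVSS-C: {cvss_c_total:.1f} hours")  # CVSS-C: 1872.3 hours
print(f"CVSS-T: {cvss_t_total:.1f} hours")  # CVSS-T: 1937.4 hours
```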

In addition to translation speech, CVSS also provides normalized translation text matching the pronunciation in the translation speech (on numbers, currencies, acronyms, etc., see data samples below, e.g., where “100%” is normalized as “one hundred percent” or “King George II” is normalized as “king george the second”), which can benefit both model training as well as standardizing the evaluation.

CVSS is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license and it can be freely downloaded online.

Data Samples

Example 1:
Source audio (French)   
Source transcript (French)    Le genre musical de la chanson est entièrement le disco.
CVSS-C translation audio (English)   
CVSS-T translation audio (English)   
Translation text (English)    The musical genre of the song is 100% Disco.
Normalized translation text (English)        the musical genre of the song is one hundred percent disco
     
     
Example 2:
Source audio (Chinese)       
Source transcript (Chinese)        弗雷德里克王子,英国王室成员,为乔治二世之孙,乔治三世之幼弟。
CVSS-C translation audio (English)       
CVSS-T translation audio (English)       
Translation text (English)        Prince Frederick, member of British Royal Family, Grandson of King George II, brother of King George III.
Normalized translation text (English)        prince frederick member of british royal family grandson of king george the second brother of king george the third
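
The relationship between the translation text and the normalized text in the two samples above can be approximated with a small sketch. This is not the normalizer used to build CVSS, which this post does not describe: the `VERBALIZED` table and the `normalize` function are hypothetical and cover only the tokens that appear in these two samples.

```python
import string

# Hypothetical lookup table for the two samples only; a real normalizer
# would need general rules for numbers, currencies, acronyms, and so on.
VERBALIZED = {
    "100%": "one hundred percent",
    "II": "the second",
    "III": "the third",
}

# Punctuation to strip from token edges; "%" is kept so "100%" still matches.
EDGE_PUNCTUATION = string.punctuation.replace("%", "")


def normalize(text):
    words = []
    for token in text.split():
        core = token.strip(EDGE_PUNCTUATION)
        if not core:
            continue
        # Spell out known tokens, otherwise just lowercase the word.
        words.append(VERBALIZED.get(core, core.lower()))
    return " ".join(words)


print(normalize("The musical genre of the song is 100% Disco."))
```

A table-driven approach like this only illustrates the idea; it would not scale to arbitrary numbers or names.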

Baseline Models
On each version of CVSS, we trained a baseline cascade S2ST model as well as two baseline direct S2ST models and compared their performance. These baselines can be used for comparison in future research.

Cascade S2ST: To build strong cascade S2ST baselines, we trained an ST model on CoVoST 2, which outperforms the previous state of the art by +5.8 average BLEU on all 21 language pairs (detailed in the paper) when trained on the corpus without using extra data. This ST model is connected to the same TTS models used for constructing CVSS to compose very strong cascade S2ST baselines (ST → TTS).

Direct S2ST: We built two baseline direct S2ST models using Translatotron and Translatotron 2. When trained from scratch with CVSS, the translation quality from Translatotron 2 (8.7 BLEU) approaches that of the strong cascade S2ST baseline (10.6 BLEU). Moreover, when both use pre-training, the gap decreases to only 0.7 BLEU on ASR-transcribed translation. These results verify the effectiveness of using CVSS to train direct S2ST models.

Translation quality of baseline direct and cascade S2ST models built on CVSS-C, measured by BLEU on ASR transcription from speech translation. The pre-training was done on CoVoST 2 without other extra data sets.

Conclusion
We have released two versions of multilingual-to-English S2ST datasets, CVSS-C and CVSS-T, each with about 1.9K hours of sentence-level parallel S2ST pairs, covering 21 source languages. The translation speech in CVSS-C is in a single canonical speaker’s voice, while that in CVSS-T is in voices transferred from the source speech. Each of these datasets provides unique value not found in other public S2ST corpora.

We built baseline multilingual direct S2ST models and cascade S2ST models on both datasets, which can be used for comparison in future work. To build strong cascade S2ST baselines, we trained an ST model on CoVoST 2, which outperforms the previous state of the art by +5.8 average BLEU when trained on the corpus without extra data. Nevertheless, the performance of the direct S2ST models approaches that of the strong cascade baselines when trained from scratch, and the difference narrows to only 0.7 BLEU on ASR-transcribed translation when pre-training is used. We hope this work helps accelerate the research on direct S2ST.

Acknowledgments
We acknowledge the volunteer contributors and the organizers of the Common Voice and LibriVox projects for their contribution and collection of recordings, and the creators of the Common Voice, CoVoST, CoVoST 2, Librispeech and LibriTTS corpora for their previous work. The direct contributors to the CVSS corpus and the paper include Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, and Heiga Zen. We also thank Ankur Bapna, Yiling Huang, Jason Pelecanos, Colin Cherry, Alexis Conneau, Yonghui Wu, Hadar Shemtov and Françoise Beaufays for helpful discussions and support.

Categories
Misc

Merge Sort Explained: A Data Scientist’s Algorithm Guide

The article includes a step by step explanation of the merge sort algorithm and code snippets illustrating the implementation of the algorithm itself.

Data scientists deal with algorithms daily. However, data science as a discipline has developed into a role that does not typically involve implementing sophisticated algorithms. Nonetheless, practitioners can still benefit from building an understanding and repertoire of algorithms.

In this article, the sorting algorithm merge sort is introduced, explained, evaluated, and implemented. The aim of this post is to provide you with robust background information on the merge sort algorithm, which acts as foundational knowledge for more complicated algorithms.

Although merge sort is not considered to be complex, understanding this algorithm will help you recognize what factors to consider when choosing the most efficient algorithm to perform data-related tasks. John von Neumann developed the merge sort algorithm in 1945 using the divide-and-conquer approach.

Divide and conquer

To understand the merge sort algorithm, you must be familiar with the divide and conquer paradigm, alongside the programming concept of recursion. Recursion within the computer science domain is when a method defined to solve a problem involves an invocation of itself within its implementation body.

In other words, the function calls itself repeatedly.

Figure 1. Visual illustration of recursion – Image by author.
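
As a minimal illustration of recursion (not taken from the original article), the hypothetical `factorial` function below calls itself on a smaller input until it reaches a base case that stops the chain of calls.

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    # Base case: without it, the function would call itself forever.
    if n <= 1:
        return 1
    # Recursive case: the function invokes itself on a smaller problem.
    return n * factorial(n - 1)


print(factorial(5))  # 120
```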

Divide and conquer algorithms (of which merge sort is one) employ recursion to solve specific problems. They decompose a complex problem into smaller sub-parts and apply a defined solution recursively to each sub-part. Each sub-part is then solved separately, and the solutions are recombined to solve the original problem.

The divide-and-conquer approach to algorithm design combines three primary elements:

  • Decomposition of the larger problem into smaller subproblems. (Divide)
  • Recursive utilization of functions to solve each of the smaller subproblems. (Conquer)
  • The final solution is a composition of the solution to the smaller subproblems of the larger problem. (Combine)
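
The three elements above can be seen in a small sketch that finds the largest value in a list. The `largest` function is a hypothetical example chosen for brevity and assumes a non-empty list; it is not part of merge sort itself.

```python
def largest(values):
    """Return the largest element of a non-empty list."""
    # Base case: a single element is trivially the largest.
    if len(values) == 1:
        return values[0]

    # Divide: split the problem into two smaller subproblems.
    mid = len(values) // 2

    # Conquer: solve each subproblem recursively.
    left_max = largest(values[:mid])
    right_max = largest(values[mid:])

    # Combine: the answer is the larger of the two partial answers.
    return left_max if left_max >= right_max else right_max


print(largest([3, 9, 2, 7]))  # 9
```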

Other algorithms use the divide-and-conquer paradigm, such as Quicksort, Binary Search, and Strassen’s algorithm.

Merge sort

When sorting the elements of a list in ascending order, the merge sort method divides the list into halves, then recurses into the new halves, continually dividing them down further into smaller parts.

Subsequently, the elements of the smaller halves are compared, and the results are merged to form the final sorted list.

Steps and implementation

Implementation of the merge sort algorithm is a three-step procedure: divide, conquer, and combine.

The divide component of the divide-and-conquer approach is the first step. This initial step separates the overall list into two smaller halves. Then, the lists are broken down further until they can no longer be divided, leaving only one element in each halved list.

The recursive loop in merge sort’s second phase is concerned with the list’s elements being sorted in a particular order. For this scenario, the initial array is sorted in ascending order.

In the following illustration, you can see the division, comparison, and combination steps involved in the merge sort algorithm.

Figure 2. Divide component illustration of the merge sort algorithm – Image by author.
Figure 3. Conquer and combine components – Image by author.

To implement this yourself:

  • Create a function called merge_sort that accepts a list of integers as its argument. All of the following instructions take place within this function.
  • Before dividing the list into halves, record the initial length of the list.
  • Check whether the recorded length is 1 or less. If the condition evaluates to true, return the list: a list with at most one element is already sorted, so there is no need to divide it.
  • Obtain the midpoint for a list with more than one element. In Python, the // operator performs floor division: it divides and rounds the result down to the nearest whole number, discarding any remainder.
  • Using the midpoint as a reference point, split the list into two halves. This is the divide aspect of the divide-and-conquer algorithm paradigm.
  • Recursion is leveraged at this step to facilitate the division of lists into halved components. The variables ‘left_half’ and ‘right_half’ are assigned the results of invoking the ‘merge_sort’ function on the two halves of the initial list.
  • The ‘merge_sort’ function returns the result of a function that merges two lists into one combined, sorted list.
def merge_sort(items: list[int]) -> list[int]:
    list_length = len(items)

    # Base case: an empty or single-element list is already sorted
    if list_length <= 1:
        return items

    # Floor division gives the index that splits the list in half
    mid_point = list_length // 2

    # Divide: sort each half recursively
    left_half = merge_sort(items[:mid_point])
    right_half = merge_sort(items[mid_point:])

    # Conquer and combine: merge the two sorted halves
    return merge(left_half, right_half)
  • Create a ‘merge’ function that accepts two lists of integers as its arguments. This function contains the conquer and combine aspects of the divide-and-conquer algorithm paradigm. All following steps are executed within the body of this function.
  • Assign an empty list to the variable ‘output’ that holds the sorted integers.
  • The pointers ‘i’ and ‘j’ are used to index the left and right lists, respectively.
  • Within the while loop, the current elements of the left and right lists are compared. After each comparison, the smaller of the two elements is appended to the output list, and the pointer of the list that supplied it is incremented.
  • Once one list is exhausted, the remaining elements of the other list, from its current pointer value to its end, are appended to the sorted output.
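def merge(left, right):
    output = []
    i = j = 0

    # Compare the front elements and append the smaller one
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            output.append(left[i])
            i += 1
        else:
            output.append(right[j])
            j += 1
    # One list is exhausted; append whatever remains of the other
    output.extend(left[i:])
    output.extend(right[j:])
    return output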

Performance and complexity

Big O notation is a standard for defining and organizing the performance of algorithms in terms of their space requirement and execution time.

Merge sort algorithm time complexity is the same for its best, worst, and average scenarios. For a list of size n, the expected number of steps, minimum number of steps, and maximum number of steps for the merge sort algorithm to complete, are all the same.

As noted earlier in this article, the merge sort algorithm is a three-step process: divide, conquer, and combine. The ‘divide’ step involves the computation of the midpoint of the list, which, regardless of the list size, takes a single operational step. Therefore the notation for this operation is denoted as O(1).

The ‘conquer’ step recursively solves the two subarrays. Because the list is halved at every level, the recursion is about log n levels deep. The ‘combine’ step consists of merging the results into a final list; this operation’s execution time depends on the list size and is denoted as O(n).

Across the log n levels of recursion, each level performs O(n) merging work, with the O(1) midpoint computations as a lower-order cost. In Big O notation, low-order terms and constants are negligible, meaning the final notation for the merge sort algorithm’s average, best, and worst time complexity is O(n log n). For a detailed analysis of the merge sort algorithm, refer to this article.
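
One way to see the O(n log n) behavior empirically is to count element comparisons. The sketch below is a self-contained variant written for this purpose, not the article's implementation: the hypothetical `merge_sort_count` returns the sorted list together with the number of comparisons, which can be compared against n * ceil(log2 n).

```python
import math
import random


def merge_sort_count(items):
    """Return (sorted_list, comparisons) for a top-down merge sort."""
    if len(items) <= 1:
        return list(items), 0

    mid = len(items) // 2
    left, left_count = merge_sort_count(items[:mid])
    right, right_count = merge_sort_count(items[mid:])

    merged = []
    i = j = comparisons = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # one comparison per loop iteration
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, left_count + right_count + comparisons


for n in (8, 64, 1024):
    data = [random.random() for _ in range(n)]
    _, comparisons = merge_sort_count(data)
    bound = n * math.ceil(math.log2(n))
    print(f"n={n}: {comparisons} comparisons, n*ceil(log2 n)={bound}")
```

The exact counts vary with the random input, but they stay at or below the n * ceil(log2 n) bound.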

Evaluation

Merge sort performs well when sorting large lists, but its operation time is slower than other sorting solutions when used on smaller lists. Another disadvantage of merge sort is that it will execute the operational steps even if the initial list is already sorted. In the use case of sorting linked lists, merge sort is one of the fastest sorting algorithms to use. Merge sort can be used in file sorting within external storage systems, such as hard drives.

Key takeaways

This article describes the merge sort technique by breaking it down in terms of its constituent operations and step-by-step processes.

The merge sort algorithm is commonly used, and the intuition and implementation behind it are rather straightforward in comparison to other sorting algorithms. This article includes the implementation steps of the merge sort algorithm in Python.

You should also know that the time complexity of merge sort remains the same for the best, worst, and average scenarios. Merge sort is recommended in the following scenarios:

  • When dealing with larger sets of data, use the merge sort algorithm. Merge sort performs poorly on small arrays when compared to other sorting algorithms.
  • Use merge sort when sorting linked lists. Elements within a linked list have a reference to the next element within the list. This means that during the merge step the pointers can simply be modified, so inserting an element takes constant time and needs no additional space.
  • Have some form of certainty that the array is unsorted. Merge sort will execute its operations even on sorted arrays, a waste of computing resources.
  • Use merge sort when there is a consideration for the stability of data. Stable sorting involves maintaining the order of identical values within an array. When compared with the unsorted data input, the order of identical values throughout an array in a stable sort is kept in the same position in the sorted output.
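
The stability property in the last point can be demonstrated with a small, self-contained sketch. The `merge_sort_by` function and the sample records below are hypothetical; the function sorts by a key and takes from the left half on ties, which is what keeps equal keys in their input order.

```python
def merge_sort_by(records, key):
    """Stable merge sort of records ordered by key(record)."""
    if len(records) <= 1:
        return list(records)

    mid = len(records) // 2
    left = merge_sort_by(records[:mid], key)
    right = merge_sort_by(records[mid:], key)

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" prefers the left half on ties, preserving input order.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


records = [("pear", 2), ("apple", 1), ("fig", 2), ("kiwi", 1)]
by_count = merge_sort_by(records, key=lambda record: record[1])
print(by_count)
# [('apple', 1), ('kiwi', 1), ('pear', 2), ('fig', 2)]
```

Records with the same count ("apple" before "kiwi", "pear" before "fig") appear in the same relative order as in the input. Replacing `<=` with `<` in the comparison would break this guarantee.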

Inversion of a multivariable function

Hello everyone,

I’m pretty new to ML and tensorflow. What I’m trying to do now is essentially to invert a function using tensorflow.

So I have a function h = f(c1, c2, ..., cn, T). It is a smooth function of all the variables. I want to train a model which would give me T given known values of c1, ..., cn and h.

For now I’m using a keras.Sequential model with 2 or 3 dense layers.

For the loss I use ‘mean_absolute_error’, and for the optimizer, Adam().

To train the model I generate a dataset using my h(c1…cn, T) function by varying its arguments and using values of T as train_labels.
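
A minimal sketch of that data-generation step, using only the standard library. The function `f` here is a made-up smooth stand-in with two coefficients (the actual function is not given in this post), and `make_dataset` is a hypothetical helper: each row of inputs is (c1, c2, h) and the label is the T that produced h.

```python
import random


def f(c1, c2, T):
    # Hypothetical stand-in for h = f(c1, ..., cn, T); smooth and
    # monotonic in T for positive coefficients, so T is recoverable.
    return c1 * T + c2 * T ** 3


def make_dataset(n_samples, seed=0):
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n_samples):
        c1 = rng.uniform(0.5, 2.0)
        c2 = rng.uniform(0.5, 2.0)
        T = rng.uniform(0.0, 1.0)
        h = f(c1, c2, T)
        features.append((c1, c2, h))  # model inputs: coefficients and h
        labels.append(T)              # model target: the T to recover
    return features, labels


features, labels = make_dataset(1000)
print(features[0], labels[0])
```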

The accuracy of the resulting model is not very good, to my mind: I’m getting errors of about 10%, even though the training dataset is ideally smooth.

My questions are:

  1. Am I doing something particularly wrong?

  2. How many units should I provide for each layer? I mean in tutorials they are using either Dense(64) or Dense(1). What difference does it make in my particular case? Should it be proportional to the number of parameters of the model?

  3. Maybe I should use some other types of layers/optimizers/losses?

Thank you in advance for your replies!

submitted by /u/_padla_


Is it possible to serve models that have a distributed architecture, with multiple shards using tfx serving?

I’m planning to build a model that uses ParameterServerStrategy to distribute its parameters across multiple VMs with an assumption that it cannot fit into any one of the VMs. Is it possible to use TensorFlow serving to distribute the model across multiple VMs?

I was reading through this (https://www.tensorflow.org/tfx/serving/architecture) and I discovered that you need multiple servables for composite models that have multiple parts to them. But I couldn’t find any documentation that talks specifically about creating a cluster with tf serving, that uses multiple servables with each servable in a single VM.

Is it possible to use TensorFlow serving for very large models, where the models have their parameters spread across multiple VMs? If yes, can you please tell me how it can be done?

Thanks in advance!

submitted by /u/deathconqueror


An A-peel-ing GFN Thursday Sprouts 20+ New Games Coming to GeForce NOW in April

In addition to GFN Thursday, it’s National Tater Day. Hooray! To honor the spud-tacular holiday, we’re closing out March with seven new games streaming this week. And a loaded 20+ titles are coming to the GeForce NOW library in April to play, even on a potato PC, thanks to GeForce NOW. Plus, the GeForce …

The post An A-peel-ing GFN Thursday Sprouts 20+ New Games Coming to GeForce NOW in April appeared first on NVIDIA Blog.


Data Science

Hi, I’m very new to ML in general, and I’m overwhelmed by the choice of loss functions and accuracy measurements. I’m doing my master’s project, and my supervisor wants me to teach myself tf and run the data through a model.

I’m feeding in an array of particle data from a decay and asking it to predict a 1 or -1 at the end, representing the parity of the original particle.

What loss fn and accuracy metric are best to use for something like this?

Other parameters: about 2 million events, 6 particles in each decay/event, and the 4-momentum of each. That makes 24 input values.

First layer is a tf.flatten,

submitted by /u/DriftingRumour


Generating negative outputs with only nonnegative weights and biases.

I am trying to have a layer with only positive weights generate a partially negative output. What would be the least hacky way to achieve this? My initial idea was to use a modified activation function that is shifted by 1 along the x axis. However, this feels a bit hacky to me, and I was wondering if there is a better way to achieve this.

submitted by /u/salted_kinase


Storing a large Tensor in multiple GPUs

Hi Guys,

Is there a way to store a tensor across multiple GPUs? I have a tensor so large that it requires >16 GB of GPU RAM to store. I was wondering how that can be achieved. Thanks!

submitted by /u/dr_meme_69


Correct way to get output of hidden layers after replacing some hidden layers?

Hello,

I am working on a project to replace layers with new layers to see whether the changes affect the model positively or negatively. I then want to get the output feature map and input feature map after the replacement. The issue I am having is that after a couple of changes, the model ends up with multiple connections and a new column called ‘Connected to’ appears in the summary. Here are the summaries and the code I am using for replacing layers. I sometimes get this warning after replacing a convolutional layer with the code provided.

pastebin to warning and model summary

I have tried to create an Input layer and then use the same functional approach. My first layer being the input layer and the second the conv2d_0 layer. However, I get ValueError Disconnected from Graph for the input layer after two layer changes.

Code:

inputs = self.model.layers[0].input
x = self.model.layers[0](inputs)
for layer in self.model.layers[1:]:
    if layer.name == layer_name:
        new_layer = ...  # creation of custom layer that generates output of same shape as replaced layer.
        x = new_layer(x)
    else:
        layer.trainable = False
        x = layer(x)
self.model = tf.keras.Model(inputs, x)

submitted by /u/ElvishChampion


Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans

Four words: smart, sustainable, Super Bowl. Polestar’s commercial during the big game made it clear no-compromise electric vehicles are now mainstream. Polestar Chief Operating Officer Dennis Nobelius sees driving enjoyment and autonomous-driving capabilities complementing one another in sustainable vehicles that keep driving, and the driver, front and center. NVIDIA’s Katie Washabaugh spoke with …

The post Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans appeared first on NVIDIA Blog.