Pekka Varis’s artistry has come a long way from his early days as a self-styled “punk activist” who spray painted during the “old school days of hip hop in Finland.”
I have a large database of hundreds of thousands of users interacting with thousands of products for given amount of time, with more time indicating more interest. My company wants to understand if there are particular subgroups of similar consumers. In order to discover if that’s the case, I’ve built a 2-stage ML approach:
– using k-means clustering on the user embeddings, I classify a particular user as a member of a particular cluster.
With this approach, I run into 2 challenges:
– the basic retrieval does not take into account the implicit feedback of the amount of time. This seems like a recurring theme in this space – to weigh the user-item interactions by some measure of implicit feedback. I can’t seem to find any TF implementations though – any tips?
– my trained user embedding layer does not seem suitable for k-means clustering in a sense, since its measure of inter vs intra cluster distance does not meaningfully reduce over training iterations, and (more importantly) decreases linearly(!) with a higher value for k, making it impossible to use the elbow method to determine an objectively good trade-off between k and explained variance.
what would you advice to tackle both of these issues? Thanks for thinking along!
I know some of these questions are more ‘applied machine learning’ than ‘tensorflow’ per se, but I didn’t know where else to take this question, so apologies if this in the wrong category.
ValueError: A target array with shape (1288, 1) was passed for an output of shape (None, 256, 1) while using as loss `binary_crossentropy`. This loss expects targets to have the same shape as the output.
Hello I’m doing a CNN project for classification of Corn disease images(4 classes) It uses VGG16 as its base model. I have created and saved the model. Now is it possible to use that model as a base for another transfer learning task to classify cotton leaf disease images( 4 classes) with retaining the knowledge gained from corn disease images along with cotton leaf disease images? If so how should I modify the corn disease model. Should I need to make the output layer neurons as 8( 4 for cotton, 4 for corn disease) ?
I’m having a trained model of image classification which works fine on python but when i convert it to tflite and deploy on Android it gives different results Can anyone help me here
Hello I have been trying to create two tensorflow models to experiment with transfer learning. I have a trainned a cnn model for lung xray images for pneumonia(2 classes) by using the kaggle chest x-ray dataset .
Here is my code
import tensorflow as tf
import numpy as np
from tensorflow import keras
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator
Then I created a new notebook loaded data from kaggle for alzheimer’s disease and loaded my saved pneumonia model. Copied its layer to a new model except last layer then Freezed all the layers in the new model as non trainable. Then added a output dense layer with 4 neurons for 4 classes. Then trainned only the last layer for 5 epochs. But the problem is val accuaracy remains at 35% constant. How can I improve it.
Here is my code for alzeihmers model
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
Posted by Ye Jia and Michelle Tadmor Ramanovich, Software Engineers, Google Research
Automatic translation of speech from one language to speech in another language, called speech-to-speech translation (S2ST), is important for breaking down the communication barriers between people speaking different languages. Conventionally, automatic S2ST systems are built with a cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems, so that the system overall is text-centric. Recently, work on S2ST that doesn’t rely on intermediate text representation is emerging, such as end-to-end direct S2ST (e.g., Translatotron) and cascade S2ST based on learned discrete representations of speech (e.g., Tjandra et al.). While early versions of such direct S2ST systems obtained lower translation quality compared to cascade S2ST models, they are gaining traction as they have the potential both to reduce translation latency and compounding errors, and to better preserve paralinguistic and non-linguistic information from the original speech, such as voice, emotion, tone, etc. However, such models usually have to be trained on datasets with paired S2ST data, but the public availability of such corpora is extremely limited.
To foster research on such a new generation of S2ST, we introduce a Common Voice-based Speech-to-Speech translation corpus, or CVSS, which includes sentence-level speech-to-speech translation pairs from 21 languages into English. Unlike existing public corpora, CVSS can be directly used for training such direct S2ST models without any extra processing. In “CVSS Corpus and Massively Multilingual Speech-to-Speech Translation”, we describe the dataset design and development, and demonstrate the effectiveness of the corpus through training of baseline direct and cascade S2ST models and showing performance of a direct S2ST model that approaches that of a cascade S2ST model.
Building CVSS CVSS is directly derived from the CoVoST 2 speech-to-text (ST) translation corpus, which is further derived from the Common Voice speech corpus. Common Voice is a massively multilingual transcribed speech corpus designed for ASR in which the speech is collected by contributors reading text content from Wikipedia and other text corpora. CoVoST 2 further provides professional text translation for the original transcript from 21 languages into English and from English into 15 languages. CVSS builds on these efforts by providing sentence-level parallel speech-to-speech translation pairs from 21 languages into English (shown in the table below).
To facilitate research with different focuses, two versions of translation speech in English are provided in CVSS, both are synthesized using state-of-the-art TTS systems, with each version providing unique value that doesn’t exist in other public S2ST corpora:
CVSS-C: All the translation speech is in a single canonical speaker’s voice. Despite being synthetic, the speech is highly natural, clean, and consistent in speaking style. These properties ease the modeling of the target speech and enable trained models to produce high quality translation speech suitable for general user-facing applications where speech quality is of higher importance than accurately reproducing the speakers’ voices.
CVSS-T: The translation speech captures the voice from the corresponding source speech. Each S2ST pair has a similar voice on the two sides, despite being in different languages. Because of this, the dataset is suitable for building models where accurate voice preservation is desired, such as for movie dubbing.
Together with the source speech, the two S2ST datasets contain 1,872 and 1,937 hours of speech, respectively.
Source Language
Code
Source speech (X)
CVSS-C target speech (En)
CVSS-T target speech (En)
French
fr
309.3
200.3
222.3
German
de
226.5
137.0
151.2
Catalan
ca
174.8
112.1
120.9
Spanish
es
157.6
94.3
100.2
Italian
it
73.9
46.5
49.2
Persian
fa
58.8
29.9
34.5
Russian
ru
38.7
26.9
27.4
Chinese
zh
26.5
20.5
22.1
Portuguese
pt
20.0
10.4
11.8
Dutch
nl
11.2
7.3
7.7
Estonian
et
9.0
7.3
7.1
Mongolian
mn
8.4
5.1
5.7
Turkish
tr
7.9
5.4
5.7
Arabic
ar
5.8
2.7
3.1
Latvian
lv
4.9
2.6
3.1
Swedish
sv
4.3
2.3
2.8
Welsh
cy
3.6
1.9
2.0
Tamil
ta
3.1
1.7
2.0
Indonesian
id
3.0
1.6
1.7
Japanese
ja
3.0
1.7
1.8
Slovenian
sl
2.9
1.6
1.9
Total
1,153.2
719.1
784.2
Amount of source and target speech of each X-En pair in CVSS (hours).
In addition to translation speech, CVSS also provides normalized translation text matching the pronunciation in the translation speech (on numbers, currencies, acronyms, etc., see data samples below, e.g., where “100%” is normalized as “one hundred percent” or “King George II” is normalized as “king george the second”), which can benefit both model training as well as standardizing the evaluation.
Le genre musical de la chanson est entièrement le disco.
CVSS-C translation audio (English)
CVSS-T translation audio (English)
Translation text (English)
The musical genre of the song is 100% Disco.
Normalized translation text (English)
the musical genre of the song is one hundred percent disco
Example 2:
Source audio (Chinese)
Source transcript (Chinese)
弗雷德里克王子,英国王室成员,为乔治二世之孙,乔治三世之幼弟。
CVSS-C translation audio (English)
CVSS-T translation audio (English)
Translation text (English)
Prince Frederick, member of British Royal Family, Grandson of King George II, brother of King George III.
Normalized translation text (English)
prince frederick member of british royal family grandson of king george the second brother of king george the third
Baseline Models On each version of CVSS, we trained a baseline cascade S2ST model as well as two baseline direct S2ST models and compared their performance. These baselines can be used for comparison in future research.
Cascade S2ST: To build strong cascade S2ST baselines, we trained an ST model on CoVoST 2, which outperforms the previous states of the art by +5.8 average BLEU on all 21 language pairs (detailed in the paper) when trained on the corpus without using extra data. This ST model is connected to the same TTS models used for constructing CVSS to compose very strong cascade S2ST baselines (ST → TTS).
Direct S2ST: We built two baseline direct S2ST models using Translatotron and Translatotron 2. When trained from scratch with CVSS, the translation quality from Translatotron 2 (8.7 BLEU) approaches that of the strong cascade S2ST baseline (10.6 BLEU). Moreover, when both use pre-training the gap decreases to only 0.7 BLEU on ASR transcribed translation. These results verify the effectiveness of using CVSS to train direct S2ST models.
Translation quality of baseline direct and cascade S2ST models built on CVSS-C, measured by BLEU on ASR transcription from speech translation. The pre-training was done on CoVoST 2 without other extra data sets.
Conclusion We have released two versions of multilingual-to-English S2ST datasets, CVSS-C and CVSS-T, each with about 1.9K hours of sentence-level parallel S2ST pairs, covering 21 source languages. The translation speech in CVSS-C is in a single canonical speaker’s voice, while the same in CVSS-T is in voices transferred from the source speech. Each of these datasets provides unique value not existing in other public S2ST corpora.
We built baseline multilingual direct S2ST models and cascade S2ST models on both datasets, which can be used for comparison in future works. To build strong cascade S2ST baselines, we trained an ST model on CoVoST 2, which outperforms the previous states of the art by +5.8 average BLEU when trained on the corpus without extra data. Nevertheless, the performance of the direct S2ST models approaches the strong cascade baselines when trained from scratch, and with only 0.7 BLEU difference on ASR transcribed translation when utilized pre-training. We hope this work helps accelerate the research on direct S2ST.
Acknowledgments We acknowledge the volunteer contributors and the organizers of the Common Voice and LibriVox projects for their contribution and collection of recordings, the creators of Common Voice, CoVoST, CoVoST 2, Librispeech and LibriTTS corpora for their previous work. The direct contributors to the CVSS corpus and the paper include Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen. We also thank Ankur Bapna, Yiling Huang, Jason Pelecanos, Colin Cherry, Alexis Conneau, Yonghui Wu, Hadar Shemtov and Françoise Beaufays for helpful discussions and support.
The article includes a step by step explanation of the merge sort algorithm and code snippets illustrating the implementation of the algorithm itself.
Data Scientists deal with algorithms daily. However, the data science discipline as a whole has developed into a role that does not involve implementation of sophisticated algorithms. Nonetheless, practitioners can still benefit from building an understanding and repertoire of algorithms.
In this article, the sorting algorithm merge sort is introduced, explained, evaluated, and implemented. The aim of this post is to provide you with robust background information on the merge sort algorithm, which acts as foundational knowledge for more complicated algorithms.
Although merge sort is not considered to be complex, understanding this algorithm will help you recognize what factors to consider when choosing the most efficient algorithm to perform data-related tasks. Created in 1945, John Von Neumann developed the merge sort algorithm using the divide-and-conquer approach.
Divide and conquer
To understand the merge sort algorithm, you must be familiar with the divide and conquer paradigm, alongside the programming concept of recursion. Recursion within the computer science domain is when a method defined to solve a problem involves an invocation of itself within its implementation body.
In other words, the function calls itself repeatedly.
Figure 1. Visual illustration of recursion – Image by author.
Divide and conquer algorithms (which merge sort is a type of) employ recursion within its approach to solve specific problems. Divide and conquer algorithms decompose complex problems into smaller sub-parts, where a defined solution is applied recursively to each sub-part. Each sub-part is then solved separately, and the solutions are recombined to solve the original problem.
The divide-and-conquer approach to algorithm design combines three primary elements:
Decomposition of the larger problem into smaller subproblems. (Divide)
Recursive utilization of functions to solve each of the smaller subproblems. (Conquer)
The final solution is a composition of the solution to the smaller subproblems of the larger problem. (Combine)
Other algorithms use the divide-and-conquer paradigm, such as Quicksort, Binary Search, and Strassen’s algorithm.
Merge sort
In the context of sorting elements in a list and in ascending order, the merge sort method divides the list into halves, then iterates through the new halves, continually dividing them down further to their smaller parts.
Subsequently, a comparison of smaller halves is conducted, and the results are combined together to form the final sorted list.
Steps and implementation
Implementation of the merge sort algorithm is a three-step procedure. Divide, conquer, and combine.
The divide component of the divide-and-conquer approach is the first step. This initial step separates the overall list into two smaller halves. Then, the lists are broken down further until they can no longer be divided, leaving only one element item in each halved list.
The recursive loop in merge sort’s second phase is concerned with the list’s elements being sorted in a particular order. For this scenario, the initial array is sorted in ascending order.
In the following illustration, you can see the division, comparison, and combination steps involved in the merge sort algorithm.
Figure 2. Divide component illustration of the Merge sort algorithm—Image by Author.
Figure 3. Conquer and combine components—Image by author.
To implement this yourself:
Create a function called merge_sort that accepts a list of integers as its argument. All following instructions presented are within this function.
Start by dividing the list into halves. Record the initial length of the list.
Check that the recorded length is equal to 1. If the condition evaluates to true, return the list as this means that there is just one element within the list. Therefore, there is no requirement to divide the list.
Obtain the midpoint for a list with a number of elements greater than 1. When using the Python language, the // performs division with no remainder. It rounds the division result to the nearest whole number. This is also known as floor division.
Using the midpoint as a reference point, split the list into two halves. This is the divide aspect of the divide-and-conquer algorithm paradigm.
Recursion is leveraged at this step to facilitate the division of lists into halved components. The variables ‘left_half’ and ‘right_half’ are assigned to the invocation of the ‘merge_sort’ function, accepting the two halves of the initial list as parameters.
The ‘merge_sort’ function returns the invocation of a function that merges two lists to return one combined, sorted list.
Create a ‘merge’ function that accepts two lists of integers as its arguments. This function contains the conquer and combine aspects of the divide-and-conquer algorithm paradigm. All following steps are executed within the body of this function.
Assign an empty list to the variable ‘output’ that holds the sorted integers.
The pointers ‘i’ and ‘j’ are used to index the left and right lists, respectively.
Within the while loop, there is a comparison between the elements of both the left and right lists. After each comparison, the output list is populated within the two compared elements. The pointer of the list of the appended element is incremented.
The remaining elements to be added to the sorted list are elements obtained from the current pointer value to the end of the respective list.
def merge(left, right):
output = []
i = j = 0
while (i
Performance and complexity
Big O notation is a standard for defining and organizing the performance of algorithms in terms of their space requirement and execution time.
Merge sort algorithm time complexity is the same for its best, worst, and average scenarios. For a list of size n, the expected number of steps, minimum number of steps, and maximum number of steps for the merge sort algorithm to complete, are all the same.
As noted earlier in this article, the merge sort algorithm is a three-step process: divide, conquer, and combine. The ‘divide’ step involves the computation of the midpoint of the list, which, regardless of the list size, takes a single operational step. Therefore the notation for this operation is denoted as O(1).
The ‘conquer’ step involves dividing and recursively solving subarrays–the notation log n denotes this. The ‘combine’ step consists of combining the results into a final list; this operation execution time is dependent on the list size and denoted as O(n).
The merge sort notation for its average, best, and worst time complexity is log n * n * O(1). In Big O notation, low-order terms and constants are negligible, meaning the final notation for the merge sort algorithm is O(n log n). For a detailed analysis of the merge sort algorithm, refer to this article.
Evaluation
Merge sort performs well when sorting large lists, but its operation time is slower than other sorting solutions when used on smaller lists. Another disadvantage of merge sort is that it will execute the operational steps even if the initial list is already sorted. In the use case of sorting linked lists, merge sort is one of the fastest sorting algorithms to use. Merge sort can be used in file sorting within external storage systems, such as hard drives.
Key takeaways
This article describes the merge sort technique by breaking it down in terms of its constituent operations and step-by-step processes.
Merge sort algorithm is commonly used and the intuition and implementation behind the algorithm is rather straightforward in comparison to other sorting algorithms. This article includes the implementation step of the merge sort algorithm in Python.
You should also know that the time complexity of the merge sort method’s execution time in different situations, remains the same for best, worst, and average scenarios. It is recommended that merge sort algorithm is applied in the following scenarios:
When dealing with larger sets of data, use the merge sort algorithm. Merge sort performs poorly on small arrays when compared to other sorting algorithms.
Elements within a linked list have a reference to the next element within the list. This means that within the merge sort algorithm operation, the pointers are modifiable, making the comparison and insertion of elements have a constant time and space complexity.
Have some form of certainty that the array is unsorted. Merge sort will execute its operations even on sorted arrays, a waste of computing resources.
Use merge sort when there is a consideration for the stability of data. Stable sorting involves maintaining the order of identical values within an array. When compared with the unsorted data input, the order of identical values throughout an array in a stable sort is kept in the same position in the sorted output.
I’m pretty new to ML and tensorflow. What I’m trying to do now is practically to inverse a function using tensorflow.
So I have a function h=f(c1, c2,..cn, T). It is a smooth function of all the variables. I want to train a model which would give me T given known values of c1…cn and h.
For now I’m using a keras.Sequential model with 2 or 3 dense layers.
For loss I use ‘mean_absolute_error’, For optimizer – Adam().
To train the model I generate a dataset using my h(c1…cn, T) function by varying its arguments and using values of T as train_labels.
The accuracy of the resulting model is not very good to my mind – I’m getting errors of about 10%. To my mind this is not very good, given that the training dataset is ideally smooth.
My questions are:
Am I doing something particularly wrong?
How many units should I provide for each layer? I mean in tutorials they are using either Dense(64) or Dense(1). What difference does it make in my particular case? Should it be proportional to the number of parameters of the model?
May be I should use some other types of layers/optimizers/losses?