How do I use the audio embeddings from Google AudioSet for audio classification?

I have extracted the audio embeddings from the Google AudioSet corpus.
Now I want to use these embeddings to train my own model (a CNN), but I am confused about a few things.

Should I extract STFT and MFCC features from the audio embeddings? If so, how can I do that (is there any way to use librosa)? Or are the audio embeddings already transformed to MFCC?
What is the best way to split the AudioSet corpus into train, test and validation sets? The data is in TFRecord format, and each TFRecord file contains various segments of audio clips with different class labels.
If I want to work on selected class labels only (such as rowing or car sounds), what is the best way to extract just those audio segments?
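
A minimal sketch of one way to approach the last two questions, not a definitive answer. It assumes each segment has already been parsed out of the TFRecord files into a plain dict with a `video_id` string and a list of integer `labels`; that dict layout, the function names and the 80/10/10 ratios are all hypothetical choices made for illustration. Segments are kept if they carry at least one wanted label, then assigned to a split by hashing the video ID, so the assignment is repeatable and independent of which file a segment came from.

```python
import hashlib


def keep_record(record, wanted_labels):
    """Return True if the record carries at least one wanted label index."""
    return bool(set(record["labels"]) & set(wanted_labels))


def assign_split(video_id, train=0.8, validate=0.1):
    """Deterministically map a video ID to 'train', 'validate' or 'test'.

    The ratios are illustrative defaults, not values from the corpus.
    """
    digest = hashlib.md5(video_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) / 16 ** 32  # roughly uniform value in [0, 1]
    if bucket < train:
        return "train"
    if bucket < train + validate:
        return "validate"
    return "test"


def select_and_split(records, wanted_labels):
    """Filter records by label, then group the survivors by split."""
    splits = {"train": [], "validate": [], "test": []}
    for record in records:
        if keep_record(record, wanted_labels):
            splits[assign_split(record["video_id"])].append(record)
    return splits
```

Usage would look like `select_and_split(records, wanted_labels={3, 7})`, where `3` and `7` are placeholder indices; the real indices for the classes of interest have to be looked up in the corpus's label listing.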

Also, please share any helpful resources on working with the Google AudioSet corpus if possible.

submitted by /u/sab_1120