How do I perform audio feature extraction on my own data set?
zhaoyun-ai opened this issue · 4 comments
Good job. I want to test on my own data set, but I don't know how to implement it. Can you give me an example with a wav file as input?
Hello! This repository actually has a decent example of how to utilize LEAF within leaf-audio-pytorch/tests/test_cross.py
. Inside there, you can see we initialize a fake batch of audio samples with test_audio = np.random.random((5,8000,1)).astype(np.float32)
Here we have a batch of 5 "audio" samples of length 8000 (with one channel). As long as your input data follows this shape, it should work.
To actually load real audio you can use something like librosa.load("your_file.wav")
, create a batch with multiple samples, and add that channel dimension at the end.
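The loading-and-batching steps above can be sketched as follows. This is a minimal sketch using random arrays as stand-ins for decoded audio so it runs without any files; for real data you would replace the fake clips with `librosa.load(path, sr=16000)[0]` calls (the sample rate and clip length here are illustrative assumptions):

```python
import numpy as np

# Stand-ins for decoded audio; with real files you would use, e.g.:
#   clip, sr = librosa.load("your_file.wav", sr=16000)
clips = [np.random.randn(8000).astype(np.float32) for _ in range(2)]

# Stack the clips into a batch, then add the trailing channel dimension
batch = np.stack(clips, axis=0)          # shape: (batch_size, num_samples)
batch = np.expand_dims(batch, axis=-1)   # shape: (batch_size, num_samples, 1)
print(batch.shape)  # (2, 8000, 1)
```

Note that real clips must all be the same length before `np.stack`, so you may need to pad or trim them first. For the PyTorch frontend you would then permute to channel-first, e.g. `torch.Tensor(batch).permute(0, 2, 1)`, as in test_cross.py.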
I see. Thank you for your quick reply.
I have another question. Can this program extract features from only one wav file? I see a similar usage in the TensorFlow version:
import leaf_audio.frontend as frontend
import tensorflow as tf
import tensorflow_datasets as tfds
leaf = frontend.Leaf()
filename = 'clip_3626_1989_2283.wav'
raw_audio = tf.io.read_file(filename)
waveform = tf.audio.decode_wav(raw_audio, desired_channels=1, desired_samples=16000)
waveform = tf.transpose(waveform.audio)
leaf_representation = leaf(waveform)
I seem to have found a way:
import librosa
import numpy as np
import torch
import leaf_audio_pytorch.frontend as torch_frontend  # adjust the import path to your install

if __name__ == "__main__":
    py_leaf = torch_frontend.Leaf().cuda()
    file_path = "E:/data/fma_large/000/000002.mp3"
    data = librosa.load(file_path, sr=16000)[0]
    print(data.shape)
    y = np.expand_dims(data, axis=0)        # (1, num_samples)
    test_audio = np.expand_dims(y, axis=2)  # (batch_size, num_samples, 1)
    #test_audio = np.random.random((8,15000,1)).astype(np.float32)
    # convert to channel-first for PyTorch: (batch_size, 1, num_samples)
    t_audio = torch.Tensor(test_audio).permute(0, 2, 1).cuda()
    print(py_leaf(t_audio))