How do I perform audio feature extraction on my own data set?
zhaoyun-ai opened this issue · 4 comments
Good job. I want to test on my own data set, but I don't know how to implement it. Can you give me an example with a wav file as input?
Hello! This repository actually has a decent example of how to utilize LEAF within leaf-audio-pytorch/tests/test_cross.py
. Inside there, you can see we initialize a fake batch of audio samples with test_audio = np.random.random((5,8000,1)).astype(np.float32)
Here we have a batch of 5 "audio" samples of length 8000 (with one channel). As long as your input data follows this shape, it should work.
To actually load real audio you can use something like librosa.load("your_file.wav")
, create a batch with multiple samples, and add that channel dimension at the end.
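The loading-and-batching steps above can be sketched as follows. This is a minimal sketch using random arrays as stand-ins for decoded audio so it runs without any files; for real data you would replace the fake clips with `librosa.load(path, sr=16000)[0]` calls (the sample rate and clip length here are illustrative assumptions):

```python
import numpy as np

# Stand-ins for decoded audio; with real files you would use, e.g.:
#   clip, sr = librosa.load("your_file.wav", sr=16000)
clips = [np.random.randn(8000).astype(np.float32) for _ in range(2)]

# Stack the clips into a batch, then add the trailing channel dimension
batch = np.stack(clips, axis=0)          # shape: (batch_size, num_samples)
batch = np.expand_dims(batch, axis=-1)   # shape: (batch_size, num_samples, 1)
print(batch.shape)  # (2, 8000, 1)
```

Note that real clips must all be the same length before `np.stack`, so you may need to pad or trim them first. For the PyTorch frontend you would then permute to channel-first, e.g. `torch.Tensor(batch).permute(0, 2, 1)`, as in test_cross.py.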
I see. Thank you for your quick reply.
I have another question. Can this program extract features from only one wav file? I see a similar usage in the TensorFlow version:
import leaf_audio.frontend as frontend
import tensorflow as tf
import tensorflow_datasets as tfds
leaf = frontend.Leaf()
filename = 'clip_3626_1989_2283.wav'
raw_audio = tf.io.read_file(filename)
waveform = tf.audio.decode_wav(raw_audio, desired_channels=1, desired_samples=16000)
waveform = tf.transpose(waveform.audio)
leaf_representation = leaf(waveform)
I seem to have found a way:
import librosa
import numpy as np
import torch
import leaf_audio_pytorch.frontend as torch_frontend  # adjust the import path to your install

if __name__ == "__main__":
    py_leaf = torch_frontend.Leaf().cuda()
    file_path = "E:/data/fma_large/000/000002.mp3"
    data = librosa.load(file_path, sr=16000)[0]
    print(data.shape)
    y = np.expand_dims(data, axis=0)        # (1, num_samples)
    test_audio = np.expand_dims(y, axis=2)  # (batch_size, num_samples, 1)
    #test_audio = np.random.random((8,15000,1)).astype(np.float32)
    # convert to channel-first for PyTorch: (batch_size, 1, num_samples)
    t_audio = torch.Tensor(test_audio).permute(0, 2, 1).cuda()
    print(py_leaf(t_audio))