How to replicate python_speech_features.mfcc()

Question

How to replicate python_speech_features.mfcc()

PrabakarSundar opened this issue 4 years ago · 1 comment

I'm trying to do MFCC conversion directly from audio to reduce preprocessing overhead. I need to replicate the MFCC output of python_speech_features (https://python-speech-features.readthedocs.io/en/latest/). I tried, but the output dimension from Kapre differs from that of python_speech_features.mfcc().
Using Kapre:

from tensorflow.keras.models import Sequential
from kapre.composed import get_melspectrogram_layer
from kapre.signal import LogmelToMFCC

sample_rate = 16000
clip_duration = 2
data_size = int(sample_rate * clip_duration)
input_shape = (data_size, 1)
model = Sequential()
melgram_layer = get_melspectrogram_layer(input_shape=input_shape, n_fft=512,
                                         win_length=int(data_size * 0.025),
                                         hop_length=int(data_size * 0.01),
                                         return_decibel=True)
model.add(melgram_layer)
model.add(LogmelToMFCC(n_mfccs=13))
model.summary()

This returns the shape (None, 98, 13, 1), while python_speech_features gives (None, 199, 13).
I want to pass the resulting data to a Conv1D layer, which doesn't support this input dimension. Please help. Thanks in advance.

Answer 1 · 2020-10-27T09:11:00.000Z

You are getting a shorter result because your hop length is too large: int(data_size*0.01) is 320 samples, not the 160 samples that a 10 ms hop at 16 kHz requires. I think you meant to pass hop_length=int(0.01 * sample_rate). Then you'd get 197 frames. That is still shorter than 199, but that's probably because Kapre's melspectrogram layer does not pad by default. Pass pad_begin=True and pad_end=True. Then you'll get 199, I think.
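
The frame counts in this thread can be checked with plain arithmetic, without running either library. The sketch below is not Kapre or python_speech_features code; the two helper functions are hypothetical names for the usual framing formulas. The 512-sample window used for the third case is an assumption (it equals n_fft) that happens to reproduce the 197 quoted above, and how Kapre pads with pad_begin/pad_end is not modelled here.

```python
import math


def frames_no_padding(num_samples, win_length, hop_length):
    """Count frames that fit entirely inside the signal (no padding)."""
    return 1 + (num_samples - win_length) // hop_length


def frames_last_padded(num_samples, win_length, hop_length):
    """Count frames when a final partial frame is kept and zero-padded,
    which is how python_speech_features frames a signal."""
    return 1 + math.ceil((num_samples - win_length) / hop_length)


sample_rate = 16000
data_size = sample_rate * 2  # 2-second clip, 32000 samples

# As posted: 25 ms / 10 ms scaled by data_size instead of sample_rate.
posted = frames_no_padding(data_size, data_size * 25 // 1000, data_size * 10 // 1000)

# python_speech_features defaults: 25 ms window, 10 ms step at 16 kHz.
reference = frames_last_padded(data_size, sample_rate * 25 // 1000, sample_rate * 10 // 1000)

# Hop corrected to 160 samples; window of 512 samples is an assumption.
fixed_hop = frames_no_padding(data_size, 512, sample_rate * 10 // 1000)

print(posted, reference, fixed_hop)  # 98 199 197
```

The first value matches the 98 frames reported in the question, and the second matches the 199 frames from python_speech_features, so the gap comes from the hop and window sizes plus the handling of the last partial frame.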