How to replicate python_speech_features.mfcc()
PrabakarSundar opened this issue · 1 comment
I'm trying to compute MFCCs directly from audio inside the model to reduce preprocessing overhead. I need to replicate the MFCC output of python_speech_features (https://python-speech-features.readthedocs.io/en/latest/). I tried, but the output dimension from Kapre differs from that of python_speech_features.mfcc().
Using kapre:
sample_rate = 16000
clip_duration = 2
data_size = int(sample_rate*clip_duration)
input_shape = (data_size, 1)
model = Sequential()
melgram_layer = get_melspectrogram_layer(input_shape=input_shape, n_fft=512,
                                         win_length=int(data_size*0.025),
                                         hop_length=int(data_size*0.01),
                                         return_decibel=True)
model.add(melgram_layer)
model.add(LogmelToMFCC(n_mfccs=13))
model.summary()
which returns the shape (None, 98, 13, 1), while python_speech_features gives (None, 199, 13).
I want to pass the resulting data to a Conv1D layer, which doesn't support this input dimension. Please help. Thanks in advance.
You're getting a shorter result because your hop length is too large: it is computed from data_size (the clip length in samples) rather than from sample_rate. I think you meant to pass hop_length=int(0.01 * sample_rate). Then you'd get 197 frames. That's still shorter than 199, but that's probably because Kapre's melspectrogram layer doesn't pad by default. Pass pad_begin=True and pad_end=True; then you'll get 199, I think.
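The frame counts in this thread can be checked with plain arithmetic. The following is a minimal sketch, not Kapre's or python_speech_features' actual code: the helper names frames_unpadded and frames_psf are made up for illustration. It assumes an unpadded STFT keeps only frames that fit entirely inside the signal, and that python_speech_features pads the tail so the last partial frame is kept. The 197 case additionally assumes the effective frame length equals n_fft (512), which is one combination that reproduces that number.

```python
import math

def frames_unpadded(n_samples, win_length, hop_length):
    # Hypothetical helper: count frames that fit entirely inside the signal,
    # with no padding at either end.
    return 1 + (n_samples - win_length) // hop_length

def frames_psf(n_samples, sample_rate, winlen=0.025, winstep=0.01):
    # Hypothetical helper mirroring the framing rule used by
    # python_speech_features: the tail is zero-padded so the last
    # partial frame is kept (hence ceil instead of floor).
    frame_len = int(round(winlen * sample_rate))
    frame_step = int(round(winstep * sample_rate))
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)

sample_rate = 16000
data_size = sample_rate * 2  # 2-second clip, 32000 samples

# Settings from the question: window and hop derived from data_size.
print(frames_unpadded(data_size, int(data_size * 0.025), int(data_size * 0.01)))  # 98

# Hop derived from sample_rate; frame length assumed equal to n_fft=512.
print(frames_unpadded(data_size, 512, int(sample_rate * 0.01)))  # 197

# python_speech_features defaults (25 ms window, 10 ms step).
print(frames_psf(data_size, sample_rate))  # 199
```

The first result matches the 98 frames reported in the question, and the last matches the 199 frames from python_speech_features, so the remaining gap comes down to hop length and padding rather than to the MFCC step itself.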