Question about length of mfcc output array
pkolb opened this issue · 2 comments
pkolb commented
I'm a little confused by the length of the mfcc output array. The following code:
```python
from python_speech_features import mfcc
import scipy.io.wavfile as wav

(rate, sig) = wav.read('test.wav')
mfcc_feat = mfcc(sig, rate)
print("rate=" + str(rate))
print("sig.size=" + str(sig.size))
print("mfcc_feat.shape=" + str(mfcc_feat.shape))
```
produces:
```
rate=16000
sig.size=1760
mfcc_feat.shape=(10, 13)
```
I was expecting a shape of (11, 13): the audio is 110 ms long (160 samples per 10 ms), so shouldn't that give 11 steps of 10 ms each?
(If I append more samples, I get 11 steps starting at 1841 samples, while sig.size=1840 still gives 10 steps.)
jameslyons commented
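The default frame length (winlen) is 25 ms and the shift (winstep) is 10 ms, so a signal of up to 25 ms gives 1 frame, up to 35 ms gives 2 frames, up to 45 ms gives 3 frames, and so on, up to 115 ms giving 10 frames. The last frame is zero-padded when the signal does not fill it, which is why 1840 samples (exactly 115 ms) still gives 10 frames and 1841 samples gives 11. If you draw out the frames and their overlaps on paper, it should make sense. Hope this helps!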
(Replied by email to pkolb's comment of Sat, 2 Jun 2018, 6:09 PM.)
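The counting rule can be sketched in a few lines of Python. This is a minimal sketch, assuming python_speech_features' defaults (winlen=0.025, i.e. a 25 ms window, and winstep=0.01) and that the final partial frame is zero-padded, which is my reading of how the library frames the signal. `num_frames` and `frame_bounds` are hypothetical helpers for illustration, not part of the library's API. With these assumptions the sketch reproduces the counts reported in the question: 1760 and 1840 samples give 10 frames, 1841 gives 11.

```python
import math


def num_frames(num_samples, rate=16000, winlen=0.025, winstep=0.01):
    """Hypothetical helper: count analysis frames, assuming the last
    (partial) frame is zero-padded rather than dropped."""
    frame_len = int(round(winlen * rate))    # 400 samples at 16 kHz
    frame_step = int(round(winstep * rate))  # 160 samples at 16 kHz
    if num_samples <= frame_len:
        return 1
    # One frame for the first window, plus one per step needed to cover the rest.
    return 1 + math.ceil((num_samples - frame_len) / frame_step)


def frame_bounds(num_samples, rate=16000, winlen=0.025, winstep=0.01):
    """Hypothetical helper: (start, end) sample indices of each frame.
    The last end index may exceed num_samples; that tail is padding."""
    frame_len = int(round(winlen * rate))
    frame_step = int(round(winstep * rate))
    n = num_frames(num_samples, rate, winlen, winstep)
    return [(i * frame_step, i * frame_step + frame_len) for i in range(n)]


for size in (1760, 1840, 1841):
    print(size, num_frames(size))
# 1760 10
# 1840 10
# 1841 11

print(frame_bounds(1760)[-1])  # (1440, 1840): last frame padded with 80 zeros
```

Printing the full `frame_bounds(1760)` list is the code equivalent of drawing the frames on paper: each 400-sample window starts 160 samples after the previous one, so the windows overlap and the count is set by how many steps fit after the first full window, not by the signal length divided by the step.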
pkolb commented
Yes, thanks for the explanation!