MFCC Automatic Speech Recognition Algorithm Implementation

A Python 2.7 implementation of the Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) algorithms for Automatic Speech Recognition (ASR).

Method

Read the audio data and sampling frequency from a .wav file
Split the signal into frames
Apply a window function to each frame (default: Hamming)
Calculate the DFT of each frame
Calculate the periodogram power spectral density estimate for each DFT bin
Apply the Mel-frequency filterbank to the power spectrum
Sum the energies within each filter and take the base-10 logarithm
Take the DCT of the log filterbank energies
Keep coefficients [1:13] (Python slice, so the zeroth coefficient is dropped)
Compute the DTW best path and the Euclidean distance between the reference vector and the input vector
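
The steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this repository: the function names, the frame length (256), step (128), number of filters (26), the mel conversion constants (2595 and 700) and the 1e-10 log floor are all assumptions chosen for the example. Reading the .wav file is left out and a synthetic tone is used instead, and the DTW function returns only the accumulated distance, not the path itself.

```python
import math

def hamming(n):
    # Hamming window of length n.
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(frame):
    # Naive O(n^2) DFT, then periodogram estimate |X[k]|^2 / n for bins 0..n/2.
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    top = hz_to_mel(rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / rate)) for f in edges]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):
            row[k] = (k - lo) / float(mid - lo)   # rising edge
        for k in range(mid, hi):
            row[k] = (hi - k) / float(hi - mid)   # falling edge
        bank.append(row)
    return bank

def dct2(values):
    # Unnormalised DCT-II.
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n)]

def mfcc(signal, rate, frame_len=256, step=128, n_filters=26):
    # Assumed defaults; returns one list of 12 coefficients per frame.
    window = hamming(frame_len)
    bank = mel_filterbank(n_filters, frame_len, rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, step):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        power = power_spectrum(frame)
        energies = [sum(f * p for f, p in zip(row, power)) for row in bank]
        # Floor the energies so empty filters do not produce log10(0).
        log_e = [math.log10(max(e, 1e-10)) for e in energies]
        features.append(dct2(log_e)[1:13])
    return features

def dtw(ref, inp):
    # Accumulated Euclidean frame distance along the cheapest warping path.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    inf = float("inf")
    cost = [[inf] * (len(inp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(ref) + 1):
        for j in range(1, len(inp) + 1):
            step_cost = dist(ref[i - 1], inp[j - 1])
            cost[i][j] = step_cost + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[-1][-1]

# Synthetic 440 Hz tone standing in for data read from a .wav file.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(1024)]
feats = mfcc(tone, rate)
print(len(feats), len(feats[0]))   # 7 frames, 12 coefficients each
print(dtw(feats, feats))           # 0.0 for identical sequences
```

The naive DFT keeps the sketch dependency-free but is slow on real recordings; an FFT routine would normally replace it.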

To-do

Noise gate
Pre-emphasis / Lifter
Feature vector database
Audio record / playback (audio.py)
Multithread MFCC extraction
Create MFCC extractor as class?

amitchone/ASR