ASR-SG-HMM-GMM

speech recognition of digits based on single Gaussian, Gaussian Mixture, and Hidden Markov Models.

Training and test data contain 2,464 and 2,486 utterances respectively.
Each utterance has a unique id (e.g., ac_1a, ac_1b). After the "[", you can find a 39-dimensional feature vector per line. Each line corresponds to a feature vector in a consecutive frame, with the final frame in an utterance terminated by "]".

Single-Gaussian-based ASR:

Estimate the Gaussian distribution (diagonal covariance) for each digit in the training data by using maximum likelihood estimation.
Compute the log likelihood value for each digit for each utterance in the test data by using the distributions estimated above.
Predict the most likely digit for each utterance by selecting the digit with the largest likelihood.
Compute the accuracy (# of correct digits / # of test utterances (=2486) * 100) and report the accuracy.

GMM-based ASR:

Estimate the Gaussian mixture distribution (diagonal covariance) for each digit by using maximum likelihood estimation.
Initialization: Initialize the mean and variance parameters of each mixture from those of the single-Gaussian-based speech recognition model; Each mixture mean vector should be slightly perturbed randomly according to the standard deviation; Same as 2,3,4 in Single-Gaussian-based ASR.

HMM-based ASR:

Estimate an HMM for each digit in the training data, with a single (diagonal covariance) Gaussian distribution per state, by maximum likelihood estimation.
Initialization: Use uniform alignments; Initialize the HMM parameters according to this alignment.
Use Baum-Welch algorithm or Viterbi training algorithm.
Same as 2,3,4 in Single-Gaussian-based ASR.

Command:
python submission.py --mode mode train_1digit.feat test_1digit.feat
mode can be sg, gmm and hmm
--debug can be used before --mode if needed.

yuweiwan/ASR-SG-HMM-GMM

ASR-SG-HMM-GMM