Implementation of GMM-HMM for speech recognition using the hmmlearn Python package

The idea is to build a model that recognizes single words from short speech segments. The model is a GMM-HMM (a hidden Markov model with Gaussian mixture emissions).
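A common way to recognize isolated words with GMM-HMMs is to train one model per word and pick the word whose model gives the highest log-likelihood for the utterance. The following is a minimal sketch of that decision rule, not this repository's code. It assumes each model exposes a `score(features)` method (hmmlearn's `GMMHMM` has one); `predict_word` and `DummyModel` are hypothetical names used only for illustration.

```python
def predict_word(models, features):
    """Return the label whose model scores the utterance highest.

    `models` maps a word label to any object with a `score(features)`
    method that returns a log-likelihood (assumption: one model per word).
    """
    if not models:
        raise ValueError("at least one model is required")
    return max(models, key=lambda word: models[word].score(features))


class DummyModel:
    """Stand-in for a trained model, for illustration only.

    It scores an utterance by how close its first coefficient is to a mean.
    """

    def __init__(self, mean):
        self.mean = mean

    def score(self, features):
        return -sum((frame[0] - self.mean) ** 2 for frame in features)


models = {"apple": DummyModel(0.0), "kiwi": DummyModel(5.0)}
utterance = [[4.8], [5.1], [5.3]]  # frames x coefficients
print(predict_word(models, utterance))  # kiwi
```

With real models, `features` would be the feature matrix (frames by coefficients) of one recording.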

This Medium article explains what and how.

Part of the code comes from https://github.com/jayaram1125/Single-Word-Speech-Recognition-using-GMM-HMM- which I refactored and extended with some more features:

added MFCC delta and delta-delta features to increase the accuracy of the model
added a script to record test audio for testing your model(s)
trained a model on the data from the original repository and also on a bunch of data from the Speech Command Dataset
as an experiment, aligned the Speech Command Dataset to try to gain higher accuracy
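Delta and delta-delta features describe how the MFCCs change over time. A minimal sketch of the usual regression formula follows, assuming a feature matrix shaped (frames, coefficients). The `delta` and `add_deltas` helpers and the window size of 2 are assumptions for illustration and may differ from what the training scripts actually use.

```python
import numpy as np


def delta(features, window=2):
    """Regression-based delta of a (frames, coefficients) matrix."""
    if window < 1:
        raise ValueError("window must be >= 1")
    num_frames = features.shape[0]
    # Repeat the first and last frame so every frame has `window` neighbours.
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denominator = 2 * sum(n * n for n in range(1, window + 1))
    result = np.zeros(features.shape, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + num_frames]
        behind = padded[window - n : window - n + num_frames]
        result += n * (ahead - behind)
    return result / denominator


def add_deltas(mfcc):
    """Stack static, delta and delta-delta features column-wise."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return np.hstack([mfcc, d1, d2])


# Random numbers stand in for real MFCCs: 50 frames, 13 coefficients.
mfcc = np.random.default_rng(0).normal(size=(50, 13))
stacked = add_deltas(mfcc)
print(stacked.shape)  # (50, 39)
```

Stacking triples the feature dimension, so each frame passed to the model carries the static coefficients plus their first and second differences.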

My trained models' accuracy information is in the models/accuracies directory. The original models are not included because they are too big; only the example fruit names model is in the [models directory](https://github.com/RRisto/single_word_asr_gmm_hmm/tree/master/models). To use a model, see the example in predict_google.py. You can record your own voice using record_test_audio.py.

The scripts were tested on Windows 10 with Python 3.7.

Training Google Speech Commands Dataset model (original)

Download speech data (such as the Speech Command Dataset). The data should be in folders, with each folder named after the label/command/word spoken in the files it contains.
Prepare the data for training and testing using the notebook. This should be similar to the original suggestions on how to make data for training and testing. Note that the testing and validation file lists are in the [data/](https://github.com/RRisto/single_word_asr_gmm_hmm/tree/master/data) folder.
Train the model using train_hmm_google_orig.py, or use the other train scripts as a template.
Predict on test data using the predict_google_orig.py script.
Test your model with a microphone by running the listen_mic_predict.py script.
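The folder layout from the first step (one folder per spoken word) makes it easy to derive labels from paths. Here is a minimal sketch of collecting (file, label) pairs from such a layout. `list_labelled_wavs` is a hypothetical helper, not a function from this repository, and it assumes the recordings are `.wav` files.

```python
import tempfile
from pathlib import Path


def list_labelled_wavs(data_dir):
    """Return (wav_path, label) pairs, using each folder name as the label."""
    pairs = []
    for label_dir in sorted(Path(data_dir).iterdir()):
        if not label_dir.is_dir():
            continue  # skip stray files such as file lists or a README
        for wav_path in sorted(label_dir.glob("*.wav")):
            pairs.append((wav_path, label_dir.name))
    return pairs


# Demo on a throwaway directory with two label folders.
with tempfile.TemporaryDirectory() as tmp:
    for label, name in [("yes", "a.wav"), ("no", "b.wav")]:
        folder = Path(tmp) / label
        folder.mkdir()
        (folder / name).touch()
    demo_labels = [label for _, label in list_labelled_wavs(tmp)]

print(demo_labels)  # ['no', 'yes']
```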

Another script uses data from the Google Speech Commands Dataset but has only a few categories for quicker training (it doesn't have the unknown word and noise categories).

Training very small fruit names dataset

This is the original data: good for debugging, but not very useful for real-life speech recognition.

Unzip the data file
Train the model using train_hmm_fruits.py, or use the other train scripts as a template
Test your model with a microphone, using the listen_mic_predict.py script as a template

Aligning

This is just an experiment I made. The original alignment was very good, but this might improve model performance.

If you want to align the data and use it for training:

Download the Speech Command Dataset
Run 1.0_prep_data4aligning.ipynb
Download/install the Montreal Forced Aligner
Download the LibriSpeech lexicon (you can also create your own)
Run the aligner from the command line using the following template: bin/mfa_train_and_align /path/to/dataset_prepared_in_first_step /path/to/librispeech/lexicon.txt /path/to/aligned/dataset (this step takes a few hours on a typical Windows laptop)
Run 1.1_generate_aligned_audio_files_risto.ipynb. This creates chunks from the original audio that contain only the part where the command was said
Train a new model; an example is in train_hmm_google_aligned.py
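The chunking step boils down to cutting each recording to the interval where the word was spoken. A minimal sketch of that cut using only the standard library `wave` module is shown below. `crop_wav` is a hypothetical helper, not the notebook's code, and the start and end times are assumed to have been read from the aligner's output already.

```python
import wave


def crop_wav(src_path, dst_path, start_sec, end_sec):
    """Copy the [start_sec, end_sec) interval of a WAV file to a new file.

    Returns the number of audio frames written.
    """
    if not 0 <= start_sec < end_sec:
        raise ValueError("expected 0 <= start_sec < end_sec")
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp the interval to the length of the recording.
        start_frame = min(int(start_sec * rate), total)
        end_frame = min(int(end_sec * rate), total)
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)
    with wave.open(str(dst_path), "wb") as dst:
        # The frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return end_frame - start_frame
```

For example, `crop_wav("in.wav", "out.wav", 0.25, 0.75)` would keep the half second starting at 0.25 s; the file names here are placeholders.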

Run docker

There is also a Docker image. To use it:

build the image (run build_docker.bat)
run the container (run run_docker.bat)

If you want to use Jupyter Notebook:

  - go inside the Docker container: docker exec -it single_word_gmmhmm_run /bin/bash
  - start the Jupyter Notebook server: jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root
  - open http://127.0.0.1:7006/ in your browser
  - the terminal should show the notebook token; copy-paste it into the browser and you should be inside Jupyter Notebook