Primary language: C. License: MIT.

ECOZ2

Linear Predictive Coding Vector Quantization and Hidden Markov Modeling for speech recognition.

This project is a general revision of an implementation I did as part of my BS thesis [1] on isolated-word speech recognition using LPC vector quantization [2] and HMM [3][4]. The implementation was directly based on the algorithms described in the literature.

With the implementation essentially remaining as originally written, this revision mainly involves style adjustments and modernization (e.g., better alignment with C99), parallelization, use of scaling factors [5][6] in HMM operations (so that the system can deal with larger models, longer observation sequences, and larger vocabulary sizes), and facilitating the creation of wrappers (Python, Rust).

Status

The programs are already usable. Pending work includes better automation of model tuning, and perhaps a more flexible and interoperable file format for the generated artifacts (predictor files, quantized observation sequences, codebooks, and HMM models). Also, note that large-scale model training and tuning have not been considered at all.

Documentation? Sorry, there's no documentation (yet). But you can see how the programs are used in some exercises under https://github.com/ecoz2/.

References. Some key literature references are already mentioned here and in some parts of the code, but I haven't really addressed this aspect in any systematic way, so I'm likely missing some! (This would be a straightforward task if I had a copy of my BS thesis at hand, but, alas, I don't at the moment!)

Building the programs

This is a pretty straightforward Makefile-based project with no dependencies other than GNU GCC and the standard C libraries. In a Unix-like environment, just type:

$ make

On macOS, you may need something like CC=gcc-10 make to select GNU GCC, which can be installed via brew install gcc.

The programs will be placed under _out/bin/.

The main programs are:

 lpc            - Performs LPC on wav files
 vq.learn       - Trains codebooks for vector quantization
 vq.quantize    - Generates observation sequences
 hmm.learn      - Trains an HMM
 hmm.classify   - Performs HMM-based classification of observation sequences
 vq.classify    - Performs VQ-based classification of predictor files
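As a rough picture of the quantization step in this pipeline, the sketch below maps each feature vector to the index of its closest codeword, producing the symbol sequence that an HMM then consumes. It is an illustration under stated assumptions, not the logic of `vq.quantize`: the function names and the flat row-major codebook layout are hypothetical, and squared Euclidean distance is used as a simple stand-in, whereas the LPC vector quantization literature typically uses a distortion measure tailored to LPC spectra.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Index of the codeword closest to vector x.
 * codebook holds `size` codewords of dimension `dim`, row-major.
 * Distance is squared Euclidean (an assumption for this sketch).
 */
int nearest_codeword(const double *codebook, int size, int dim,
                     const double *x)
{
    assert(size > 0 && dim > 0);
    int best = 0;
    double best_dist = -1.0; /* negative means "no candidate yet" */
    for (int c = 0; c < size; c++) {
        double d = 0.0;
        for (int k = 0; k < dim; k++) {
            double diff = x[k] - codebook[(size_t)c * dim + k];
            d += diff * diff;
        }
        if (best_dist < 0.0 || d < best_dist) {
            best_dist = d;
            best = c;
        }
    }
    return best;
}

/* Quantize T vectors (row-major, dimension dim) into T symbols. */
void quantize_sequence(const double *codebook, int size, int dim,
                       const double *vectors, int T, int *symbols)
{
    for (int t = 0; t < T; t++) {
        symbols[t] = nearest_codeword(codebook, size, dim,
                                      vectors + (size_t)t * dim);
    }
}
```

Calling `quantize_sequence` on the vectors of one utterance yields an integer sequence with values in `[0, size)`, which is the kind of observation sequence a discrete HMM is trained on or classifies.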

Other utilities:

 sgn.show       - Displays info about wav file
 sgn.endp       - Performs endpoint detection
 prd.show       - Displays info about generated predictor files
 vq.show        - Displays info about generated codebook
 hmm.show       - Displays info about generated HMM
 seq.show       - Displays info about observation and state sequences

The programs will print a basic usage message when called with no arguments.

There's no program installation per se, but you can simply include _out/bin in your $PATH:

$ export PATH=`pwd`/_out/bin:$PATH

I have added some plotting utilities. These are Python-based and require pandas and matplotlib:

$ pip install pandas matplotlib 

You may want to add src/py to your $PATH as well:

$ export PATH=`pwd`/src/py:$PATH

Codebook related:

cb.plot_evaluation.py        - Distortion, σ-ratio, inertia
cb.plot_cards_dists.py       - Cell cardinality and distortion plots
cb.plot_reflections.py       - Reflection coefficient scatter plot
                               to visualize clusters and centroids

Footnotes

  1. Rueda, C.A., "Implementation and Experimentation with Speech Recognition –Isolated Digits– Using LPC Vector Quantization and Hidden Markov Models," BS Thesis, Systems Engineering Dept., Universidad Autónoma de Manizales, Colombia, 1993.

  2. Juang, B-H., Wong, D.Y., Gray, A.H., "Distortion Performance of Vector Quantization for LPC Voice Coding," IEEE Trans. ASSP, Vol. 30, No. 2, April, 1982.

  3. Rabiner, L.R., Levinson, S.E., Sondhi, M.M., "On the Application of Vector Quantization and Hidden Markov Models to Speaker-Independent, Isolated Word Recognition," The Bell System Technical Journal, Vol. 62, No. 4, April 1983.

  4. Rabiner, L.R., "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, Vol. 77, No. 2, 1989.

  5. Shen, D., "Some Mathematics for HMM," http://courses.media.mit.edu/2010fall/mas622j/ProblemSets/ps4/tutorial.pdf

  6. Stamp, M., "A Revealing Introduction to Hidden Markov Models," Computer Science Dept, San Jose State University, https://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf