Hidden Markov Model and Viterbi Algorithm: An optimization using Numba and PyPy
What if we are given a sequence of observations over time, and we are interested in discovering the reason behind such a sequence, or predicting what the next state will be? In this article, we provide simple notes on the concept of Hidden Markov Model (HMM) -- a formalism to reason about states over time that is specifically aimed at decode a series of \textit{hidden} states from a series of observations. We also go through a fast decoding algorithm for HMM, called the \textit{Viterbi} algorithm, and implement in Python with the aid of \textit{just-in-time} (JIT) compilers to further improve the speed of the said algorithm.
It is recommended to create a virtual environment for using this repository.
$ virtualenv venv --python=python3
$ source venv/bin/activate
Then, install the dependencies inside the virtual environment,
$ pip install -r requirements.txt
To run the HMM on Dishonest Casino, the following are the program parameters,
usage: benchmark.py [-h] [-e EXECUTIONS] [-o OBSERVATION] [-r GENERATE_RANDOM]
[-i INTERVAL [INTERVAL ...]] [-x EXPORT]
Benchmarking HMM for Dishonest Casino
optional arguments:
-h, --help show this help message and exit
Parameters:
-e EXECUTIONS, --executions EXECUTIONS
The number of times to run the program.
-o OBSERVATION, --observation OBSERVATION
The observed sequence.
-r GENERATE_RANDOM, --generate_random GENERATE_RANDOM
The length of generated observed sequence.
-i INTERVAL [INTERVAL ...], --interval INTERVAL [INTERVAL ...]
Range of sequence lengths, e.g. (10, 100, 10) would
produce sequence lengths of 10 to 90 with delta=10.
-x EXPORT, --export EXPORT
Export the results to a file.
For the Profile HMM, the following are the parameters
usage: benchmarkf.py [-h] [-i INTERVAL [INTERVAL ...]] [-x EXPORT]
Benchmarking Profile HMM
optional arguments:
-h, --help show this help message and exit
Parameters:
-i INTERVAL [INTERVAL ...], --interval INTERVAL [INTERVAL ...]
Range of sequence lengths, e.g. (10, 100, 10) would
produce sequence lengths of 10 to 90 with delta=10.
-x EXPORT, --export EXPORT
Export the results to a file.
In our study, we used the following arguments for benchmarking,
For HMM,
$ python3 benchmark.py --interval 10 100 10
For Profile HMM,
$ python3 benchmarkf.py --interval 1 20 1
The seed values are already set in both benchmarking modules for reproducibility of pseudorandomly generated sequences for testing.
To use our HMM implementations in a different Python program,
>>> from hmm.hmm_jhu import HMM
>>> hmm = HMM(transition_matrix: dict, emmision_matrx: dict, initial_probabilities: dict)
>>> hmm.viterbi(observation: str)