Part-of-Speech Tagger using Hidden Markov Model (HMM)
This system is a part-of-speech (POS) tagger implemented using the Hidden Markov Model (HMM) and the Viterbi algorithm.
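As a rough illustration of the decoding step, the following is a minimal sketch of Viterbi decoding for a bigram HMM in log space. It is not the code in main.py: the function name `viterbi`, the dictionary layout of `start_p`, `trans_p`, and `emit_p`, and the toy probabilities are all assumptions made for this example. It also assumes every start and transition probability is non-zero.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, oov_p=1e-6):
    """Return the most likely tag sequence for `words` under a bigram HMM.

    start_p[t], trans_p[prev][t], and emit_p[t][word] are plain probabilities
    (hypothetical layout). Words missing from emit_p[t] fall back to oov_p.
    """
    if not words:
        return []

    def log_emit(tag, word):
        return math.log(emit_p[tag].get(word, oov_p))

    # best[i][t]: log-probability of the best path that ends in tag t at position i
    best = [{t: math.log(start_p[t]) + log_emit(t, words[0]) for t in tags}]
    # back[i][t]: the previous tag on that best path
    back = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + math.log(trans_p[p][t]))
            best[i][t] = (best[i - 1][prev] + math.log(trans_p[prev][t])
                          + log_emit(t, words[i]))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path

# Toy model with made-up probabilities, for illustration only
TAGS = ["DT", "NN", "VB"]
START_P = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS_P = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT_P = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.5, "barks": 0.1},
    "VB": {"barks": 0.5},
}

print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
```

Working in log space avoids numeric underflow on long sentences, since the products of many small probabilities become sums.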
- Replace the training and test corpus files (train_corpus.pos and test_corpus.word) with the files you want to use, and update the file names in the run() function of the main.py script.
- Run the main.py script using Python 3.
- The output will be written to a submission.pos file.
- To see the difference between the predicted tags and the standard tags, run python3 score.py <train_file> <test_file>.
- The system assigns a default emission probability of 1e-6 to out-of-vocabulary (OOV) words and recalculates the likelihoods for all words by dividing by the number of unique words in the training set.
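The OOV handling described above could look roughly like the sketch below. This is one possible reading of the note, not the code in main.py: the helper names `train_emissions` and `emission`, the (word, tag) input format, and the choice to divide each known word's relative-frequency estimate by the vocabulary size are all assumptions.

```python
from collections import Counter, defaultdict

OOV_PROB = 1e-6  # default emission probability for out-of-vocabulary words

def train_emissions(tagged_words):
    """Estimate P(word | tag) by relative frequency from a list of (word, tag) pairs."""
    pair_counts = defaultdict(Counter)
    for word, tag in tagged_words:
        pair_counts[tag][word] += 1

    emit = {}
    for tag, counts in pair_counts.items():
        total = sum(counts.values())
        emit[tag] = {w: c / total for w, c in counts.items()}

    vocab = {word for word, _ in tagged_words}  # unique training words
    return emit, vocab

def emission(emit, vocab, tag, word):
    """Look up an emission likelihood, falling back to OOV_PROB for unseen words."""
    if word not in vocab:
        return OOV_PROB
    # Assumed interpretation: rescale by the number of unique training words
    return emit[tag].get(word, 0.0) / len(vocab)

# Tiny made-up training sample, for illustration only
sample = [("the", "DT"), ("dog", "NN"), ("dog", "NN"), ("barks", "VB"), ("barks", "NN")]
emit, vocab = train_emissions(sample)
print(emission(emit, vocab, "NN", "dog"))  # known word, rescaled estimate
print(emission(emit, vocab, "NN", "cat"))  # OOV word, default probability
```

In this sketch a known word that never occurred with a given tag gets a likelihood of 0.0, so a decoder that takes logarithms would need to guard against that case.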