Bi-gram-HMM-Based-English-POS-Tagger

This is a Python implementation of an English part-of-speech (POS) tagger based on a bigram Hidden Markov Model (HMM).
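
A bigram HMM tagger scores each tag using only the previous tag (the transition probability) and each word using only its own tag (the emission probability), then selects the highest-scoring tag sequence for the sentence with the Viterbi algorithm. The sketch below illustrates that idea. It is not the code in 'Build-POS-Tagger.py' or 'Run-POS-Tagger.py'. It assumes training data arrives as in-memory lists of (word, tag) pairs. The class name `BigramHMM`, the `<s>` start symbol and the add-one smoothing are all illustrative choices, not taken from this repository.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical pseudo-tag marking the start of a sentence


class BigramHMM:
    """Bigram HMM tagger with add-one smoothing (an illustrative assumption)."""

    def __init__(self, tagged_sents):
        self.trans = Counter()      # (prev_tag, tag) -> count
        self.context = Counter()    # prev_tag -> number of outgoing transitions
        self.emit = Counter()       # (tag, word) -> count
        self.tag_count = Counter()  # tag -> count
        vocab = set()
        for sent in tagged_sents:
            prev = START
            for word, tag in sent:
                self.trans[(prev, tag)] += 1
                self.context[prev] += 1
                self.emit[(tag, word)] += 1
                self.tag_count[tag] += 1
                vocab.add(word)
                prev = tag
        self.tags = sorted(self.tag_count)
        self.vocab_size = len(vocab)

    def log_trans(self, prev, tag):
        # log P(tag | prev), add-one smoothed over the tag set
        num = self.trans[(prev, tag)] + 1
        return math.log(num / (self.context[prev] + len(self.tags)))

    def log_emit(self, tag, word):
        # log P(word | tag), add-one smoothed; the extra +1 in the
        # denominator reserves probability mass for unseen words
        num = self.emit[(tag, word)] + 1
        return math.log(num / (self.tag_count[tag] + self.vocab_size + 1))

    def tag(self, words):
        """Viterbi decoding: the most probable tag sequence for `words`."""
        if not words:
            return []
        # best[i][t]: best log-probability of a tag path for words[:i+1] ending in t
        best = [{t: self.log_trans(START, t) + self.log_emit(t, words[0])
                 for t in self.tags}]
        back = [{}]  # back[i][t]: previous tag on that best path
        for i in range(1, len(words)):
            best.append({})
            back.append({})
            for t in self.tags:
                prev = max(self.tags,
                           key=lambda p: best[i - 1][p] + self.log_trans(p, t))
                best[i][t] = (best[i - 1][prev] + self.log_trans(prev, t)
                              + self.log_emit(t, words[i]))
                back[i][t] = prev
        # Follow the back-pointers from the best final tag
        path = [max(self.tags, key=lambda t: best[-1][t])]
        for i in range(len(words) - 1, 0, -1):
            path.append(back[i][path[-1]])
        return path[::-1]


# Toy usage with made-up training sentences
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("barks", "VBZ")],
]
hmm = BigramHMM(train_sents)
print(hmm.tag(["the", "dog", "sleeps"]))  # ['DT', 'NN', 'VBZ']
```

Working in log space avoids numeric underflow on long sentences. Add-one smoothing is only the simplest way to handle unseen transitions and words; a real tagger would typically use something better, and the development file used in step 3 below could serve to tune such choices.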


Bigram HMM based Part-of-Speech Tagger

  1. Copy 'Build-POS-Tagger.py' and 'Run-POS-Tagger.py', together with the files 'sents.train' and 'sents.devt' and a blind test file (say 'sents.test'), into the same directory.

  2. Open a terminal on Ubuntu and change the current working directory to that directory.

  3. Generate the POS tagger model file by running the following command from the terminal:

         python Build-POS-Tagger.py sents.train sents.devt POS-Tagger.model

     where
       1) sents.train : training file for the POS tagger
       2) sents.devt : development data for tuning the POS tagger
       3) POS-Tagger.model : model file generated by the system

  4. Generate the POS-tagged output file 'sents.out' for the blind test file 'sents.test' by running:

         python Run-POS-Tagger.py sents.test POS-Tagger.model sents.out

     where
       1) sents.test : blind test file on which the POS tagger will be evaluated
       2) POS-Tagger.model : model file generated in step 3, given as input
       3) sents.out : the output tagged data

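  5. Measure the accuracy of 'sents.out' against a reference tagged file using 'MeasurePOSTaggerAccuracy.py' (which must also be in the working directory):

         python MeasurePOSTaggerAccuracy.py sents.out sents.answer

     where
       1) sents.out : the output tagged data
       2) sents.answer : the reference (gold-standard) tagged file for the blind test data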
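
The source of 'MeasurePOSTaggerAccuracy.py' is not shown here, and the tagged-file format is not documented above. As a rough sketch of what token-level tagging accuracy means, the code below assumes, hypothetically, one sentence per line with tokens written as word/TAG. It counts the fraction of tokens whose tag in the output matches the tag in the reference. The helper names `parse_line` and `tagging_accuracy` are illustrative only.

```python
def parse_line(line):
    """Split a line of hypothetical 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition keeps words that themselves contain '/', e.g. '1/2/CD'
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def tagging_accuracy(output_lines, reference_lines):
    """Fraction of tokens whose predicted tag equals the reference tag."""
    output_lines, reference_lines = list(output_lines), list(reference_lines)
    if len(output_lines) != len(reference_lines):
        raise ValueError("output and reference have different sentence counts")
    correct = total = 0
    for out_line, ref_line in zip(output_lines, reference_lines):
        out, ref = parse_line(out_line), parse_line(ref_line)
        if len(out) != len(ref):
            raise ValueError(f"token counts differ: {out_line!r} vs {ref_line!r}")
        for (_, out_tag), (_, ref_tag) in zip(out, ref):
            correct += out_tag == ref_tag
            total += 1
    return correct / total if total else 0.0


# Toy usage: 4 of the 5 tokens carry the reference tag
out = ["the/DT dog/NN sleeps/VBZ", "a/DT cat/VB"]
ref = ["the/DT dog/NN sleeps/VBZ", "a/DT cat/NN"]
print(tagging_accuracy(out, ref))  # 0.8
```

With real files, the two arguments could be the line lists read from 'sents.out' and 'sents.answer'. Raising an error on misaligned sentences is a deliberate choice: silently skipping them would inflate or deflate the reported accuracy.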