/hmm_tagger

NLP project: a named entity tagger written in Java using the Viterbi algorithm.


Named Entity (Hidden Markov Model) Tagger

This tagger is written in Java and uses the Viterbi algorithm. The evaluation script is written in Python and was provided, along with the initial data, by Professor Michael Collins for the course Natural Language Processing (COMS 4705) at Columbia University.
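For readers unfamiliar with Viterbi decoding, here is a minimal sketch of the idea. It is not the code in this repository: it assumes a bigram HMM for brevity (the actual model may differ), and the class name `ViterbiSketch`, the tag names, and the toy probabilities are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Viterbi decoding for a bigram HMM.
// Not the repository's implementation; all names and numbers are illustrative.
class ViterbiSketch {

    // Log of a table entry; a missing or zero entry counts as impossible.
    static double logp(Map<String, Double> table, String key) {
        Double p = table.get(key);
        return (p == null || p <= 0.0) ? Double.NEGATIVE_INFINITY : Math.log(p);
    }

    // trans maps "prev next" to q(next | prev); "*" is the start symbol, "STOP" the end.
    // emit maps "tag word" to e(word | tag).
    static String[] decode(String[] words, String[] tags,
                           Map<String, Double> trans, Map<String, Double> emit) {
        int n = words.length, k = tags.length;
        if (n == 0) return new String[0];
        double[][] pi = new double[n][k]; // best log-score of a path ending in tag j at position i
        int[][] bp = new int[n][k];       // back-pointer: previous tag on that best path

        for (int j = 0; j < k; j++) {
            pi[0][j] = logp(trans, "* " + tags[j]) + logp(emit, tags[j] + " " + words[0]);
        }
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < k; j++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double s = pi[i - 1][p] + logp(trans, tags[p] + " " + tags[j]);
                    if (s > best) { best = s; arg = p; }
                }
                pi[i][j] = best + logp(emit, tags[j] + " " + words[i]);
                bp[i][j] = arg;
            }
        }

        // Pick the best final tag, including the transition to STOP.
        double best = Double.NEGATIVE_INFINITY;
        int last = 0;
        for (int j = 0; j < k; j++) {
            double s = pi[n - 1][j] + logp(trans, tags[j] + " STOP");
            if (s > best) { best = s; last = j; }
        }

        // Follow the back-pointers from the end to recover the tag sequence.
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            out[i] = tags[last];
            last = bp[i][last];
        }
        return out;
    }

    // Toy model with made-up probabilities, only to exercise decode().
    static String[] toyExample() {
        Map<String, Double> trans = new HashMap<>();
        trans.put("* O", 0.6);         trans.put("* I-PER", 0.4);
        trans.put("O O", 0.7);         trans.put("O I-PER", 0.2);
        trans.put("O STOP", 0.1);      trans.put("I-PER I-PER", 0.5);
        trans.put("I-PER O", 0.4);     trans.put("I-PER STOP", 0.1);
        Map<String, Double> emit = new HashMap<>();
        emit.put("I-PER Michael", 0.5); emit.put("I-PER Collins", 0.5);
        emit.put("O Michael", 0.05);    emit.put("O Collins", 0.05);
        emit.put("O teaches", 0.9);
        String[] words = {"Michael", "Collins", "teaches"};
        String[] tags = {"O", "I-PER"};
        return decode(words, tags, trans, emit);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", toyExample())); // prints "I-PER I-PER O"
    }
}
```

The sketch works in log space so that long sentences do not underflow, and it treats unseen transitions or emissions as impossible. A real tagger would typically need some handling for rare or unseen words instead.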

Steps to run

The tests all live in the main method of each class. In some classes you need to run them one at a time: uncomment a single line, keep the other lines commented out, and run. Detailed instructions are written as comments in the code.

To compile, type javac classname.java; to run, type java classname, where classname is the class you want to test.

### Detailed Steps to Test the Program

Compile and run the files in this order: BaselineTagger → HMMTagger → Viterbi → Viterbi2. For BaselineTagger and HMMTagger, uncomment one line at a time and keep the other lines commented out. The comments in the code will also ask you to run count_freq.py to generate new files; do so, and use the suggested file names (other names work, but then you need to modify the code to match).

Qingxiang Jia

Sept. 30, 2014