/hmm-tagger

Hidden Markov Model POS Tagger

Primary LanguagePython

hmm-tagger

This is a Part of Speech tagger written in Python, utilizing the Viterbi algorithm (an instantiation of Hidden Markov Models). It uses the Natural Language Toolkit and trains on Penn Treebank-tagged text files. It will use ten-fold cross validation to generate accuracy statistics, comparing its tagged sentences with the gold standard.

Usage

python hmm-tagger.py [--clean]

Pass in the --clean option to clean a Treebank file before running the tagger. This can be time consuming, so you can leave it off during future runs.