Build a multiclass classifier to predict the POS tag of each word. Use a sliding window: represent each word by the concatenation of the embeddings of 5 words, namely the word itself, the two preceding words and the two following words. Add a special padding word to the embeddings to represent positions before the beginning or after the end of a sentence.
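The windowing step described above can be sketched as follows (the `<PAD>` token name and the helper's name are assumptions, not part of the exercise code):

```python
def context_windows(sentence, size=2, pad="<PAD>"):
    """Return one 5-token window (2 left, the word, 2 right) per word,
    padding at the sentence boundaries."""
    padded = [pad] * size + sentence + [pad] * size
    return [padded[i:i + 2 * size + 1] for i in range(len(sentence))]

windows = context_windows(["The", "cat", "sat"])
# windows[0] == ["<PAD>", "<PAD>", "The", "cat", "sat"]
```

Each window is then mapped to embeddings and concatenated to form the classifier input.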
Use all but the last 1000 sentences to train the classifier. (It would be better to choose the held-out sentences randomly.)
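A minimal split helper, with a shuffle option for the suggested random selection (the function name and seed are assumptions):

```python
import random

def split_sentences(sentences, held_out=1000, shuffle=False, seed=0):
    """Hold out the last `held_out` sentences for evaluation;
    optionally shuffle first so the held-out set is random."""
    sents = list(sentences)
    if shuffle:
        random.Random(seed).shuffle(sents)
    return sents[:-held_out], sents[-held_out:]
```

With `shuffle=False` this reproduces the simple "last 1000 sentences" split from the exercise.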
Vocabularies of words and tags.
We will use a NN classifier for POS tagging.
In order to provide some context to the classifier, we represent each token as an n-gram of 5 tokens, taking 2 tokens on the left and 2 on the right.
The NN classifier will produce a probability distribution for each tag.
The predicted output will be the tag with the highest probability.
The output of the classifier should be a vector whose size is the number of possible tags.
We can obtain this using the function `to_categorical`, which turns tag indices into a one-hot representation.
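`to_categorical` (from `keras.utils`) maps each tag index to a one-hot row; a numpy equivalent, as a sketch:

```python
import numpy as np

def to_one_hot(indices, num_classes):
    """NumPy sketch of keras.utils.to_categorical: one row per index,
    with a 1.0 in the column given by that index."""
    out = np.zeros((len(indices), num_classes))
    out[np.arange(len(indices)), indices] = 1.0
    return out

y = to_one_hot([0, 2, 1], num_classes=3)
# y[1] is the one-hot row for tag index 2
```

The resulting matrix has one row per training token, matching the size of the classifier's output layer.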
- Build the model
- Train the model
- Evaluate the tagger
`predict_classes` returns the argmax of the predicted softmax scores.

- Show the confusion matrix
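The decision rule behind `predict_classes` is just an argmax over the per-tag softmax scores; a minimal numpy sketch (the scores below are made up for illustration):

```python
import numpy as np

# Hypothetical softmax scores for 3 tokens over 4 tags
probs = np.array([
    [0.1, 0.7, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
    [0.6, 0.2, 0.1, 0.1],
])
predicted = probs.argmax(axis=-1)  # index of the most probable tag per token
# predicted == [1, 2, 0]
```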
Instead of creating the n-grams explicitly, one could use a `Conv1D` layer to group tokens.
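A width-5 `Conv1D` over the embedded sentence applies, at each position, a shared linear map to the same 5 concatenated embeddings as the explicit windows. A numpy sketch of that equivalence (all shapes and the zero 'same' padding are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, emb_dim, filters, width = 7, 4, 3, 5

x = rng.normal(size=(seq_len, emb_dim))          # embedded sentence
w = rng.normal(size=(width * emb_dim, filters))  # shared filter weights

pad = np.zeros((width // 2, emb_dim))
xp = np.vstack([pad, x, pad])                    # 'same'-style zero padding

# At each position, concatenate the 5 surrounding embeddings and apply
# the shared linear map -- what Conv1D computes per filter.
windows = np.stack([xp[i:i + width].ravel() for i in range(seq_len)])
out = windows @ w                                # shape (seq_len, filters)
```

The convolution thus builds the 5-token context implicitly, with one output row per token.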
- Build the model
The MaxEnt classifier relies on hand-crafted feature representations of the input words.
The features we use:
- morphological aspects of English words, e.g. words ending in 'ed', 'ing', or 'ly'
- capitalization, not at the beginning of a sentence
- presence of the word in a list of closed POS categories, e.g. 'CONJ', 'DET', 'PRON', 'PRT'
- aspects of the previous word, e.g. capitalization, particle
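The feature templates above can be sketched as a simple extractor (the feature names and the tiny closed-class word list are illustrative assumptions):

```python
CLOSED_CLASS = {"and", "or", "the", "a", "he", "she", "it", "of", "to"}

def features(sentence, i):
    """Hand-crafted features for the word at position i, in the
    spirit of the templates listed above."""
    word, feats = sentence[i], {}
    for suffix in ("ed", "ing", "ly"):                   # morphology
        feats["suffix=" + suffix] = word.endswith(suffix)
    feats["capitalized"] = word[0].isupper() and i > 0   # not sentence-initial
    feats["closed_class"] = word.lower() in CLOSED_CLASS
    if i > 0:                                            # previous-word aspects
        feats["prev_capitalized"] = sentence[i - 1][0].isupper()
        feats["prev_closed_class"] = sentence[i - 1].lower() in CLOSED_CLASS
    return feats

f = features(["The", "dog", "barked", "loudly"], 2)
# f["suffix=ed"] is True
```

Each token is then represented by such a feature dict, which the MaxEnt classifier consumes directly.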
Until a few years ago, SoTA POS taggers relied on rich feature sets. See for example:
Giménez, J., and Márquez, L. 2004. SVMTool: A general POS tagger generator based on Support Vector Machines. Proceedings of LREC'04. Lisbon, Portugal.
Warning: training is quite slow.