This repository contains two neural part-of-speech tagging models for English: one based on an LSTM network and one based on a feed-forward neural network.
Download the pre-trained GloVe word embeddings.
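
For orientation, here is a minimal sketch of reading embeddings in the plain-text GloVe format, where each line holds a word followed by its space-separated vector components. The function name `load_glove` and the in-memory sample are illustrative assumptions; the vocabulary scripts in this repository may read the file differently.

```python
import io


def load_glove(handle, dim):
    """Parse GloVe-style text lines into a {word: [floats]} dict.

    Lines that do not contain exactly `dim` components are skipped.
    """
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # wrong dimension or truncated line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory stand-in for a real embeddings file (hypothetical data).
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\nbroken 0.7\n")
embeddings = load_glove(sample, dim=3)
print(sorted(embeddings))  # ['cat', 'the']
```

With a real file, the handle would come from `open(glove_file_path, encoding="utf-8")` and `dim` would match the `<embedding_dimension>` argument used in the commands below.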
The dataset is extracted from Penn Treebank files.
Python 3.6
PyTorch 1.6
- Create the vocab.pkl file
python3.5 lstm-gen-vocab.py <training_file_path> <glove_file_path> <embedding_dimension> <vocab_file_path>
- Train
CUDA_VISIBLE_DEVICES="0" python3.5 lstm-tagger.py train <vocab_file_path> <embedding_dimension> <training_file_path> <model_file_path>
- Test
CUDA_VISIBLE_DEVICES="0" python3.5 lstm-tagger.py test <vocab_file_path> <embedding_dimension> <model_file_path> <test_file_path> <output_file_path>
- Evaluation
python3.5 eval.py <output_file_path> <reference_file_path>
- Create the vocab.pkl file
python3.5 fnn-gen-vocab.py <training_file_path> <glove_file_path> <embedding_dimension> <vocab_file_path>
- Train
CUDA_VISIBLE_DEVICES="0" python3.5 fnn-tagger.py train <vocab_file_path> <embedding_dimension> <training_file_path> <model_file_path>
- Test
CUDA_VISIBLE_DEVICES="0" python3.5 fnn-tagger.py test <vocab_file_path> <embedding_dimension> <model_file_path> <test_file_path> <output_file_path>
- Evaluation
python3.5 eval.py <output_file_path> <reference_file_path>
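
The evaluation step compares the tagger output against a reference file. As a rough sketch of what token-level accuracy looks like, the code below assumes one sentence per line with `word/TAG` tokens separated by whitespace. That file format and the function names are assumptions for illustration, not a description of what `eval.py` actually does.

```python
def read_tags(lines):
    """Return the tag sequence of each non-empty line, assuming 'word/TAG' tokens."""
    return [
        [token.rsplit("/", 1)[1] for token in line.split()]
        for line in lines
        if line.strip()
    ]


def tagging_accuracy(output_lines, reference_lines):
    """Fraction of tokens whose predicted tag equals the reference tag."""
    correct = total = 0
    for predicted, gold in zip(read_tags(output_lines), read_tags(reference_lines)):
        if len(predicted) != len(gold):
            raise ValueError("sentence lengths differ between output and reference")
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


# Hypothetical two-sentence example: one of five tags is wrong.
output = ["The/DT dog/NN barks/NNS", "It/PRP runs/VBZ"]
reference = ["The/DT dog/NN barks/VBZ", "It/PRP runs/VBZ"]
print(tagging_accuracy(output, reference))  # 0.8
```

Splitting on the last `/` keeps words that themselves contain a slash intact.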
<model_file_path> should have the .h5py extension.
<vocab_file_path> should have the .pkl extension.
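
The four steps and the extension rules above can be collected in a small helper. This is only a sketch: `build_commands` is a hypothetical function, the file names in the usage example are placeholders, and the argument order is copied from the commands listed in this README.

```python
from pathlib import Path


def build_commands(model, train, glove, dim, vocab, model_file, test, output, reference):
    """Return the four pipeline commands for `model` ("lstm" or "fnn") as argument lists."""
    if model not in ("lstm", "fnn"):
        raise ValueError("model must be 'lstm' or 'fnn'")
    if Path(vocab).suffix != ".pkl":
        raise ValueError("vocab file should end in .pkl")
    if Path(model_file).suffix != ".h5py":
        raise ValueError("model file should end in .h5py")
    python, dim = "python3.5", str(dim)
    tagger = model + "-tagger.py"
    return [
        [python, model + "-gen-vocab.py", train, glove, dim, vocab],
        [python, tagger, "train", vocab, dim, train, model_file],
        [python, tagger, "test", vocab, dim, model_file, test, output],
        [python, "eval.py", output, reference],
    ]


# Placeholder file names, for illustration only.
commands = build_commands(
    "lstm", "train.txt", "glove.txt", 100,
    "vocab.pkl", "model.h5py", "test.txt", "out.txt", "ref.txt",
)
for command in commands:
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)`. To pin a GPU as in the commands above, supply an `env` mapping that sets `CUDA_VISIBLE_DEVICES`.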