PyTorch implementation of a Korean morphological analyzer
- PyTorch >= 1.1
- torchtext
- See requirements.txt for the full list of dependencies
pip install -r requirements.txt
Sample data can be found in the data/ directory. Each entry consists of an eojeol and its pairs of morpheme and POS tag.
python train.py
This loads a config file (config/kma.yaml) and trains the model it defines: a bidirectional encoder built from a 3-layer LSTM with 100 hidden units, a pointer-generator network, and a CRF tagger. The detailed parameters can be found in the config/ directory.
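To make the encoder description concrete, here is a minimal sketch of a 3-layer bidirectional LSTM encoder in PyTorch. It is not the repository's code: the class name `BiLSTMEncoder`, the vocabulary size, and the embedding dimension are hypothetical, and it assumes that "100 hidden units" means 100 per direction. The pointer-generator decoder and the CRF tagger are left out.

```python
import torch
import torch.nn as nn


class BiLSTMEncoder(nn.Module):
    """Hypothetical encoder: an embedding layer followed by a bidirectional LSTM."""

    def __init__(self, vocab_size, embed_dim=100, hidden_size=100, num_layers=3):
        super().__init__()
        # embed_dim is an assumption; the real value lives in the config file.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(
            embed_dim,
            hidden_size,
            num_layers=num_layers,
            bidirectional=True,
            batch_first=True,
        )

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)  # (batch, seq_len, embed_dim)
        # outputs concatenates the forward and backward states at every step.
        outputs, (h_n, c_n) = self.lstm(embedded)
        return outputs, (h_n, c_n)


# Toy batch: 2 sequences of 7 token ids drawn from a vocabulary of 50.
encoder = BiLSTMEncoder(vocab_size=50)
token_ids = torch.randint(0, 50, (2, 7))
outputs, (h_n, c_n) = encoder(token_ids)
print(outputs.shape)  # torch.Size([2, 7, 200])
print(h_n.shape)      # torch.Size([6, 2, 100])
```

The output feature size is 200 because the two directions are concatenated, and the final hidden state has 6 slices (3 layers times 2 directions).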
python tagging.py --input_file text_file --output output_file
Use a trained model to tag new data. The script reads the input file one sentence per line, tags each sentence, and saves the tagged output to output_file.
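The line-by-line flow described above can be sketched as follows. This is an illustration only, not the contents of tagging.py: `tag_file` and `dummy_tagger` are hypothetical names, the dummy tagger stands in for the trained model, and skipping blank lines is an assumption.

```python
import io


def tag_file(reader, writer, tag_sentence):
    """Read sentences line by line, tag each one, and write one result per line."""
    count = 0
    for line in reader:
        sentence = line.strip()
        if not sentence:
            continue  # assumption: blank lines are skipped
        writer.write(tag_sentence(sentence) + "\n")
        count += 1
    return count


def dummy_tagger(sentence):
    # Placeholder for the trained model: labels every token "UNK".
    return " ".join(f"{token}/UNK" for token in sentence.split())


# In-memory streams stand in for the input and output files.
source = io.StringIO("first sentence\n\nsecond one\n")
sink = io.StringIO()
n_tagged = tag_file(source, sink, dummy_tagger)
print(n_tagged)
print(sink.getvalue(), end="")
```

With real files, the same function could be called with two handles opened via `open(..., encoding="utf-8")`, passing the model's tagging function in place of `dummy_tagger`.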
- Pretrained models are available for download
@inproceedings{song-park-2019-korean,
title="{K}orean Morphological Analysis with Tied Sequence-to-Sequence Multi-Task Model",
author="Song, Hyun-Je and Park, Seong-Bae",
booktitle="Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
pages="1436--1441",
year="2019"
}
The implementation is heavily inspired by IBM's seq2seq and OpenNMT-py.