Vanilla Sequence Labeling w. Char-LSTM-CRF

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Vanilla NER

Check Our New NER Toolkit🚀🚀🚀

  • Inference:
    • LightNER: efficient inference w. pre-trained models or models trained w. any of the following tools.
  • Training:
    • LD-Net: train NER models w. efficient contextualized representations.
    • VanillaNER: train vanilla NER models w. pre-trained embedding.
  • Distant Training:
    • AutoNER: train NER models w.o. line-by-line annotations and get competitive performance.

This project is derived from LD-Net and provides a vanilla Char-LSTM-CRF model for Named Entity Recognition (LD-Net w.o. contextualized representations).

We are in an early-release beta. Expect some adventures and rough edges. LD-Net is a more mature project; please refer to it for detailed documentation and demo scripts.

https://github.com/LiyuanLucasLiu/LD-Net

Training

Dependency

Our package is based on Python 3.6 and the following packages:

numpy
tqdm
torch-scope>=0.5.0
torch==0.4.1
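
The version pins above can be checked before training. The following is a minimal sketch, not part of this package: the helper names (`parse_version`, `satisfies`, `report`) are hypothetical, and only the `>=` and `==` operators used in the list are handled.

```python
import re

try:
    from importlib import metadata  # available on Python 3.8+
except ImportError:
    metadata = None  # e.g. Python 3.6; fall back to pkg_resources below

# Requirement strings copied from the dependency list.
REQUIREMENTS = ["numpy", "tqdm", "torch-scope>=0.5.0", "torch==0.4.1"]


def parse_version(text):
    """Turn '0.4.1' into (0, 4, 1); trailing tags such as 'post2' are ignored."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def split_requirement(requirement):
    """Split 'name>=1.0' into (name, operator, version); operator may be None."""
    for op in (">=", "=="):
        if op in requirement:
            name, version = requirement.split(op)
            return name.strip(), op, version.strip()
    return requirement.strip(), None, None


def satisfies(installed, requirement):
    """Return True if the installed version string meets the requirement."""
    _, op, wanted = split_requirement(requirement)
    if op is None:
        return True  # unpinned package: any version is accepted
    have, want = parse_version(installed), parse_version(wanted)
    # Pad with zeros so that '0.5' and '0.5.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have == want if op == "==" else have >= want


def installed_version(name):
    """Look up an installed distribution's version, or None if it is missing."""
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # shipped with setuptools
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def report(requirements=REQUIREMENTS):
    """Print one status line per requirement."""
    for requirement in requirements:
        name, _, _ = split_requirement(requirement)
        found = installed_version(name)
        ok = found is not None and satisfies(found, requirement)
        print("{:<20} installed={} ok={}".format(requirement, found, ok))
```

Calling `report()` in the environment used for training prints whether each listed package is present and within the stated range.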

Command

Please first generate the word dictionary by:

python pre_seq/gene_map.py -h

Then encode the data with the dictionary by:

python pre_seq/encode_data.py -h

Then train the model:

python train_seq.py -h
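
The three steps above run in a fixed order, so they can be chained from one driver. The following is a minimal sketch under stated assumptions: `build_commands` and `run_pipeline` are hypothetical helpers, not part of this package, and each script is called only with `-h` because that is the only flag shown here. Real arguments should be taken from each script's help output.

```python
import subprocess
import sys

# Scripts in the order given above: dictionary, encoding, training.
SCRIPTS = ["pre_seq/gene_map.py", "pre_seq/encode_data.py", "train_seq.py"]


def build_commands(extra_args=None, python=sys.executable):
    """Build one argv list per script.

    extra_args maps a script path to its argument list; scripts without an
    entry default to ['-h'], which only prints the help text.
    """
    extra_args = extra_args or {}
    return [[python, script] + list(extra_args.get(script, ["-h"]))
            for script in SCRIPTS]


def run_pipeline(commands):
    """Run the commands in order and stop at the first failure."""
    for command in commands:
        print("running:", " ".join(command))
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
```

Calling `run_pipeline(build_commands())` from the repository root prints the help text of each script in turn. Passing a dictionary as `extra_args` replaces `-h` for the selected scripts.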

Inference

Models trained with this package can be used for inference with the LightNER package.

Citation

If you find the implementation useful, please cite the following paper: Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

@inproceedings{liu2018efficient,
  title = "{Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling}", 
  author = {Liu, Liyuan and Ren, Xiang and Shang, Jingbo and Peng, Jian and Han, Jiawei}, 
  booktitle = {EMNLP}, 
  year = 2018, 
}