Check Our New NER Toolkit🚀🚀🚀
- Inference:
  - LightNER: efficient inference with models pre-trained / trained with any of the following tools.
- Training:
  - LD-Net: train NER models with efficient contextualized representations.
  - VanillaNER: train vanilla NER models with pre-trained embeddings.
- Distant Training:
  - AutoNER: train NER models without line-by-line annotations while achieving competitive performance.
This project is derived from LD-Net and provides a vanilla Char-LSTM-CRF model for Named Entity Recognition (i.e., LD-Net without contextualized representations).
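For background, a Char-LSTM-CRF feeds character-level BiLSTM features, concatenated with pre-trained word embeddings, into a word-level BiLSTM, and scores tag sequences with a CRF layer. Below is a minimal PyTorch sketch of that architecture; it is not the code in this repository, all dimensions and names are illustrative, and padding masks are omitted for brevity:

```python
import torch
import torch.nn as nn

class CharLSTMCRF(nn.Module):
    """Illustrative Char-LSTM-CRF tagger; dimensions and names are made up."""
    def __init__(self, n_words, n_chars, n_tags,
                 word_dim=100, char_dim=30, char_hidden=25, word_hidden=100):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)   # load pre-trained vectors here
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)
        self.word_lstm = nn.LSTM(word_dim + 2 * char_hidden, word_hidden,
                                 bidirectional=True, batch_first=True)
        self.hidden2tag = nn.Linear(2 * word_hidden, n_tags)
        self.transitions = nn.Parameter(torch.randn(n_tags, n_tags))  # CRF transition scores

    def _emissions(self, words, chars):
        # words: (batch, seq_len); chars: (batch * seq_len, max_word_len)
        _, (h, _) = self.char_lstm(self.char_emb(chars))
        char_feat = torch.cat([h[0], h[1]], dim=-1)       # final fwd/bwd char states
        char_feat = char_feat.view(words.size(0), words.size(1), -1)
        feats = torch.cat([self.word_emb(words), char_feat], dim=-1)
        out, _ = self.word_lstm(feats)
        return self.hidden2tag(out)                       # (batch, seq_len, n_tags)

    def _log_partition(self, emissions):
        # Forward algorithm: log-sum over all tag sequences (no start/stop states).
        score = emissions[:, 0]                           # (batch, n_tags)
        for t in range(1, emissions.size(1)):
            score = torch.logsumexp(
                score.unsqueeze(2) + self.transitions + emissions[:, t].unsqueeze(1),
                dim=1)
        return torch.logsumexp(score, dim=1)

    def neg_log_likelihood(self, words, chars, tags):
        # Assumes all sequences in the batch share one length (no masking).
        emissions = self._emissions(words, chars)
        gold = emissions[:, 0].gather(1, tags[:, :1]).squeeze(1)
        for t in range(1, tags.size(1)):
            gold = gold + self.transitions[tags[:, t - 1], tags[:, t]] \
                        + emissions[:, t].gather(1, tags[:, t:t + 1]).squeeze(1)
        return (self._log_partition(emissions) - gold).mean()
```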
We are in an early-release beta. Expect some adventures and rough edges. LD-Net is a more mature project; please refer to it for detailed documentation and demo scripts:
https://github.com/LiyuanLucasLiu/LD-Net
Our package is based on Python 3.6 and the following packages:

```
numpy
tqdm
torch-scope>=0.5.0
torch==0.4.1
```
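Assuming a standard pip setup, the dependencies can be installed with (the exact wheel for torch 0.4.1 may vary by platform):

```bash
pip install numpy tqdm "torch-scope>=0.5.0" torch==0.4.1
```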
Please first generate the word dictionary by:

```bash
python pre_seq/gene_map.py -h
```
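Conceptually, this step scans the training corpus and builds a token-to-index map. A hypothetical illustration of what such a map contains follows; the CoNLL-style input format and every name below are assumptions, not the actual interface of gene_map.py:

```python
from collections import Counter

def build_word_map(conll_path, min_count=1):
    """Count tokens in a CoNLL-style file (token + tag per line) and index them."""
    counts = Counter()
    with open(conll_path) as f:
        for line in f:
            line = line.strip()
            if line:                      # blank lines separate sentences
                counts[line.split()[0]] += 1
    # Reserve 0/1 for padding and unknown words.
    word_map = {'<pad>': 0, '<unk>': 1}
    for word, c in counts.most_common():
        if c >= min_count:
            word_map[word] = len(word_map)
    return word_map
```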
Then encode the dictionary by:

```bash
python pre_seq/encode_data.py -h
```
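Encoding then turns each sentence into integer indices via that map, so training consumes arrays instead of raw text. Again a hypothetical sketch, not the actual logic of encode_data.py:

```python
def encode_sentence(tokens, word_map):
    """Map tokens to indices, falling back to <unk> for out-of-vocabulary words."""
    return [word_map.get(tok, word_map['<unk>']) for tok in tokens]

# e.g. encode_sentence(['John', 'lives', 'in', 'Chicago'], word_map)
```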
Then train the model:

```bash
python train_seq.py -h
```
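Training a CRF tagger of this kind amounts to minimizing the negative log-likelihood of the gold tag sequences by gradient descent. A bare-bones loop over the model sketched earlier; the batch iterator and all hyper-parameters are placeholders:

```python
import torch
import torch.optim as optim

model = CharLSTMCRF(n_words=len(word_map), n_chars=80, n_tags=9)
optimizer = optim.SGD(model.parameters(), lr=0.015, momentum=0.9)

for epoch in range(50):
    for words, chars, tags in batches:    # batches: pre-encoded integer tensors
        optimizer.zero_grad()
        loss = model.neg_log_likelihood(words, chars, tags)
        loss.backward()
        # Clip gradients, a common safeguard for LSTM training.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)
        optimizer.step()
```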
Models trained with this package can be used for inference with the LightNER package.
If you find the implementation useful, please cite the following paper: Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
```bibtex
@inproceedings{liu2018efficient,
  title = "{Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling}",
  author = {Liu, Liyuan and Ren, Xiang and Shang, Jingbo and Peng, Jian and Han, Jiawei},
  booktitle = {EMNLP},
  year = 2018,
}
```