This repository contains the code for recent research advances at Shannon.AI.
A Unified MRC Framework for Named Entity Recognition
Xiaoya Li, Jingrong Feng, Yuxian Meng, Qinghong Han, Fei Wu and Jiwei Li
In ACL 2020. paper
If you find this repo helpful, please cite the following:
@article{li2019unified,
title={A Unified MRC Framework for Named Entity Recognition},
author={Li, Xiaoya and Feng, Jingrong and Meng, Yuxian and Han, Qinghong and Wu, Fei and Li, Jiwei},
journal={arXiv preprint arXiv:1910.11476},
year={2019}
}
If you have any questions, please feel free to open a GitHub issue.
pip install -r requirements.txt
We build our project on pytorch-lightning. To learn more about the arguments used in our training scripts, please refer to the pytorch-lightning documentation.
You can download our preprocessed MRC-NER datasets or write your own preprocessing scripts. We provide ner2mrc/mrsa2mrc.py for reference.
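If you write your own preprocessing script, the core step is turning each tagged sentence into one query-context sample per entity type. The following is a minimal sketch of that idea, assuming BIO-tagged input. The function names, the queries, and the output field names are hypothetical and may not match the format expected by the provided datasets or by trainer.py, so check a downloaded dataset file before relying on them.

```python
from typing import Dict, List, Tuple

# Hypothetical label-to-query mapping (an assumption for illustration).
QUERIES: Dict[str, str] = {
    "PER": "Find person names in the text.",
    "LOC": "Find locations in the text.",
}


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end, label) spans with inclusive token indices from BIO tags."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing span
        # Close the open span when the current tag does not continue it.
        if start is not None and tag != f"I-{label}":
            spans.append((start, i - 1, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def to_mrc(tokens: List[str], tags: List[str],
           queries: Dict[str, str] = QUERIES) -> List[dict]:
    """Build one MRC-style sample per entity label (field names are assumed)."""
    spans = bio_to_spans(tags)
    samples = []
    for label, query in queries.items():
        matched = [(s, e) for s, e, span_label in spans if span_label == label]
        samples.append({
            "context": " ".join(tokens),
            "query": query,
            "entity_label": label,
            "start_position": [s for s, _ in matched],
            "end_position": [e for _, e in matched],
            "impossible": not matched,  # True when the label has no span here
        })
    return samples


if __name__ == "__main__":
    tokens = ["Barack", "Obama", "visited", "Paris"]
    tags = ["B-PER", "I-PER", "O", "B-LOC"]
    for sample in to_mrc(tokens, tags):
        print(sample)
```

Each sentence yields as many samples as there are entity types, including types with no span in the sentence, which are marked as impossible.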
For English datasets, we use BERT-Large.
For Chinese datasets, we use RoBERTa-wwm-ext-large.
The main training procedure is in trainer.py.
Examples to start training are in scripts/reproduce.
Note that you may need to change DATA_DIR, BERT_DIR, and OUTPUT_DIR to your own dataset path, BERT model path, and log path, respectively.
trainer.py will automatically evaluate on the dev set every val_check_interval epochs and save the top-k checkpoints to default_root_dir.
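To make the top-k behavior concrete, here is a minimal standalone sketch of keeping only the k best checkpoints by dev score. It is an illustration of the idea, not pytorch-lightning's implementation. The class name, the checkpoint file names, and the scores are hypothetical.

```python
import heapq
from typing import List, Tuple


class TopKCheckpoints:
    """Keep the k best (score, path) pairs; a higher score is better."""

    def __init__(self, k: int):
        self.k = k
        # Min-heap, so the worst retained score is always at index 0.
        self._heap: List[Tuple[float, str]] = []

    def update(self, score: float, path: str) -> bool:
        """Record a checkpoint and return True if it is retained."""
        if self.k <= 0:
            return False
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (score, path))
            return True
        if score > self._heap[0][0]:
            # Evict the current worst checkpoint in favor of the new one.
            heapq.heapreplace(self._heap, (score, path))
            return True
        return False

    def best(self) -> List[Tuple[float, str]]:
        """Return the retained checkpoints, best first."""
        return sorted(self._heap, reverse=True)


if __name__ == "__main__":
    tracker = TopKCheckpoints(k=2)
    # Hypothetical dev F1 values from four validation runs.
    for step, f1 in enumerate([0.71, 0.80, 0.76, 0.83]):
        tracker.update(f1, f"ckpt_{step}.ckpt")
    print(tracker.best())
```

A more frequent validation schedule gives this selection more candidates to choose from, at the cost of extra evaluation time.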
To evaluate the saved checkpoints, use evaluate.py.
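For reference, NER output is commonly scored with span-level precision, recall, and F1, where a predicted span counts as correct only if its boundaries and label both match a gold span. The sketch below shows that computation under this assumption. Whether evaluate.py computes exactly this metric is not stated here, and the function name and span representation are hypothetical.

```python
from typing import Set, Tuple

# A span is (start, end, label) with inclusive token indices (assumed layout).
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact span and label matching."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {(0, 1, "PER"), (3, 3, "LOC")}
    pred = {(0, 1, "PER"), (2, 3, "LOC")}  # second span has a wrong boundary
    print(span_f1(gold, pred))
```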