This repo contains the code and data for the paper CorefQA: Coreference Resolution as Query-based Span Prediction.
- Install python requirements:
pip install -r requirements.txt
- Prepare training data:
python prepare_training_data.py
- Fine-tuning the hyper-parameters in
experiments.conf
- Download the Ontonotes 5.0 dataset.
- Download the SpanBERT pre-trained model.
- Run
./setup_training.sh <ontonotes/path/ontonotes-release-5.0> $data_dir
for data preparation. - Training the model.
GPU=0 python train.py <experiment>
,Results are saved in thelog_root
directory. You can see the training details withTensorBoard
.
Download the pre-trained CorefQA
model using the following command. If you want to train the CorefQA model, you can skip this step.
./download_pretrained.sh <model_name>
(e.g,: spanbert_base, spanbert_large) Will download the fine-tuned version of CorefQA
. You can use it with evaluate.py
and predict.py
。
Run GPU=0 python evaluate.py <experiment>
to evaluate the model. You can set eval_path
and conll_eval_path
in experiments.conf
to choose the evaluation files:
Model | F1 (%) |
---|---|
CorefQA + SpanBERT-base | 79.9 |
CorefQA + SpanBERT-large | 83.1 |
- Save the text for prediction in a txt file. If the text contains speaker name information, wrap the speaker with
<speaker></speaker>
and put it in front of its utterence. For example:
<speaker> Host </speaker> A traveling reporter now on leave and joins us to tell her story. Thank you for coming in to share this with us.
- run
GPU=0 python predict.py <experiment> <input_file> <output_file>
will save the prediction results in<output_file>
, The prediction for each instance is a list of clusters,each cluster is a list of mentions. Each mention is (text, (span_start, span_end)). For example:
[[('A traveling reporter', (26, 46)), ('her', (81, 84)), ('you', (98, 101))]]
If you think our paper is interesting, please cite Coreference Resolution as Query-based Span Prediction.
@article{wu2019coreference,
title={Coreference Resolution as Query-based Span Prediction},
author={Wu, Wei and Wang, Fei and Yuan, Arianna and Wu, Fei and Li, Jiwei},
journal={arXiv preprint arXiv:1911.01746},
year={2019}
}
We borrow some code from https://github.com/mandarjoshi90/coref
,Thanks to them!