Code to reproduce the experiments in "A Simple Transfer Learning Baseline for Ellipsis Resolution".
Requires Python >= 3.5.0
Recommended: Create a conda environment with conda create -n myenv python=3.7
The repository contains conversion scripts for converting different datasets into the SQuAD 1.1 format.
- vpe2squad.py: Convert VP ellipsis dataset into SQuAD format
- conll2squad.py: Convert coreference data from CoNLL-2012 to SQuAD format
  - First convert .conll files to .jsonlines using this
  - Set ONTONOTES_DIR (ontonotes folder path) and set2fmt (filename to convert to SQuAD format)
  - Run script
- sluice2squad.py: Convert sluice ellipsis dataset into SQuAD format
- wikicoref2conll.py: Convert WikiCoref dataset into CoNLL-2012 format
- squad2conll.py: Convert the prediction files produced by bert/run_squad.py into CoNLL format for evaluation
- annotate_qwords.py: Adds <ref> and </ref> tags to interrogation words in SQuAD files
- evaluate-v1.1.py: Standard SQuAD v1.1 evaluation script (for evaluating ellipsis)
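For orientation, the SQuAD v1.1 evaluation reports exact match and token-level F1 after normalizing answers. The sketch below illustrates those two metrics for a single prediction and a single gold answer; the function names are illustrative, and it leaves out the official script's maximum over multiple gold answers. Use evaluate-v1.1.py for any reported numbers.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(gold)


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall over normalized tokens."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

A partially overlapping span gets partial F1 credit but no exact-match credit, which is why both numbers are usually reported together.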
For coreference resolution, use the standard CoNLL-2012 script after converting the predictions into the CoNLL-2012 format using squad2conll.py.
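As a reference for the target format of the conversion scripts, here is a minimal sketch of a SQuAD v1.1 file containing one example. The helper name `make_squad_article`, the example sentence, and the choice of question and answer span are made up for illustration; they are not taken from the repository's scripts or datasets.

```python
import json


def make_squad_article(context, question, answer_text, example_id, title="example"):
    """Wrap one (context, question, answer) triple in the SQuAD v1.1 nesting."""
    # SQuAD v1.1 answers are character spans of the context.
    answer_start = context.find(answer_text)
    if answer_start < 0:
        raise ValueError("answer_text must occur verbatim in context")
    return {
        "title": title,
        "paragraphs": [
            {
                "context": context,
                "qas": [
                    {
                        "id": example_id,
                        "question": question,
                        "answers": [
                            {"text": answer_text, "answer_start": answer_start}
                        ],
                    }
                ],
            }
        ],
    }


# Hypothetical sluice-style example: the answer is the antecedent of "why".
context = "Mary left early, but nobody knows why."
article = make_squad_article(
    context=context,
    question="nobody knows why",
    answer_text="Mary left early",
    example_id="demo-0",
)
dataset = {"version": "1.1", "data": [article]}
print(json.dumps(dataset, indent=2))
```

The `answer_start` offset is what lets a span-prediction model be trained on the file, so a converter has to make sure each answer string occurs verbatim in its context.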
Each model folder contains pre-processing, configuration, training and evaluation scripts for Sluice Ellipsis. To run on other datasets, just replace the data paths appropriately.
- Code based on Facebook's DrQA
- Scripts for preprocessing, training and prediction
- Code based on AllenNLP
- AllenNLP configuration file
- Scripts for training and prediction
- Uses Huggingface's Transformers
- Scripts for training and evaluation