/tag-based-multi-span-extraction

The code repository for the paper "Tag-based Multi-Span Extraction in Reading Comprehension"

Primary LanguagePython

Tag-based Multi-Span Extraction in Reading Comprehension

This is the official code repository for "Tag-based Multi-Span Extraction in Reading Comprehension" (preprint) by Avia Efrat*, Elad Segal* and Mor Shoham*.
NABERT+ (raylin1000/drop-bert) by Kinley and Lin was used as a basis for our work.

This work was done as a final project for the spring 2019 instances of "Advanced Methods in Natural Language Processing" and "Advanced Methods in Machine Learning" at Tel Aviv University.

*Equal Contribution

DROP Explorer

Use DROP Explorer to better familiarize yourself with DROP and the models' predictions.

Usage

The commands listed in this section need to be run from the root directory of the repository.

First, install prerequisites with
pip install -r requirements.txt

Commands

  • Train a model:
    allennlp train configs/[config file] -s [training_directory] --include-package src

  • Output predictions by a model:
    allennlp predict model.tar.gz data/drop_dataset_dev.json --predictor machine-comprehension --cuda-device 0 --output-file predictions.jsonl --use-dataset-reader --include-package src

  • Evaluate a model (unofficial evaluation code, fast):
    allennlp evaluate model.tar.gz data/drop_dataset_dev.json --cuda-device 0 --output-file eval.json --include-package src

  • Evaluate a model (official evaluation code, slow):

    1. python generate_submission_predictions.py --archive_file model.tar.gz --input_file data/drop_dataset_dev.json --cuda-device 0 --output_file predictions.json --include-package src
    2. python -m allennlp.tools.drop_eval --gold_path data/drop_dataset_dev.json --prediction_path predictions.json --output_path metrics.json