A Pure AllenNLP Implementation of FEVER Baseline Models

This is an implenetation of the FEVER Baseline system. This is a heterogeneous system with two components: the for a short sentence (called a claim) this system firstly finds evidence (information rertrieval) and then performs an assessment whether the claim is Supported or Refuted given the evidence (natual language inference).

Model Description

The system contains three components: an end-to-end evidence retrieval system, the NLI classifier and a training data sampling script to generate new instances to train the NLI model.

Information Retrieval

The information retrieval system uses Facebook's DrQA implementation for TF-IDF based document similarity. The DrQA script runs with two phases: firstly using the Wikipedia index, it will select the k=5 closest documents that are most similar to the claim. Each of the sentences in those documents is used to then construct a new index over the sentences to find l=5 nearest sentences to the claim.

Natural Language Inference

We have three models for NLI: Decomposable Attention, ESIM and ESIM with ELMO embeddings. These models concatenate the evidence sentences retrieved from the IR phase and predict a label in Supported, Refuted, or NotEnoughInfo.

Evidence Sampling for Training NLI

For claims that are NotEnoughInfo there is no evidence labeled in the dataset. The evidence sampling script identifies the closest relevant document for NotEnoughInfo claims then samples a sentence uniformly at random to train the NLI classifier.

Install

This model can be installed with either pip or docker. For more info about the docker image, see this repo: )[https://github.com/j6mes/fever2-sample])

PIP install

Create and activate fever conda environment

conda create -n fever
source activate fever

Install dependencies

pip install -r requirements.txt

Manual data install

If using the docker verison of this repo. The data will be mounted in a data folder. Otherwise it must be manually set up with the following scripts:

Download GloVe

mkdir -p data
wget http://nlp.stanford.edu/data/wordvecs/glove.6B.zip
unzip glove.6B.zip -d data/glove
gzip data/glove/*.txt

Download Wiki

mkdir -p data
mkdir -p data/index
mkdir -p data/fever

wget -O data/index/fever-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz https://s3-eu-west-1.amazonaws.com/fever.public/wiki_index/fever-tfidf-ngram%3D2-hash%3D16777216-tokenizer%3Dsimple.npz
wget -O data/fever/fever.db https://s3-eu-west-1.amazonaws.com/fever.public/wiki_index/fever.db

Download Data

mkdir -p data
mkdir -p data/fever-data
wget -O data/fever-data/train.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/train.jsonl
wget -O data/fever-data/dev.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/shared_task_dev.jsonl
wget -O data/fever-data/test.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/shared_task_test.jsonl

Running with Pretrained Models

The pretrained models can be used with the following scripts:

Information Retreival

Find the 5 nearest sentences from the 5 nearest documents using the pre-computed TF-IDF index. The documents are in the database file fever.db.

export PYTHONPATH=src
export FEVER_ROOT=$(pwd)
mkdir -p work
export WORK_DIR=work
python -m fever_ir.evidence.retrieve \
    --database $FEVER_ROOT/data/fever/fever.db \
    --index $FEVER_ROOT/data/index/fever-tfidf-ngram\=2-hash\=16777216-tokenizer\=simple.npz \
    --in-file $FEVER_ROOT/data/fever-data/dev.jsonl \
    --out-file $WORK_DIR/dev.sentences.p5.s5.jsonl \
    --max-page 5 \
    --max-sent 5

Natural Language Inference

There are two available model files: https://jamesthorne.co.uk/content/files/fever-esim.tar.gz and https://jamesthorne.co.uk/content/files/fever-da.tar.gz. If you are using a pretrained model, change $MODEL_FILE and $MODEL_NAME appropriately.

export CUDA_DEVICE=-1 #Set this to appropriate value if using a GPU. -1 for CPU
export PYTHONPATH=src
export FEVER_ROOT=$(pwd)
mkdir -p work
export WORK_DIR=work
export MODEL_NAME=fever-esim-elmo
export MODEL_FILE=https://jamesthorne.co.uk/content/files/$MODEL_NAME.tar.gz
python -m allennlp.run predict \
    --output-file $WORK_DIR/$MODEL_NAME.predictions.jsonl \
    --cuda-device $CUDA_DEVICE \
    --include-package fever.reader \
    $MODEL_FILE \
    $WORK_DIR/dev.sentences.p5.s5.jsonl

Scoring

Dev set can be scored locally with the FEVER scorer

python -m fever_ir.submission.score \
    --predicted_labels $WORK_DIR/$MODEL_NAME.predictions.jsonl \
    --predicted_evidence $WORK_DIR/dev.sentences.p5.s5.jsonl

Test set can be uploaded to the scoring server for scoring - the submission can be prepared with the following script

python -m fever_ir.submission.prepare \
    --predicted_labels $WORK_DIR/$MODEL_NAME.predictions.jsonl \
    --predicted_evidence $WORK_DIR/test.sentences.p5.s5.jsonl
    --out_file $WORK_DIR/submission.jsonl

Train new Models

Sample Evidence for Training

export PYTHONPATH=src
export FEVER_ROOT=$(pwd)
mkdir -p work
export WORK_DIR=work
python -m fever_ir.evidence.retrieve \
    --index $FEVER_ROOT/data/index/fever-tfidf-ngram\=2-hash\=16777216-tokenizer\=simple.npz \
    --in-file $FEVER_ROOT/data/fever-data/dev.jsonl \
    --out-file $FEVER_ROOT/data/fever/train.ns.pages.p1

Train Models

Decomposable Attention Model

allennlp train configs/decomposable_attention.json -s log/fever_da --include-package fever

ESIM

allennlp train configs/esim_elmo.json -s log/fever_esim --include-package fever

ESIM+ELMo

allennlp train configs/esim_elmo.json -s log/fever_esim_elmo --include-package fever

j6mes/fever-allennlp

A Pure AllenNLP Implementation of FEVER Baseline Models

Model Description

Information Retrieval

Natural Language Inference

Evidence Sampling for Training NLI

Install

PIP install

Manual data install

Running with Pretrained Models

Information Retreival

Natural Language Inference

Scoring

Train new Models

Sample Evidence for Training

Train Models