coqa-bert-baselines

BERT baselines for extractive question answering on CoQA (https://stanfordnlp.github.io/coqa/). The original paper for the CoQA dataset can be found here. We provide the following models: BERT, RoBERTa, DistilBERT, and SpanBERT.

Except for SpanBERT, all pretrained models are provided by huggingface. The SpanBERT model is provided by facebookresearch.
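For illustration, here is a minimal sketch of how such checkpoints could be loaded with the transformers library; the checkpoint identifiers below are assumptions, not necessarily the ones used in this repo.

```python
# Hedged sketch: loading the pretrained encoders via transformers.
# The checkpoint names below are assumptions; the repo may use different ones.
from transformers import AutoModel, AutoTokenizer

CHECKPOINTS = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "DistilBERT": "distilbert-base-uncased",
    "SpanBERT": "SpanBERT/spanbert-base-cased",  # SpanBERT weights mirrored on the HF hub
}

model_name = "BERT"
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINTS[model_name])
model = AutoModel.from_pretrained(CHECKPOINTS[model_name])
```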

This repo builds upon the original code provided with the paper, which can be found here.

Dataset

The dataset can be downloaded from here. It needs to be preprocessed to obtain two files, coqa.train.json and coqa.dev.json. You can either follow the preprocessing steps in the original repo, or download the preprocessed files directly from here.
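If you want to sanity-check the preprocessed files before training, a minimal sketch (assuming they are plain JSON; the exact schema is produced by the preprocessing steps in the original repo):

```python
# Quick sanity check of the preprocessed files. Assumes they are plain JSON;
# the exact schema comes from the preprocessing in the original repo.
import json

with open("coqa.train.json") as f:
    train_data = json.load(f)
with open("coqa.dev.json") as f:
    dev_data = json.load(f)

print(type(train_data), type(dev_data))  # inspect the top-level structure
```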

Requirements

torch: can be installed from here. This code was tested with torch 0.3.0 and CUDA 9.2.

transformers: can be installed from here.

textacy
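A quick way to confirm the environment is set up (a simple check, not part of the repo):

```python
# Verify that the requirements are importable and report their versions.
import torch
import transformers
import textacy

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("textacy:", textacy.__version__)
```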

Usage

To run the models, first set up the output folder and then invoke main.py as shown below.

Create the folder structure output/outputXXXXX, where XXXXX denotes the size of the dataset followed by num_history (for example, output4004 in the experiment command further below). Edit utils/data_utils.py to control the amount of data loaded for training.
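For example, the folder used in the experiment command below could be created like this (the name output4004 is simply taken from that command):

```python
# Create the expected output folder; "output4004" matches the example command
# below (dataset size followed by n_history).
import os

os.makedirs(os.path.join("output", "output4004"), exist_ok=True)
```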

python main.py --arguments

The arguments are as follows:

| Argument | Description |
| --- | --- |
| trainset | Path to the training file. |
| devset | Path to the dev file. |
| model_name | Name of the pretrained model to train (BERT, RoBERTa, DistilBERT, SpanBERT). |
| model_path | If the model has already been downloaded, specify its path here. If left as None, the code will automatically download the pretrained model and run. |
| save_state_dir | The state of the program is regularly saved to this folder. This is useful in case training stops abruptly; training will automatically restart from where it stopped. |
| pretrained_dir | The path from which to restore the entire state of the program. This should be the same folder you specified in save_state_dir. |
| cuda | Whether to train on GPU. |
| debug | Whether to print during training. |
| n_history | History size to use. For more information, read the paper. |
| batch_size | Batch size used for training and validation. |
| shuffle | Whether to shuffle the dataset before each epoch. |
| max_epochs | Number of epochs to train. |
| lr | Learning rate to use. |
| grad_clip | Maximum norm for gradients. |
| verbose | Print updates every verbose epochs. |
| gradient_accumulation_steps | Number of update steps to accumulate before performing a backward/update pass. |
| adam_epsilon | Epsilon for the Adam optimizer. |
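Note that gradient_accumulation_steps multiplies the effective batch size (e.g. batch_size=2 with 10 accumulation steps behaves like a batch of 20). Below is a hedged sketch of the usual accumulation pattern, not the exact loop in main.py:

```python
# Sketch of gradient accumulation with clipping; the actual loop in main.py may differ.
import torch

model = torch.nn.Linear(8, 1)                       # stand-in for the QA model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5, eps=1e-8)
batches = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(40)]  # dummy batches of size 2
accumulation_steps, grad_clip = 10, 1.0

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    # Scale the loss so accumulated gradients average over the accumulation window.
    loss = torch.nn.functional.mse_loss(model(x), y) / accumulation_steps
    loss.backward()                                  # gradients add up across iterations
    if (step + 1) % accumulation_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
        optimizer.step()
        optimizer.zero_grad()
```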

For the given experiments we ran the following command:

sudo python main.py --trainset="./coqa.train.json" --devset="./coqa.dev.json" --model_name="BERT" --save_state_dir="./output/output4004" --n_history=4 --batch_size=2 --lr=5e-5 --gradient_accumulation_steps=10 --max_epochs=35

Results

All the results are based on n_history = 2:

| Model Name | Dev F1 | Dev EM |
| --- | --- | --- |
| SpanBERT | 63.74 | 53.42 |
| BERT | 63.08 | 53.03 |
| DistilBERT | 61.5 | 52.35 |
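Dev EM and Dev F1 are the standard extractive-QA metrics (exact string match and token-overlap F1). A minimal sketch of how they are commonly computed; the official CoQA evaluator applies additional answer normalization and handles multiple reference answers:

```python
# Simplified EM and token-level F1 for a single prediction/reference pair.
# The official CoQA scorer also strips punctuation/articles and aggregates
# over multiple references.
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the garden", "the garden"))   # 1.0
print(token_f1("in the garden", "the garden"))   # 0.8
```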

Contact

For any issues/questions, you can open a GitHub issue or contact me directly.