This is the official GitHub repository for the following paper:
Question Answering Infused Pre-training of General-Purpose Contextualized Representations.
Robin Jia, Mike Lewis, and Luke Zettlemoyer.
arXiv, 2021.
Here, you can download our pre-trained QuIP model and reproduce our paper's main results on question answering, paraphrase detection, named entity recognition, and sentiment analysis.
This code has been tested with python 3.7.9. Key installed packages were:
- fairseq 1.0.0a0+9825786 (most recent version as of August 9, 2020)
- scikit-learn 0.24.0
- torch 1.7.1
- transformers 4.4.2
Note that you should install fairseq by following the instructions in the fairseq GitHub repository, not directly through pip
.
A full list of installed packages is provided in requirements.txt
(sufficient but likely not necessary for the purposes of this repository).
We have a single command that will download all relevant data for you:
bash download_data.sh
This does (in order) the following:
- GLUE (for QQP, MRPC, SST-2): Gets the standard GLUE dataset files and puts them at
data/glue_data
. - LM-BFF (for QQP, MRPC, MR, CR): Clones the LM-BFF repository to
data/lm-bff
and follows their instructions to download the data. - SQuAD: Downloads the
train-v1.1.json
,dev-v1.1.json
, andevaluate-v2.0.py
files from the SQuAD website repo and puts them in the directorydata/squad
. - MRQA: Clones the MRQA shared task repository to
data/mrqa
, follows their instructions to download the development data, and gunzips the files. - PAWS: Clones the PAWS repository to
data/paws
and follows their instructions to download the data (Note: This requires python2.7 to exist and have nltk installed). Then, we re-generate our k-shot PAWS splits using the list of IDs inpaws_kshot_splits
and write them todata/lm-bff
, where the other k-shot splits are stored.
Note that we used the few-shot NER splits from [Few-Shot Named Entity Recognition: A Comprehensive Study]https://arxiv.org/abs/2012.14978) by Huang et al.
These are not yet publicly available but will be released by Huang et al. at a later date.
That data should be placed in data/ner/few_shot
.
We have a single command that will download the main model for you:
bash download_model.sh
This downloads both a fairseq-compatible and Huggingface-compatible version of the model, and places them at models/quip
and models/quip-hf
, respectively.
The main QuIP model in fairseq-compatible format can be downloaded at https://dl.fbaipublicfiles.com/quip/quip.tar.gz
. Un-tar the file and you will see 4 files:
model.pt
: The main Transformer model. This has the same architecture as RoBERTa-large, and can be loaded by fairseq exactly like a RoBERTa checkpoint.dict.txt
: Samedict.txt
file used by RoBERTa.model_qa_head_start.pt
: The parameters of the head used by the bi-encoder QA model to predict the start of the answer. This can be loaded by theMockClassificationHead
class found insrc/mock_classification_head.py
.model_qa_head_end.pt
Same as above, except for predicting the end of the answer.
The same model in HuggingFace-compatible format can be downloaded at https://dl.fbaipublicfiles.com/quip/quip-hf.tar.gz
. Un-tar the file and you will get a directory called quip-hf
that contains a checkpoint that can be directly loaded with HuggingFace's code for RoBERTa large models.
Note that this checkpoint does not include the QA start and end heads, but you can load those directly from the files in models/quip
.
We have also made available the following other models:
quip-noTeacher
andquip-noTeacher-hf
: QuIP trained without the teacher model, for comparison.quip-concat
andquip-concat-hf
: QuIP trained by concatenating passage representations in the same batch together, simulating negative examples. We hypothesize that this may be a useful starting point for applications that require handling of negative examples, such as open-domain QA.
These can be downloaded by replacing the second "quip" in the original URL with the name of the model.
python src/biencoder_predict_qa.py squad predictions_squad.json models/quip
python src/run_sentiment.py mrqa predictions_mrqa.json models/quip
The predictions will be written to predictions_squad.json
and predictions_mrqa.json
.
python src/run_sentiment.py [sst2|mr|cr] qa models/quip
python src/run_sentiment.py [qqp|mrpc|paws-qqp|paws-wiki] models/quip
With frozen model:
python src/run_sentiment.py [qqp|mrpc|paws-qqp|paws-wiki] models/quip --train-ft
With full fine-tuning:
python src/run_sentiment.py [qqp|mrpc|paws-qqp|paws-wiki] models/quip --train-skl
Note: We are able to exactly reproduce our paper's numbers when using a V100 GPU, but observe slight differences on a GP100.
To initialize the output head with question embeddings and train 5 times with the 5 different random splits, evaluating on the test set:
bash scripts/sweep_few_shot_ner.sh conll2003 out_dir models/quip-hf --learning_rate 2e-5 --num_train_epochs 200 --use_head_initialization
This saves model checkpoints and results to out_dir
. The command run on conll2003 should print the following:
0.7544890082228561 0.72753114288103 0.7445901913275964 0.6633886828953732 0.8107488986784142
which are the test F1 scores for each of the 5 seeds.
To randomly initialize the output head, just omit the --use_head_initialization
flag.
The majority of "QuIP: Question Answering Infused Pretraining" is licensed under CC-BY-NC, however portions of the project are available under separate license terms: HuggingFace is licensed under the Apache 2.0 license.