A PyTorch implementation of the ACL 2017 paper "Reading Wikipedia to Answer Open-Domain Questions" (DrQA).
Reading comprehension is the task of producing an answer given a question and one or more pieces of evidence (usually natural language paragraphs). Compared to question answering over knowledge bases, reading comprehension models are more flexible and have shown great potential for zero-shot learning.
SQuAD is a reading comprehension benchmark in which there is only a single piece of evidence and the answer is guaranteed to be a part of that evidence. Since the publication of the SQuAD dataset, research on reading comprehension has progressed quickly and many strong models have appeared. DrQA is conceptually simpler than most of them but still yields strong performance, even as a single model.
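To make the "answer is a part of the evidence" property concrete, here is a minimal sketch of how a gold answer maps to a character span in its paragraph. The record follows the layout of a SQuAD v1.1 paragraph entry (`context`, `qas`, `answers`, `answer_start`), but the text, the id and the helper `answer_spans` are illustrative assumptions and are not taken from the dataset or from this project's code.

```python
# A hand-written record in the shape of a SQuAD v1.1 paragraph entry.
# The sentence, question and id are made up for illustration.
paragraph = {
    "context": "DrQA was published at ACL 2017. It reads Wikipedia to answer questions.",
    "qas": [
        {
            "id": "example-0",
            "question": "Where was DrQA published?",
            "answers": [{"text": "ACL 2017", "answer_start": 22}],
        }
    ],
}


def answer_spans(paragraph):
    """Yield (question, start, end) character spans for every gold answer."""
    context = paragraph["context"]
    for qa in paragraph["qas"]:
        for answer in qa["answers"]:
            start = answer["answer_start"]
            end = start + len(answer["text"])
            # The answer must be a contiguous slice of the evidence paragraph.
            assert context[start:end] == answer["text"]
            yield qa["question"], start, end


spans = list(answer_spans(paragraph))
print(spans)
```

Because every answer is a slice of the context, a reader model only has to predict a start and an end position instead of generating free text.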
The motivation for this project is to offer a clean version of DrQA for the machine reading comprehension task, so that one can quickly make modifications and try out new ideas. The detailed comparisons below cover what's described in the original paper and two "official" projects, ParlAI and DrQA.
- python >=3.5
- pytorch 0.3.0 (please refer to the previous version if you use pytorch 0.2.0)
- numpy
- msgpack
- spacy 1.x
- download the project via
git clone https://github.com/hitvoice/DrQA.git; cd DrQA
- make sure python 3, pip, wget and unzip are installed.
- install the pytorch build that matches your OS, python and cuda versions.
- install the remaining requirements via
pip install -r requirements.txt
- download the SQuAD data file, GloVe word vectors and spaCy English language models using
bash download.sh
# prepare the data
python prepro.py
# train for 40 epochs with batch size 32
python train.py -e 40 -bs 32
| | EM | F1 |
|---|---|---|
| in the original paper | 69.5 | 78.8 |
| in this project | 69.64 | 78.76 |
| official (spaCy) | 69.71 | 78.94 |
| official (CoreNLP) | 69.76 | 79.09 |
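For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of exact match (EM) and token-level F1 in the style of the SQuAD v1.1 evaluation: answers are lowercased, stripped of punctuation and English articles, and then compared. The function names are chosen here for illustration; this is not this project's evaluation code, and the reported numbers are percentages averaged over the dev set.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(gold)


def f1_score(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction, golds):
    """A question may have several gold answers; keep the best score."""
    return max(metric(prediction, gold) for gold in golds)


print(exact_match("The ACL 2017.", "ACL 2017"))
print(f1_score("the ACL 2017 conference", "ACL 2017"))
```

EM rewards only answers that match after normalization, while F1 gives partial credit when the predicted span overlaps a gold answer.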
Compared with the official implementation:
Compared to what's described in the original paper:
- The grammatical features are generated by spaCy instead of Stanford CoreNLP. spaCy is much faster and produces similar scores.
Compared to the code in facebookresearch/DrQA:
- This project is much more lightweight and focuses solely on training and evaluating on the SQuAD dataset. It lacks the document retriever, the interactive inference API, and some other features.
- The implementation in facebookresearch/DrQA can train on multiple GPUs, while this implementation (currently, and for simplicity) supports only single-GPU training.
Compared to the code in facebookresearch/ParlAI:
- The DrQA model is no longer wrapped in a chatbot framework, which makes the code more readable, easier to modify and faster to train. Preprocessing of the text corpus is performed only once, while in a dialog framework the raw text is transmitted each time and the same text must be preprocessed again and again.
- This is a full implementation of the original paper, while the model in ParlAI is a partial implementation, missing all grammatical features (lemma, POS tags and named entity tags).
- Some minor bug fixes. Some of them have been merged into ParlAI.
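The "preprocess once" point above can be sketched as follows. This is only an illustration of the idea under stated assumptions: `tokenize` is a stand-in for an expensive NLP pipeline, JSON stands in for the serialization format, and none of the names below come from this project's `prepro.py` or `train.py`.

```python
import json
import os
import tempfile

calls = {"tokenize": 0}  # counts how often the expensive step runs


def tokenize(text):
    """Stand-in for an expensive pipeline (hypothetical; splits on whitespace)."""
    calls["tokenize"] += 1
    return text.lower().split()


def preprocess_once(texts, cache_path):
    """Tokenize every text a single time and store the result on disk."""
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump([tokenize(t) for t in texts], f)


def load_preprocessed(cache_path):
    """Read the stored tokens back without touching the raw text."""
    with open(cache_path, encoding="utf-8") as f:
        return json.load(f)


texts = ["DrQA reads Wikipedia", "SQuAD is a benchmark"]
with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "data.json")
    preprocess_once(texts, cache_path)
    data = load_preprocessed(cache_path)
    seen = 0
    for epoch in range(3):
        # Every epoch reuses the cached tokens; tokenize() is not called again.
        seen += len(data)

print(calls["tokenize"], seen)
```

With this layout the cost of tokenization is paid once per text regardless of the number of epochs, which is the contrast with re-processing raw text on every pass.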
Maintainer: Runqi Yang.
Credits: thanks to Jun Yang for code review and advice.
Most of the PyTorch model code is borrowed from Facebook/ParlAI under a BSD-3 license.