Code for Factoid Question Answering With Distant Supervision
I am cleaning up the code for upload; more documentation will be added.
- GPU and CUDA 8 are required
- Python >= 3.5
- PyTorch 0.3.0
- pandas
- msgpack
- spaCy 1.x
- cupy
- pynvrtc
- jieba
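The requirements above can be checked with a short Python script before training. This is a minimal sketch, not part of the repository. The helper names missing_modules and check_environment are hypothetical, and the sketch assumes each package is importable under the name shown (torch for PyTorch). It only reports problems and does not install anything.

```python
import importlib.util
import sys

# Import names for the packages listed in the requirements (assumed, not taken from the repo).
REQUIRED_MODULES = ["torch", "pandas", "msgpack", "spacy", "cupy", "pynvrtc", "jieba"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the modules that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Print a short report and return True if the environment looks usable."""
    ok = sys.version_info >= (3, 5)
    print("Python", sys.version.split()[0], "OK" if ok else "(need >= 3.5)")
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
        return False
    import torch  # only imported once we know it is installed
    print("torch", torch.__version__, "(README pins 0.3.0)")
    # Training needs a CUDA-capable GPU.
    if not torch.cuda.is_available():
        print("CUDA is not available")
        ok = False
    return ok
```

Call check_environment() from a Python shell in the same environment you plan to train in.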
Please download the data files from Google Drive and put them under the "dat" directory. Specifically, download these four files:
- questions_dis_data_150htmls_using_abstext.txt
- triple_weight_by_search.txt
- new_mined_paraphrase0124.txt
- WebQA.v1.0.tar.gz # is it proper to upload this dataset?
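Before extracting, it can help to confirm that all four files are in place. This is a minimal sketch that assumes it is run from the repository root, so that dat/ is a relative path. The helper missing_data_files is hypothetical and not part of the repository.

```python
from pathlib import Path

# The four files named above; the "dat" directory name comes from the instructions.
DATA_FILES = [
    "questions_dis_data_150htmls_using_abstext.txt",
    "triple_weight_by_search.txt",
    "new_mined_paraphrase0124.txt",
    "WebQA.v1.0.tar.gz",
]


def missing_data_files(data_dir="dat"):
    """Return the expected data files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in DATA_FILES if not (root / name).is_file()]
```

An empty list means every file was found.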
Then extract the WebQA data with tar -zxvf WebQA.v1.0.tar.gz.
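The same extraction can also be done from Python with the standard-library tarfile module, which is useful on systems without a tar binary. This sketch assumes the archive sits in dat/ and is extracted there. Note that tar -zxvf extracts into the current directory, so you would run it from inside dat/ to get the same result. The helper extract_webqa is hypothetical.

```python
import tarfile


def extract_webqa(archive="dat/WebQA.v1.0.tar.gz", dest="dat"):
    """Python counterpart of `tar -zxvf`: extract a gzip-compressed tarball.

    Returns the member names, similar to the listing that -v prints.
    """
    with tarfile.open(str(archive), "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe members (absolute paths, "..").
            tar.extractall(path=str(dest), filter="data")
        else:
            tar.extractall(path=str(dest))
    return names
```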
Train the model by running:
cd DSRC
mkdir logs
python train_model.py
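If you would rather launch training from Python, for example from a job script, the three commands can be reproduced with the standard library. This is a minimal sketch that assumes it is called from the repository root. The helper run_training is hypothetical, and it passes no command-line arguments because none are documented for train_model.py.

```python
import os
import subprocess
import sys


def run_training(repo_root="."):
    """Mirror `cd DSRC`, `mkdir logs`, `python train_model.py`; return the exit code."""
    workdir = os.path.join(repo_root, "DSRC")
    # Like `mkdir logs`, but does not fail if the directory already exists.
    os.makedirs(os.path.join(workdir, "logs"), exist_ok=True)
    # Use the current interpreter so the same environment (PyTorch 0.3.0, etc.) is used.
    return subprocess.call([sys.executable, "train_model.py"], cwd=workdir)
```

Unlike a bare mkdir logs, exist_ok=True lets you rerun training without deleting the logs directory first.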
Please refer to parameters.py for configuration details, where train_idx corresponds to the different experimental configurations in the paper.
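This README does not describe how train_idx maps to settings. The following is therefore only a hypothetical illustration of a common pattern: shared defaults combined with per-experiment overrides selected by an index. BASE_CONFIG, EXPERIMENTS, config_for, and every key and value below are invented for illustration. See parameters.py for the actual options and for which index corresponds to which experiment in the paper.

```python
# Hypothetical sketch: the real parameters.py may be organized quite differently.

# Shared defaults (invented names and values).
BASE_CONFIG = {"batch_size": 32, "hidden_size": 128, "dropout": 0.3}

# Each train_idx overrides a few defaults to define one experimental setting.
EXPERIMENTS = {
    0: {},                     # defaults only
    1: {"hidden_size": 256},   # one changed hyperparameter
    2: {"dropout": 0.0},
}


def config_for(train_idx):
    """Return the merged configuration for one experiment index."""
    if train_idx not in EXPERIMENTS:
        raise KeyError("unknown train_idx: {}".format(train_idx))
    config = dict(BASE_CONFIG)  # copy so the defaults are never mutated
    config.update(EXPERIMENTS[train_idx])
    config["train_idx"] = train_idx
    return config
```

With this layout, each experiment records only how it differs from the defaults.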
Besides the generated training data, we will also release the data used to generate it, along with the code for training sample selection and for mining the distant paraphrases.
Coming soon.
Author of SRU: Tao Lei.
Author of the Document Reader model: Danqi Chen.
Author of the original PyTorch implementation: Runqi Yang.
Most of the PyTorch model code is borrowed from Facebook/ParlAI under a BSD-3-Clause license.