
Code for the paper "Factoid Question Answering With Distant Supervision".

Primary language: Python. License: MIT.

factoid_QA_with_distant_spervision


I am still cleaning up the code for release, and more documentation will be added.

Requirements

  • GPU and CUDA 8 are required
  • Python >= 3.5
  • PyTorch 0.3.0
  • pandas
  • msgpack
  • spaCy 1.x
  • cupy
  • pynvrtc
  • jieba
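Before training, a quick sanity check of the environment can save a failed run. The sketch below is an illustration, not part of the repository: it only confirms that the listed packages can be found and that the interpreter meets the Python 3.5 minimum. The module names (torch for PyTorch, spacy, cupy, pynvrtc, and so on) are the usual import names for these packages and are assumed here. It does not verify the exact PyTorch 0.3.0 or spaCy 1.x versions, and the CUDA line only reports what torch.cuda.is_available() returns. It avoids f-strings so that it also runs on Python 3.5.

```python
"""Sanity-check the environment against the requirements list (a sketch)."""
import importlib.util
import sys

# Assumed import names for the listed packages (PyTorch is imported as "torch").
REQUIRED_MODULES = ["torch", "pandas", "msgpack", "spacy", "cupy", "pynvrtc", "jieba"]
MIN_PYTHON = (3, 5)


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(min_version=MIN_PYTHON):
    """Return True if the running interpreter is at least `min_version`."""
    return sys.version_info[:2] >= tuple(min_version)


def cuda_available():
    """Return torch.cuda.is_available(), or False if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    if not python_ok():
        print("Python >= {}.{} is required".format(*MIN_PYTHON))
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed packages were found.")
    print("CUDA available: {}".format(cuda_available()))
```

Run it with the same interpreter you plan to train with; find_spec only locates each package without importing it, so the check stays fast even when heavy libraries are installed.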

Download Data

Please download the data files from Google Drive and put them under the "dat" directory. Specifically, download these four files:

questions_dis_data_150htmls_using_abstext.txt
triple_weight_by_search.txt
new_mined_paraphrase0124.txt
WebQA.v1.0.tar.gz   # is it proper to upload this dataset? 

Then extract the WebQA data with tar -zxvf WebQA.v1.0.tar.gz.
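For a scripted setup, the download check and extraction steps can be sketched in Python with the standard-library tarfile module. This assumes the four files were placed in dat/ relative to the repository root and that the archive should be extracted into that same directory; the function names are hypothetical helpers, not part of the repository.

```python
"""Verify the downloaded data files and extract WebQA (a sketch)."""
import os
import tarfile

DATA_DIR = "dat"  # assumed location, relative to the repository root
EXPECTED_FILES = [
    "questions_dis_data_150htmls_using_abstext.txt",
    "triple_weight_by_search.txt",
    "new_mined_paraphrase0124.txt",
    "WebQA.v1.0.tar.gz",
]


def missing_files(data_dir=DATA_DIR, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `data_dir`."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(data_dir, name))]


def extract_archive(archive_path, dest_dir):
    """Rough Python equivalent of `tar -zxvf`: extract into `dest_dir`, return member names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(path=dest_dir)
    return names


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from {}/: {}".format(DATA_DIR, ", ".join(missing)))
    else:
        extract_archive(os.path.join(DATA_DIR, "WebQA.v1.0.tar.gz"), DATA_DIR)
```

Checking for all four files first means a partial download is reported by name instead of surfacing later as a file-not-found error during training.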

Model training

Train the model by running:

cd DSRC
mkdir logs
python train_model.py

See parameters.py for configuration details; train_idx selects among the experimental configurations reported in the paper.
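parameters.py is not reproduced in this README, so the following is only a hypothetical illustration of the index-to-configuration pattern that a train_idx setting suggests: a table of per-experiment overrides merged into shared defaults. Every key, value, and index below is a placeholder; consult parameters.py for the real settings.

```python
"""Hypothetical illustration of selecting an experiment by index.

This is NOT the contents of parameters.py; every key and value is a placeholder.
"""
import copy

# Placeholder defaults shared by all runs (hypothetical names and values).
BASE_CONFIG = {"train_file": "train.txt", "epochs": 10, "learning_rate": 0.1}

# Each index overrides a few defaults. The real indices and their meanings
# are defined in parameters.py and described in the paper.
EXPERIMENTS = {
    0: {},
    1: {"train_file": "train_variant_a.txt"},
    2: {"train_file": "train_variant_b.txt", "epochs": 20},
}


def build_config(train_idx, base=BASE_CONFIG, experiments=EXPERIMENTS):
    """Return a copy of the base config updated with the overrides for `train_idx`."""
    if train_idx not in experiments:
        raise KeyError("unknown train_idx: {}".format(train_idx))
    config = copy.deepcopy(base)  # never mutate the shared defaults
    config.update(experiments[train_idx])
    config["train_idx"] = train_idx
    return config
```

Keeping each override small makes it easy to see what distinguishes one experiment from another, and copying the defaults keeps one run's settings from leaking into the next.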

Automatic training data generation via distant supervision

Besides the generated training data, we also release the data used for generating the training data, for selecting training samples, and for mining the distant paraphrases.

Training data generation via distant supervision

Coming soon.
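Until the generation scripts are released, the general idea of distant supervision for reading-comprehension QA can be illustrated with a common heuristic: a retrieved passage is treated as a positive example when it contains the gold answer string, and as a negative example otherwise. This is a generic sketch, not necessarily the procedure used in the paper, and the function names are hypothetical. Character offsets refer to the normalized passage, which is why that normalized text is stored with each example.

```python
"""Generic distant-supervision labeling for QA (not necessarily the paper's method)."""


def normalize(text):
    """Lowercase and collapse whitespace so string matching is less brittle."""
    return " ".join(text.lower().split())


def find_answer_spans(passage, answer):
    """Return (start, end) character offsets of every answer occurrence in the normalized passage."""
    p, a = normalize(passage), normalize(answer)
    spans = []
    if not a:
        return spans
    start = p.find(a)
    while start != -1:
        spans.append((start, start + len(a)))
        start = p.find(a, start + 1)
    return spans


def label_passages(question, answer, passages):
    """Turn a question, its answer, and retrieved passages into weakly labeled examples.

    A passage is labeled 1 (positive) if it contains the answer string, else 0.
    """
    examples = []
    for passage in passages:
        spans = find_answer_spans(passage, answer)
        examples.append({
            "question": question,
            "passage": normalize(passage),  # offsets below index into this text
            "answer_spans": spans,
            "label": 1 if spans else 0,
        })
    return examples
```

The heuristic is noisy by design: a passage can mention the answer string without supporting it, which is one reason a separate training sample selection step is useful.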

Training sample selection and distant paraphrase mining

Credits

Author of SRU: Tao Lei.

Author of the Document Reader model: Danqi Chen.

Author of the original PyTorch implementation: Runqi Yang.

Most of the PyTorch model code is borrowed from Facebook/ParlAI under a BSD-3 license.