RASAOpenQA

Code for the EMNLP 2019 paper "Ranking and Sampling in Open-Domain Question Answering"

Requirements

  • Python (>=3.5.6)
  • TensorFlow (=1.8.0)
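
Before running anything, it may help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the helper names `python_ok` and `tensorflow_status` are hypothetical, and the only facts taken from this README are the two version numbers.

```python
import sys

MIN_PYTHON = (3, 5, 6)   # minimum Python version from the Requirements list
REQUIRED_TF = "1.8.0"    # TensorFlow version from the Requirements list


def python_ok(version_info=sys.version_info):
    """Return True if the given interpreter version is at least MIN_PYTHON."""
    return tuple(version_info[:3]) >= MIN_PYTHON


def tensorflow_status():
    """Return a short message describing the installed TensorFlow, if any."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed"
    if tf.__version__ == REQUIRED_TF:
        return "TensorFlow {} found".format(tf.__version__)
    return "TensorFlow {} found, but {} is required".format(
        tf.__version__, REQUIRED_TF
    )
```

Calling `python_ok()` with no arguments checks the running interpreter, and `print(tensorflow_status())` reports whether the installed TensorFlow matches the pinned version.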

Preprocess

cd Ranker
mkdir data
cd data 

Download the embeddings, datasets and corenlp archives into this directory, then unzip them:

unzip embeddings.zip
unzip datasets.zip
unzip corenlp.zip

Ranker

cd Ranker
mkdir tmp_data
mkdir models
python3 initvim_anas.py #Initialize
python3 run.py #Train & Evaluate

Reader

export PYTHONPATH=${PYTHONPATH}:'Path_to_Reader'
cd Reader
mkdir probs
mkdir result
mkdir tmp_data
cd ../Ranker/tmp_data/
cp  list_* id2scores_* ../../Reader/tmp_data
cd ../../Reader
python3 run.py merge result/model #Train & Evaluate
python3 docqa/eval/triviaqa_full_document_eval.py  --step 110  -c open-dev  --rank 1  --n_paragraphs 30  --shuffle 0   --max_answer_len 8 -o question-output.json -p paragraph-output.csv result/model-date-time #Test MAX Method
python3 docqa/eval/init_data.py #Test SUM Method
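
The step that copies the Ranker outputs (`list_*` and `id2scores_*`) into `Reader/tmp_data` can also be sketched in Python, which avoids depending on the current working directory. This is an illustrative sketch only: the function name `copy_ranker_outputs` is hypothetical and not part of the repository, and the two file patterns are taken from the `cp` command above.

```python
import glob
import os
import shutil


def copy_ranker_outputs(ranker_tmp, reader_tmp,
                        patterns=("list_*", "id2scores_*")):
    """Copy files matching the patterns from ranker_tmp into reader_tmp.

    Creates reader_tmp if it does not exist and returns the sorted
    list of copied file names.
    """
    os.makedirs(reader_tmp, exist_ok=True)
    copied = []
    for pattern in patterns:
        for src in glob.glob(os.path.join(ranker_tmp, pattern)):
            if os.path.isfile(src):  # skip directories that match a pattern
                shutil.copy(src, reader_tmp)
                copied.append(os.path.basename(src))
    return sorted(copied)
```

From the repository root, a call such as `copy_ranker_outputs("Ranker/tmp_data", "Reader/tmp_data")` would mirror the `cp` command, assuming the Ranker has already been run and has written its outputs there.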

Remark

The commands above train and test on the Quasar-T dataset. To use SearchQA or TriviaQA instead, download the preprocessed data for that dataset and save it in the Ranker/tmp_data/ directory, then change the model parameters to match the settings given in the paper. After that, you can train the Ranker and the Reader for SearchQA and TriviaQA.