Given a professional article, the system must locate, analyze, and reason about the answers to specific questions in the text through machine reading. The questions fall into six types, including factual, list-type, definition-type, opinion-type, and text-type questions. The task is to have the machine read questions of these types and generate answers; agreement between a generated answer and the reference answer is evaluated with the ROUGE-L and BLEU metrics. All rights to the data and any results derived from it are reserved by the organiser. This implementation is inspired by Google's QANet and the blog.
The basic dataset was originally provided by the China Electronics Technology Group Corporation No.28 Research Institute, exclusively for registered participants.
- Python>=2.7
- NumPy
- tqdm
- TensorFlow>=1.5
- spacy==2.0.9
- bottle
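The dependency list above can be captured in a requirements file; this is a hypothetical `requirements.txt` sketch, pinning only the versions the list actually specifies:

```
# Hypothetical requirements.txt for the dependency list above.
# Only TensorFlow and spaCy versions are stated in the README;
# the remaining packages are left unpinned.
numpy
tqdm
tensorflow>=1.5
spacy==2.0.9
bottle
```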
To preprocess the data, run
# preprocess the data
python config.py --mode prepro
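As a rough illustration of what the `prepro` mode does in QANet-style code, the sketch below tokenizes passages, builds a word vocabulary, and maps tokens to integer ids. The real script uses spaCy tokenization and pretrained embeddings; the function names here are illustrative, not the repository's actual API.

```python
# A hedged sketch of QANet-style preprocessing: build a token vocabulary
# and convert passages to id sequences. The actual prepro step also
# builds character-level features and embedding matrices.

def build_vocab(passages):
    """Map each distinct token to an integer id, reserving 0 for padding
    and 1 for unknown words."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for passage in passages:
        for token in passage.split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def to_ids(passage, vocab):
    """Convert a passage to a list of token ids; unseen tokens map to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in passage.split()]
```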
The preprocessing procedure follows R-Net by HKUST-KnowComp; hyperparameters are stored in config.py. To debug/train/test/demo, run
python config.py --mode debug/train/test/demo
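The single-entry-point pattern above can be sketched with `argparse`; the dispatch function below is hypothetical and only illustrates how `config.py` might route the `--mode` flag to the matching entry point:

```python
# A minimal sketch of a --mode dispatch, as in `python config.py --mode train`.
# The real config.py defines its own prepro/debug/train/test/demo functions
# and hyperparameters; this only shows the flag-routing pattern.
import argparse

MODES = ("prepro", "debug", "train", "test", "demo")

def dispatch(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=MODES, required=True)
    args = parser.parse_args(argv)
    # The real script would call the corresponding entry point here;
    # for illustration we simply return the selected mode.
    return args.mode
```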
To evaluate the trained model with the code provided by the organiser, run
python test_common.py ~/data/{model_name}.json train/{model_name}/answer/answer.json
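For reference, ROUGE-L scores an answer against the reference by their longest common subsequence of tokens. The sketch below shows the standard F-score formulation; the organiser's `test_common.py` may differ in tokenization and in the `beta` weighting, which here is an assumed value favouring recall.

```python
# A self-contained sketch of the ROUGE-L metric used for evaluation.
# Tokenization is plain whitespace splitting; beta=1.2 is an assumption.

def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score between two whitespace-tokenized strings."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / float(len(cand))
    recall = lcs / float(len(ref))
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```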
The default directory for the TensorBoard log files is train/{model_name}/event.