Code repo for Conditional Focused Neural Question Answering with Large-scale Knowledge Bases
- Refer to `Virtuoso.md` to install and configure Virtuoso
- Make sure torch7 is installed together with the following dependencies (a quick check follows this list):
    - logroll: `luarocks install logroll`
    - nngraph: `luarocks install nngraph`
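As an optional sanity check, assuming `luarocks` is on your PATH, you can confirm that both rocks are visible before moving on:

```bash
# List installed rocks and make sure logroll and nngraph show up.
luarocks list | grep -E 'logroll|nngraph'
```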
- After the installation and configuration of Virtuoso, run `bash data_preprocess.sh` to finish preprocessing
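Before running the preprocessing script, it can help to confirm that Virtuoso is up and answering SPARQL queries. A minimal sketch, assuming a default Virtuoso installation serving its SPARQL endpoint on port 8890 (adjust the URL if your setup from `Virtuoso.md` differs):

```bash
# Issue a trivial SPARQL query against the local Virtuoso endpoint;
# getting a JSON result back (rather than a connection error) means the server is reachable.
curl -s -H "Accept: application/sparql-results+json" \
     --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 1" \
     http://localhost:8890/sparql
```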
Train the three models (an optional wrapper sketch follows this list):

- Focused Labeling

  ```bash
  cd FocusedLabeling
  th train_crf.lua
  ```

- Entity Type Vector

  ```bash
  cd EntityTypeVec
  th train_ent_typevec.lua
  ```

- RNN based Relation Network

  ```bash
  cd RelationRNN
  th train_rel_rnn.lua
  ```
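To run all three training jobs in sequence, a minimal wrapper sketch (assuming it is launched from the repository root and that each `train_*.lua` script uses its default options) could look like this:

```bash
#!/usr/bin/env bash
# Train the three models back to back, from the repository root.
set -e

(cd FocusedLabeling && th train_crf.lua)         # focused labeling (CRF)
(cd EntityTypeVec   && th train_ent_typevec.lua) # entity type vectors
(cd RelationRNN     && th train_rel_rnn.lua)     # RNN-based relation network
```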
For the inference steps below, define `SPLIT='valid'` or `SPLIT='test'`.
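For example, to run the inference pipeline on the validation split:

```bash
# Pick the split that all subsequent commands operate on.
SPLIT='valid'   # or: SPLIT='test'
```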
- Run focused labeling on validation/test data

  ```bash
  cd FocusedLabeling
  python generate_inference_data.py --split ${SPLIT}
  th process_inference.lua -testSplit ${SPLIT}
  th infer_crf.lua \
      -testData inference-data/label.${SPLIT}.t7 \
      -modelFile "path-to-pretrained-model"
  ```

  - `python generate_inference_data.py --split ${SPLIT}` will create the file `label.${SPLIT}.txt` in the folder `FocusedLabeling/inference-data`;
  - `th process_inference.lua` will turn the text file `label.${SPLIT}.txt` into `label.${SPLIT}.t7` in torch format (both in the folder `FocusedLabeling/inference-data`);
  - `th infer_crf.lua ...` will generate the file `label.result.${SPLIT}` in the folder `FocusedLabeling`.
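Optionally, you can verify from the repository root that this step produced everything the later steps expect (the file names come from the description above):

```bash
# All three files should exist after focused labeling inference (run from the repo root).
ls FocusedLabeling/inference-data/label.${SPLIT}.txt \
   FocusedLabeling/inference-data/label.${SPLIT}.t7 \
   FocusedLabeling/label.result.${SPLIT}
```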
- Query candidates based on focused labeling

  ```bash
  cd Inference
  mkdir ${SPLIT} && cd ${SPLIT}
  python ../query_candidates.py 6 \
      ../../PreprocessData/QAData.${SPLIT}.pkl \
      ../../FocusedLabeling/label.result.${SPLIT} \
      ../../KnowledgeBase/type.top-500.pkl
  ```

  This step will generate the file `QAData.label.${SPLIT}.cpickle` in the folder `Inference/${SPLIT}`.
- Generate score data based on the query results

  ```bash
  cd Inference/${SPLIT}
  python ../generate_score_data.py QAData.label.${SPLIT}.cpickle
  ```

  This step will generate the following files in the same folder `Inference/${SPLIT}`:
  - `rel.single.${SPLIT}.txt` (candidate relations for questions with a single candidate subject)
  - `rel.multi.${SPLIT}.txt` (candidate relations for questions with multiple candidate subjects)
  - `type.multi.${SPLIT}.txt` (candidate entities for questions with multiple candidate subjects)
  - `single.${SPLIT}.cpickle`
  - `multi.${SPLIT}.cpickle`
- Run relation inference

  ```bash
  cd RelationRNN
  mkdir inference-data
  th process_inference.lua -testSplit ${SPLIT}
  th infer_rel_rnn.lua -testData inference-data/rel.single.${SPLIT}.t7
  th infer_rel_rnn.lua -testData inference-data/rel.multi.${SPLIT}.t7
  ```

  This step will generate the files `score.rel.single.${SPLIT}` and `score.rel.multi.${SPLIT}` in the folder `RelationRNN`.
- Run entity inference

  ```bash
  cd EntityTypeVec
  mkdir inference-data
  th process_inference.lua -testSplit ${SPLIT}
  th infer_ent_typevec.lua -testData inference-data/ent.${SPLIT}.t7
  ```

  This step will generate the file `score.ent.multi.multi.${SPLIT}` in the folder `EntityTypeVec`.
- Run joint disambiguation

  ```bash
  cd Inference/${SPLIT}
  python ../joint_disambiguation.py multi.${SPLIT}.cpickle \
      ../../RelationRNN/score.rel.multi.${SPLIT} \
      ../../EntityTypeVec/score.ent.multi.multi.${SPLIT}
  ```
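For convenience, the whole inference pipeline for one split can be chained into a single script. The sketch below simply strings together the commands above; it assumes it is launched from the repository root, that the three models have already been trained, and that `PRETRAINED_CRF` is edited to point at your trained focused-labeling model (the placeholder is kept from the instructions above):

```bash
#!/usr/bin/env bash
# End-to-end inference for one split, chaining the steps documented above.
set -e

SPLIT=${1:-valid}                          # 'valid' or 'test'
PRETRAINED_CRF="path-to-pretrained-model"  # edit: trained focused-labeling model file
ROOT=$(pwd)

# 1. Focused labeling on the chosen split.
cd "${ROOT}/FocusedLabeling"
python generate_inference_data.py --split ${SPLIT}
th process_inference.lua -testSplit ${SPLIT}
th infer_crf.lua -testData inference-data/label.${SPLIT}.t7 -modelFile "${PRETRAINED_CRF}"

# 2. Query subject/relation candidates based on the labeling result.
cd "${ROOT}/Inference"
mkdir -p ${SPLIT} && cd ${SPLIT}
python ../query_candidates.py 6 \
    ../../PreprocessData/QAData.${SPLIT}.pkl \
    ../../FocusedLabeling/label.result.${SPLIT} \
    ../../KnowledgeBase/type.top-500.pkl

# 3. Generate score data from the query results.
python ../generate_score_data.py QAData.label.${SPLIT}.cpickle

# 4. Relation and entity inference.
cd "${ROOT}/RelationRNN"
mkdir -p inference-data
th process_inference.lua -testSplit ${SPLIT}
th infer_rel_rnn.lua -testData inference-data/rel.single.${SPLIT}.t7
th infer_rel_rnn.lua -testData inference-data/rel.multi.${SPLIT}.t7

cd "${ROOT}/EntityTypeVec"
mkdir -p inference-data
th process_inference.lua -testSplit ${SPLIT}
th infer_ent_typevec.lua -testData inference-data/ent.${SPLIT}.t7

# 5. Joint disambiguation over the questions with multiple candidate subjects.
cd "${ROOT}/Inference/${SPLIT}"
python ../joint_disambiguation.py multi.${SPLIT}.cpickle \
    ../../RelationRNN/score.rel.multi.${SPLIT} \
    ../../EntityTypeVec/score.ent.multi.multi.${SPLIT}
```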