CFO

Code repo for Conditional Focused Neural Question Answering with Large-scale Knowledge Bases

Installation and Preprocessing

Refer to Virtuoso.md to install and configure the software.
Make sure torch7 is installed together with the following dependencies:
- logroll: luarocks install logroll
- nngraph: luarocks install nngraph
After the installation and configuration of Virtuoso, run bash data_preprocess.sh to finish preprocessing

Training

Focused Labeling
```
cd FocusedLabeling
th train_crf.lua
```

Entity Type Vector

```
cd EntityTypeVec
th train_ent_typevec.lua
```

RNN based Relation Network
```
cd RelationRNN
th train_rel_rnn.lua
```

Inference

In the following, set the shell variable SPLIT to either valid or test (for example, SPLIT=valid).

Run focused labeling on validation/test data
```
cd FocusedLabeling

python generate_inference_data.py --split ${SPLIT}

th process_inference.lua -testSplit ${SPLIT}
th infer_crf.lua \
    -testData inference-data/label.${SPLIT}.t7 \
    -modelFile "path-to-pretrained-model"
```
- python generate_inference_data.py --split ${SPLIT} will create the file label.${SPLIT}.txt in the folder FocusedLabeling/inference-data;
- th process_inference.lua will turn the text file label.${SPLIT}.txt into label.${SPLIT}.t7 in torch format (both in the folder FocusedLabeling/inference-data);
- th infer_crf.lua ... will generate the file label.result.${SPLIT} in the folder FocusedLabeling.
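The file flow described above can be checked with a small standalone helper. This is only a sketch, not part of the repository: the function names `focused_labeling_files` and `missing_files` and the `root` parameter are hypothetical, and the paths are taken directly from the three bullets above.

```python
import os


def focused_labeling_files(split, root="."):
    """Return the files the focused-labeling steps are described as producing.

    `root` is assumed to be the top-level folder of the repository.
    """
    if split not in ("valid", "test"):
        raise ValueError("split must be 'valid' or 'test', got %r" % (split,))
    base = os.path.join(root, "FocusedLabeling")
    return {
        # written by generate_inference_data.py
        "text": os.path.join(base, "inference-data", "label.%s.txt" % split),
        # written by process_inference.lua
        "t7": os.path.join(base, "inference-data", "label.%s.t7" % split),
        # written by infer_crf.lua
        "result": os.path.join(base, "label.result.%s" % split),
    }


def missing_files(paths):
    """Return the paths that do not exist on disk, in sorted order."""
    return sorted(p for p in paths if not os.path.exists(p))
```

Calling `missing_files(focused_labeling_files("valid").values())` after each command shows which of the three files has not been written yet.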

Query candidates based on focused labeling

```
cd Inference
mkdir ${SPLIT} && cd ${SPLIT}
python ../query_candidates.py 6 \
       ../../PreprocessData/QAData.${SPLIT}.pkl \
       ../../FocusedLabeling/label.result.${SPLIT} \
       ../../KnowledgeBase/type.top-500.pkl
```

This step will generate the file QAData.label.${SPLIT}.cpickle in the folder Inference/${SPLIT}.

Generate score data based on the query results
```
cd Inference/${SPLIT}
python ../generate_score_data.py QAData.label.${SPLIT}.cpickle
```
This step will generate the following files in the same folder Inference/${SPLIT}:
- rel.single.${SPLIT}.txt (candidate relations for those with only a single candidate subject)
- rel.multi.${SPLIT}.txt (candidate relations for those with multiple candidate subjects)
- type.multi.${SPLIT}.txt (candidate entities for those with multiple candidate subjects)
- single.${SPLIT}.cpickle
- multi.${SPLIT}.cpickle
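Before moving on to relation and entity inference, it can help to confirm that all five files listed above were written. The following is a minimal sketch under that assumption; `score_data_files` and `report_missing` are hypothetical names, not functions from the repository.

```python
import os


def score_data_files(split):
    """File names that generate_score_data.py is described as producing."""
    return [
        "rel.single.%s.txt" % split,
        "rel.multi.%s.txt" % split,
        "type.multi.%s.txt" % split,
        "single.%s.cpickle" % split,
        "multi.%s.cpickle" % split,
    ]


def report_missing(split, folder):
    """Return the expected file names that are not present in `folder`."""
    return [
        name
        for name in score_data_files(split)
        if not os.path.isfile(os.path.join(folder, name))
    ]
```

For example, `report_missing("valid", "Inference/valid")` returns an empty list once the step has completed for the validation split.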

Run relation inference

```
cd RelationRNN
mkdir inference-data
th process_inference.lua -testSplit ${SPLIT}
th infer_rel_rnn.lua -testData inference-data/rel.single.${SPLIT}.t7
th infer_rel_rnn.lua -testData inference-data/rel.multi.${SPLIT}.t7
```

This step will generate the files score.rel.single.${SPLIT} and score.rel.multi.${SPLIT} in the folder RelationRNN.

Run entity inference

```
cd EntityTypeVec
mkdir inference-data
th process_inference.lua -testSplit ${SPLIT}
th infer_ent_typevec.lua -testData inference-data/ent.${SPLIT}.t7
```

This step will generate the file score.ent.multi.multi.${SPLIT} in the folder EntityTypeVec.

Run joint disambiguation

```
cd Inference/${SPLIT}
python ../joint_disambiguation.py multi.${SPLIT}.cpickle \
       ../../RelationRNN/score.rel.multi.${SPLIT} \
       ../../EntityTypeVec/score.ent.multi.multi.${SPLIT}
```
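Joint disambiguation depends on outputs from three earlier steps, so a quick check of its inputs can save a failed run. The sketch below builds the same argument list as the command above and reports which input files are missing. The helper names `joint_disambiguation_argv` and `unmet_inputs` are hypothetical, and the paths are assumed to be relative to Inference/${SPLIT} as in the command.

```python
import os


def joint_disambiguation_argv(split):
    """Argument list for the joint disambiguation call.

    All paths are relative to the folder Inference/<split>.
    """
    return [
        "python",
        "../joint_disambiguation.py",
        "multi.%s.cpickle" % split,
        "../../RelationRNN/score.rel.multi.%s" % split,
        "../../EntityTypeVec/score.ent.multi.multi.%s" % split,
    ]


def unmet_inputs(argv, cwd):
    """Return the input files (everything after the script) missing under `cwd`."""
    return [a for a in argv[2:] if not os.path.isfile(os.path.join(cwd, a))]
```

Once `unmet_inputs(argv, "Inference/valid")` is empty, the list can be passed to `subprocess.check_call(argv, cwd="Inference/valid")` to launch the step from Python instead of the shell.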
