VCIN

Authors' code for "Variational Causal Inference Network for Explanatory Visual Question Answering" and "Integrating Neural-Symbolic Reasoning with Variational Causal Inference Network for Explanatory Visual Question Answering"


Integrating Neural-Symbolic Reasoning with Variational Causal Inference Network for Explanatory Visual Question Answering

Dizhan Xue, Shengsheng Qian, and Changsheng Xu.

MAIS, Institute of Automation, Chinese Academy of Sciences


Data

  1. Download the GQA Dataset.
  2. Download the GQA-OOD Dataset.
  3. Download the bottom-up features and unzip them.
  4. Extract features from the raw tsv files (important: this step must be run on Linux):
python ./preprocessing/extract_tsv.py --input $TSV_FILE --output $FEATURE_DIR
  5. We provide the annotations of the GQA-REX Dataset in model/processed_data/converted_explanation_train_balanced.json and model/processed_data/converted_explanation_val_balanced.json.
  6. (Optional) You can construct the GQA-REX Dataset yourself by following the instructions of its authors.
  7. Download our generated programs for the GQA dataset from Google Drive.
  8. (Optional) You can generate the programs yourself by following this project.
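The $TSV_FILE and $FEATURE_DIR used above, like the other $VARIABLES in this README, are placeholders for paths on your machine. A minimal sketch of one possible setup (the locations and the tsv file name below are assumptions for illustration, not requirements of the code):

export TSV_FILE=/data/gqa/vg_gqa_obj36.tsv   # raw bottom-up feature file (example name)
export FEATURE_DIR=/data/gqa/features        # output of extract_tsv.py
export GQA_ROOT=/data/gqa                    # GQA questions and scene graphs
export OOD_ROOT=/data/gqa_ood                # GQA-OOD data
export EXP_DIR=./model/processed_data        # GQA-REX explanation annotations
export PRO_DIR=/data/gqa/programs            # generated programs
export CHECKPOINT=./checkpoints              # where trained weights are stored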

Models

We provide four models in model/model/model.py.

Two baselines:

  1. REX-VisualBert is from this project.
  2. REX-LXMERT replaces the VisualBert backbone of REX-VisualBert with LXMERT.

Our two models (both using LXMERT as the backbone):

  1. VCIN is proposed in our ICCV 2023 paper "Variational Causal Inference Network for Explanatory Visual Question Answering".
  2. Pro-VCIN is proposed in our TPAMI 2024 paper "Integrating Neural-Symbolic Reasoning with Variational Causal Inference Network for Explanatory Visual Question Answering".
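To make the backbone distinction concrete, here is a minimal, self-contained sketch of running the LXMERT encoder on bottom-up region features via Hugging Face transformers. It is for illustration only: this repo defines its own model wrappers in model/model/model.py, and the dummy tensors below stand in for the extracted bottom-up features.

import torch
from transformers import LxmertModel, LxmertTokenizer

# LXMERT jointly encodes the question tokens and the visual region features,
# which is why the commands below pass both --img_dir and --bbox_dir.
tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
backbone = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

inputs = tokenizer("What color is the car?", return_tensors="pt")
visual_feats = torch.randn(1, 36, 2048)  # 36 bottom-up region features (dummy)
visual_pos = torch.rand(1, 36, 4)        # normalized bounding boxes (dummy)

out = backbone(**inputs, visual_feats=visual_feats, visual_pos=visual_pos)
print(out.language_output.shape, out.vision_output.shape)

Swapping in VisualBert (as in REX-VisualBert) would use transformers' VisualBertModel instead, which takes a single visual_embeds tensor rather than separate features and boxes.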

Training and Test

Before training, you first need to generate the dictionaries for questions, answers, explanations, and program modules:

cd ./model
python generate_dictionary.py --question $GQA_ROOT/question --exp $EXP_DIR --pro $PRO_DIR --save ./processed_data

Training can then be launched with:

python main.py --mode train --anno_dir $GQA_ROOT/question --ood_dir $OOD_ROOT/data --sg_dir $GQA_ROOT/scene_graph --lang_dir ./processed_data --img_dir $FEATURE_DIR/features --bbox_dir $FEATURE_DIR/box --checkpoint_dir $CHECKPOINT --explainable True

To evaluate on the GQA test-dev set, or to generate a submission file for online evaluation on the test-standard set, run:

python main.py --mode $MODE --anno_dir $GQA_ROOT/question --ood_dir $OOD_ROOT/data --lang_dir ./processed_data --img_dir $FEATURE_DIR/features --weights $CHECKPOINT/model_best.pth --explainable True

and set $MODE to eval or submission accordingly.
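For instance, substituting eval for $MODE gives a test-dev evaluation run:

python main.py --mode eval --anno_dir $GQA_ROOT/question --ood_dir $OOD_ROOT/data --lang_dir ./processed_data --img_dir $FEATURE_DIR/features --weights $CHECKPOINT/model_best.pth --explainable True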

Reference

If you find our papers or code helpful, please cite them as below. Thanks!

@inproceedings{xue2023variational,
  title={Variational Causal Inference Network for Explanatory Visual Question Answering},
  author={Xue, Dizhan and Qian, Shengsheng and Xu, Changsheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={2515--2525},
  year={2023}
}

@article{xue2024integrating,
  title={Integrating Neural-Symbolic Reasoning With Variational Causal Inference Network for Explanatory Visual Question Answering},
  author={Xue, Dizhan and Qian, Shengsheng and Xu, Changsheng},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  publisher={IEEE}
}