Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

CS-PAF

Credits

This repository is built upon visdial_conv (Agarwal et al.). We sincerely thank the researchers for providing their code, which has been instrumental in the development of this project.

Environment Configuration

conda create -n cspaf python=3.7
conda activate cspaf
pip install -r requirements.txt
python -c "import nltk; nltk.download('all')"

Data Preparation

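| Dataset | File | Source |
| --- | --- | --- |
| VisDial v1.0 | features_faster_rcnn_x101_train.h5 | visdial-challenge-starter-pytorch (Das et al.) |
| VisDial v1.0 | features_faster_rcnn_x101_val.h5 | visdial-challenge-starter-pytorch (Das et al.) |
| VisDial v1.0 | features_faster_rcnn_x101_test.h5 | visdial-challenge-starter-pytorch (Das et al.) |
| VisDial v1.0 | visdial_1.0_word_counts_train.json | visdial-challenge-starter-pytorch (Das et al.) |
| VisDial v1.0 | glove.npy | visdial-principles (Qi et al.) |
| VisDial v1.0 | visdial_1.0_train.json | visdial official |
| VisDial v1.0 | visdial_1.0_val.json | visdial official |
| VisDial v1.0 | visdial_1.0_test.json | visdial official |
| VisDial v1.0 | visdial_1.0_train_dense_annotations.json | visdial official |
| VisDial v1.0 | visdial_1.0_val_dense_annotations.json | visdial official |
| VisdialConv | visdial_1.0_val_crowdsourced.json | subsets/visdialconv/ (Agarwal et al.) |
| VisdialConv | visdial_1.0_val_dense_annotations_crowdsourced.json | subsets/visdialconv/ (Agarwal et al.) |
| VisPro | visdial_1.0_val_vispro.json | subsets/vispro/ (Agarwal et al.) |
| VisPro | visdial_1.0_val_dense_annotations_vispro.json | subsets/vispro/ (Agarwal et al.) |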
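
Before training, it can help to confirm that every file listed above has been downloaded. The following is a minimal sketch, not part of this repository: the function name `find_missing_files` and the default directory `data` are hypothetical, and because the expected directory layout is not stated here, the check searches the given directory recursively.

```python
from pathlib import Path

# File names copied from the data preparation list in this README.
EXPECTED_FILES = [
    "features_faster_rcnn_x101_train.h5",
    "features_faster_rcnn_x101_val.h5",
    "features_faster_rcnn_x101_test.h5",
    "visdial_1.0_word_counts_train.json",
    "glove.npy",
    "visdial_1.0_train.json",
    "visdial_1.0_val.json",
    "visdial_1.0_test.json",
    "visdial_1.0_train_dense_annotations.json",
    "visdial_1.0_val_dense_annotations.json",
    "visdial_1.0_val_crowdsourced.json",
    "visdial_1.0_val_dense_annotations_crowdsourced.json",
    "visdial_1.0_val_vispro.json",
    "visdial_1.0_val_dense_annotations_vispro.json",
]


def find_missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names not found anywhere under data_dir."""
    root = Path(data_dir)
    missing = []
    for name in expected:
        # rglob searches data_dir and all of its subdirectories.
        if next(root.rglob(name), None) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "data" is an assumed location; point this at your own data directory.
    for name in find_missing_files("data"):
        print("missing:", name)
```

An empty result means all fourteen files were found somewhere under the chosen directory; it does not verify that they sit in the locations the training scripts expect.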

Train or Finetune

bash -i scripts/cap_hist_early_fusion_disc_train.sh

We train the model on RTX 3090 GPUs with a batch size of 12 per GPU. With 2 GPUs, we use a learning rate of 5e-4. Training logs and checkpoints are saved in the directory exps/exp_name.
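
If you train with a different number of GPUs, the learning rate may need adjusting. The sketch below applies the linear scaling heuristic (learning rate proportional to total batch size) to the reference setting stated above. This heuristic is an assumption on our part, not something this repository prescribes, and the function name `scaled_learning_rate` is hypothetical; treat the result as a starting point for tuning.

```python
# Reference setting from this README: 2 GPUs, 12 samples per GPU, lr 5e-4.
REFERENCE_GPUS = 2
REFERENCE_BATCH_PER_GPU = 12
REFERENCE_LR = 5e-4


def scaled_learning_rate(num_gpus, batch_per_gpu=REFERENCE_BATCH_PER_GPU):
    """Scale the reference learning rate linearly with the total batch size.

    Assumes data-parallel training, so the total batch size is
    num_gpus * batch_per_gpu. Linear scaling is a common heuristic,
    not a guarantee of matching results.
    """
    if num_gpus < 1 or batch_per_gpu < 1:
        raise ValueError("num_gpus and batch_per_gpu must be positive")
    total_batch = num_gpus * batch_per_gpu
    reference_batch = REFERENCE_GPUS * REFERENCE_BATCH_PER_GPU
    return REFERENCE_LR * (total_batch / reference_batch)


if __name__ == "__main__":
    for gpus in (1, 2, 4):
        print(gpus, "GPU(s):", scaled_learning_rate(gpus))
```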

Evaluate

bash -i scripts/cap_hist_early_fusion_disc_eval.sh

The evaluation logs and outputs are saved in the directory exps/exp_name. To obtain results from EvalAI, submit the file exps/exp_name/ranks.json.
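
Before submitting, a quick sanity check of ranks.json can catch malformed entries. The sketch below assumes the usual VisDial challenge submission layout: a JSON list of objects with `image_id`, `round_id`, and `ranks`, where `ranks` is a permutation of 1 to 100. That layout is an assumption here, so confirm it against the EvalAI challenge page; the function names are hypothetical.

```python
import json

NUM_CANDIDATES = 100  # assumed number of candidate answers per question
REQUIRED_KEYS = {"image_id", "round_id", "ranks"}


def validate_ranks(entries, num_candidates=NUM_CANDIDATES):
    """Return a list of problem descriptions; an empty list means no issues."""
    problems = []
    expected = list(range(1, num_candidates + 1))
    for index, entry in enumerate(entries):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(
                "entry {}: missing keys {}".format(index, sorted(missing))
            )
            continue
        # Each rank from 1 to num_candidates must appear exactly once.
        if sorted(entry["ranks"]) != expected:
            problems.append(
                "entry {}: ranks is not a permutation of 1..{}".format(
                    index, num_candidates
                )
            )
    return problems


def validate_ranks_file(path):
    """Load a ranks.json file and validate its entries."""
    with open(path) as f:
        return validate_ranks(json.load(f))
```

For example, `validate_ranks_file("exps/exp_name/ranks.json")` returns an empty list when every entry passes these checks.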

Attention Map Visualization (optional)

The repository Faster-R-CNN-with-model-pretrained-on-Visual-Genome can generate 2048-d features. If you only want to quickly visualize results on VisDial v1.0, you can use our fork instead: https://github.com/chenyulu2000/Faster-R-CNN-with-model-pretrained-on-Visual-Genome. The fork fixes several bugs and can generate h5 files for the VisDial v1.0 val set, which can be used directly for visual dialog visualization.

python attention_map_vis/extract_questions.py
python attention_map_vis/visualize.py