cspaf: A Python repository from chenyulu2000

CS-PAF

Credits

This repository is built upon visdial_conv (Agarwal et al.). We sincerely thank the authors for releasing their code, which was instrumental in developing this project.

Environment Configuration

conda create -n cspaf python=3.7
conda activate cspaf
pip install -r requirements.txt

python -c "import nltk; nltk.download('all')"

Data Preparation

Dataset	File	Source
VisDial v1.0	features_faster_rcnn_x101_train.h5	visdial-challenge-starter-pytorch (Das et al.)
	features_faster_rcnn_x101_val.h5
	features_faster_rcnn_x101_test.h5
	visdial_1.0_word_counts_train.json
	glove.npy	visdial-principles (Qi et al.)
	visdial_1.0_train.json	visdial official
	visdial_1.0_val.json
	visdial_1.0_test.json
	visdial_1.0_train_dense_annotations.json
	visdial_1.0_val_dense_annotations.json
VisdialConv	visdial_1.0_val_crowdsourced.json	subsets/visdialconv/ (Agarwal et al.)
VisdialConv	visdial_1.0_val_dense_annotations_crowdsourced.json	subsets/visdialconv/ (Agarwal et al.)
VisPro	visdial_1.0_val_vispro.json	subsets/vispro/ (Agarwal et al.)
VisPro	visdial_1.0_val_dense_annotations_vispro.json	subsets/vispro/ (Agarwal et al.)
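
Before training, it can help to confirm that every file listed in the table is in place. The following is a minimal sketch, assuming all files are placed in a single directory. The directory name `data` and the helper `find_missing_files` are hypothetical and not part of this repository; the actual locations are whatever the repository's configuration expects.

```python
from pathlib import Path

# File names copied from the table above.
EXPECTED_FILES = [
    "features_faster_rcnn_x101_train.h5",
    "features_faster_rcnn_x101_val.h5",
    "features_faster_rcnn_x101_test.h5",
    "visdial_1.0_word_counts_train.json",
    "glove.npy",
    "visdial_1.0_train.json",
    "visdial_1.0_val.json",
    "visdial_1.0_test.json",
    "visdial_1.0_train_dense_annotations.json",
    "visdial_1.0_val_dense_annotations.json",
    "visdial_1.0_val_crowdsourced.json",
    "visdial_1.0_val_dense_annotations_crowdsourced.json",
    "visdial_1.0_val_vispro.json",
    "visdial_1.0_val_dense_annotations_vispro.json",
]


def find_missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present under data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # "data" is an assumed location; change it to match your configuration.
    missing = find_missing_files("data")
    if missing:
        print("Missing files:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected files found.")
```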

Train or Finetune

bash -i scripts/cap_hist_early_fusion_disc_train.sh

We train the model on RTX 3090 GPUs with a batch size of 12 per GPU. With 2 GPUs, we use a learning rate of 5e-4. Training logs and checkpoints are saved in the directory exps/exp_name.
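
Assuming data-parallel training in which each GPU processes its own batch, the reported setting corresponds to 12 × 2 = 24 samples per optimization step. If you train with a different number of GPUs, one common heuristic is to scale the learning rate linearly with the effective batch size. This heuristic is an assumption here, not something this repository prescribes, and the function names below are hypothetical.

```python
# Reported setting from the text above.
BASE_BATCH_PER_GPU = 12
BASE_NUM_GPUS = 2
BASE_LR = 5e-4


def effective_batch_size(batch_per_gpu, num_gpus):
    """Samples per optimization step under data-parallel training."""
    return batch_per_gpu * num_gpus


def linearly_scaled_lr(batch_per_gpu, num_gpus):
    """Scale the reported learning rate in proportion to the effective batch size.

    This is the linear scaling heuristic, used as an assumption; the best
    learning rate for a new setup may still need tuning.
    """
    base = effective_batch_size(BASE_BATCH_PER_GPU, BASE_NUM_GPUS)
    new = effective_batch_size(batch_per_gpu, num_gpus)
    return BASE_LR * new / base


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(12, 2))
    print("suggested lr on 1 GPU:", linearly_scaled_lr(12, 1))
```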

Evaluate

bash -i scripts/cap_hist_early_fusion_disc_eval.sh

The evaluation outputs are saved in the directory exps/exp_name. To obtain results from the EvalAI server, submit the file exps/exp_name/ranks.json.
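
Before submitting, a quick sanity check of ranks.json can catch malformed output. The sketch below assumes the layout used by the Visual Dialog challenge starter code: a JSON list of entries, each with `image_id`, `round_id`, and `ranks`, where `ranks` is a permutation of 1 to 100 over the 100 candidate answers. This layout is an assumption about the file written by this repository, and `check_ranks` and `check_ranks_file` are hypothetical helpers, so verify the details against the challenge page.

```python
import json

NUM_OPTIONS = 100  # assumed number of candidate answers per question
REQUIRED_KEYS = {"image_id", "round_id", "ranks"}


def check_ranks(entries, num_options=NUM_OPTIONS):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    expected_ranks = list(range(1, num_options + 1))
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue
        if sorted(entry["ranks"]) != expected_ranks:
            problems.append(
                f"entry {i}: ranks are not a permutation of 1..{num_options}"
            )
    return problems


def check_ranks_file(path):
    """Load a ranks file and run check_ranks on its contents."""
    with open(path) as f:
        return check_ranks(json.load(f))
```

For example, `check_ranks_file("exps/exp_name/ranks.json")` returns an empty list when no problems are found under these assumptions.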

Attention Map Visualization (optional)

The repository Faster-R-CNN-with-model-pretrained-on-Visual-Genome can generate 2048-d features. If you just want to quickly visualize results on VisDial v1.0, you can instead use our fork [https://github.com/chenyulu2000/Faster-R-CNN-with-model-pretrained-on-Visual-Genome]. The fork fixes some bugs and can generate h5 files for the VisDial v1.0 val set, which can be used directly for visual dialog visualization.

python attention_map_vis/extract_questions.py

python attention_map_vis/visualize.py