This repository is built upon visdial_conv (Agarwal et al.). We sincerely thank the authors for releasing their code, which has been instrumental in the development of this project.
conda create -n cspaf python=3.7
pip install -r requirements.txt
python -c "import nltk; nltk.download('all')"
| Dataset | File | Source |
|---|---|---|
| Visdial v1.0 | features_faster_rcnn_x101_train.h5 | visdial-challenge-starter-pytorch (Das et al.) |
| | features_faster_rcnn_x101_val.h5 | |
| | features_faster_rcnn_x101_test.h5 | |
| | visdial_1.0_word_counts_train.json | |
| | glove.npy | visdial-principles (Qi et al.) |
| | visdial_1.0_train.json | visdial official |
| | visdial_1.0_val.json | |
| | visdial_1.0_test.json | |
| | visdial_1.0_train_dense_annotations.json | |
| | visdial_1.0_val_dense_annotations.json | |
| VisdialConv | visdial_1.0_val_crowdsourced.json | subsets/visdialconv/ (Agarwal et al.) |
| | visdial_1.0_val_dense_annotations_crowdsourced.json | |
| VisPro | visdial_1.0_val_vispro.json | subsets/vispro/ (Agarwal et al.) |
| | visdial_1.0_val_dense_annotations_vispro.json | |
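Before training, it may help to verify that all of the files above are in place. Below is a minimal sketch; the `data/` directory is an assumption, so adjust `DATA_DIR` to wherever the repository's scripts expect the files:

```python
import os

# Directory layout is an assumption; point this at your actual data path.
DATA_DIR = "data"

# All files listed in the table above.
EXPECTED_FILES = [
    "features_faster_rcnn_x101_train.h5",
    "features_faster_rcnn_x101_val.h5",
    "features_faster_rcnn_x101_test.h5",
    "visdial_1.0_word_counts_train.json",
    "glove.npy",
    "visdial_1.0_train.json",
    "visdial_1.0_val.json",
    "visdial_1.0_test.json",
    "visdial_1.0_train_dense_annotations.json",
    "visdial_1.0_val_dense_annotations.json",
    "visdial_1.0_val_crowdsourced.json",
    "visdial_1.0_val_dense_annotations_crowdsourced.json",
    "visdial_1.0_val_vispro.json",
    "visdial_1.0_val_dense_annotations_vispro.json",
]

# Report anything that has not been downloaded yet.
missing = [f for f in EXPECTED_FILES
           if not os.path.exists(os.path.join(DATA_DIR, f))]
if missing:
    print("Missing files:", *missing, sep="\n  ")
else:
    print("All dataset files found.")
```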
bash -i scripts/cap_hist_early_fusion_disc_train.sh
We train the model on RTX 3090 GPUs with a batch size of 12 per GPU. With 2 GPUs, we use a learning rate of 5e-4. Training logs and checkpoints are saved in the directory exps/exp_name.
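The effective batch size follows from the per-GPU settings above. A quick sketch of the arithmetic; the linear learning-rate scaling helper is an illustrative assumption, not necessarily the rule the training scripts implement:

```python
# Per-GPU settings stated above.
batch_size_per_gpu = 12
num_gpus = 2
base_lr = 5e-4  # learning rate used with 2 GPUs

# Total samples processed per optimization step.
effective_batch_size = batch_size_per_gpu * num_gpus  # 24

def scaled_lr(new_num_gpus, ref_lr=base_lr, ref_gpus=num_gpus):
    """Hypothetical linear LR scaling if you change the GPU count
    (illustrative only; check the training scripts for the actual rule)."""
    return ref_lr * new_num_gpus / ref_gpus

print(effective_batch_size)
print(scaled_lr(4))  # 1e-3 under linear scaling
```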
bash -i scripts/cap_hist_early_fusion_disc_eval.sh
Evaluation logs will be saved in the directory exps/exp_name. To obtain results from EvalAI, submit the file exps/exp_name/ranks.json.
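Before submitting to EvalAI, you can sanity-check the structure of ranks.json. The schema assumed below (each entry carrying an `image_id`, a `round_id`, and a ranked list of the 100 answer candidates) follows the VisDial starter-code format; adjust it if your file differs:

```python
import json

def check_ranks(path):
    """Lightweight sanity check of a ranks.json file (assumed schema)."""
    with open(path) as f:
        entries = json.load(f)
    for entry in entries:
        assert "image_id" in entry and "round_id" in entry
        # VisDial presents 100 candidate answers per round.
        assert len(entry["ranks"]) == 100
    return len(entries)

# Example with a dummy entry in the assumed format:
dummy = {"image_id": 1, "round_id": 1, "ranks": list(range(1, 101))}
print(len(dummy["ranks"]))  # 100
```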
You can use the repository Faster-R-CNN-with-model-pretrained-on-Visual-Genome, which generates 2048-d region features. If you just want to quickly visualize results on VisDial v1.0, you can also use our fork at [https://github.com/chenyulu2000/Faster-R-CNN-with-model-pretrained-on-Visual-Genome]. The fork fixes several bugs and can generate h5 files for the VisDial v1.0 val set that can be used directly for visual dialog visualization.
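The extracted features are 2048-d vectors, one per detected region. As a sketch of how a variable number of detected boxes might be normalized to a fixed count before writing an h5 file, here is a hypothetical helper (the fixed count of 36 proposals is an assumption; it is not taken from the repository's code):

```python
import numpy as np

def pad_or_truncate(features, num_boxes=36):
    """Pad with zeros or truncate so every image contributes exactly
    `num_boxes` 2048-d region features (illustrative helper only)."""
    features = np.asarray(features, dtype=np.float32)
    n, dim = features.shape
    assert dim == 2048, "expected 2048-d Faster R-CNN features"
    if n >= num_boxes:
        return features[:num_boxes]
    padding = np.zeros((num_boxes - n, dim), dtype=np.float32)
    return np.concatenate([features, padding], axis=0)

# Example: an image with 20 detected boxes becomes a fixed (36, 2048) array.
fixed = pad_or_truncate(np.random.rand(20, 2048))
print(fixed.shape)  # (36, 2048)
```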
python attention_map_vis/extract_questions.py
python attention_map_vis/visualize.py