This is the implementation of our IJCAI 2020 paper "Overcoming Language Priors with Self-supervised Learning for Visual Question Answering". The repository contains code modified from here; many thanks!
- python 3.6.8
- pytorch 1.0.1
- zarr
- tqdm
- spacy
- h5py
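Before running the preprocessing steps, it can help to confirm that the packages listed above are importable. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and it assumes the usual import names (`torch` for pytorch, and the other packages under their listed names). It does not check versions.

```python
import importlib.util

# Import names for the packages listed above ("torch" is the import name for pytorch).
REQUIRED = ["torch", "zarr", "tqdm", "spacy", "h5py"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Any names it reports can then be installed with your package manager of choice.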
cd data
bash download.sh
python preprocess_image.py --data trainval
python create_dictionary.py --dataroot vqacp2/
python preprocess_text.py --dataroot vqacp2/ --version v2
cd ..
- Train our model with multi-label VQA loss
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/ --output saved_models_cp2/ --self_loss_weight 3 --ml_loss
- Train our model with cross-entropy VQA loss
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/ --output saved_models_cp2/ --self_loss_weight 1.2 --ce_loss
- Train the model with 80% of the original training set
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/ --output saved_models_cp2/ --self_loss_weight 3 --ml_loss --ratio 0.8
- A json file of results from the test set can be produced with:
CUDA_VISIBLE_DEVICES=0 python test.py --dataroot data/vqacp2/ --img_root data/coco/ --checkpoint_path saved_models_cp2/best_model.pth --output saved_models_cp2/result/
- Compute detailed accuracy for each answer type:
python comput_score.py --input saved_models_cp2/result/XX.json --dataroot data/vqacp2/
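To illustrate what a per-answer-type breakdown involves, here is a minimal sketch. It is not the logic of `comput_score.py`: the function name `accuracy_by_type` and the record layout (a list of dicts with `answer_type` and `score` keys, with `score` in [0, 1]) are assumptions made for the example, and the real result files may be structured differently. It also assumes the overall score is averaged over all questions, so it is weighted by how many questions each type has rather than being the mean of the per-type scores.

```python
from collections import defaultdict


def accuracy_by_type(records):
    """Average per-question scores within each answer type, plus an overall score.

    `records` is a hypothetical list of dicts with keys "answer_type" and "score".
    Returned values are percentages.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec["answer_type"]] += rec["score"]
        counts[rec["answer_type"]] += 1

    result = {t: 100.0 * totals[t] / counts[t] for t in totals}
    n = sum(counts.values())
    # Overall accuracy is taken over all questions, not over the per-type averages.
    result["overall"] = 100.0 * sum(totals.values()) / n if n else 0.0
    return result


if __name__ == "__main__":
    demo = [
        {"answer_type": "yes/no", "score": 1.0},
        {"answer_type": "yes/no", "score": 0.0},
        {"answer_type": "number", "score": 1.0},
    ]
    print(accuracy_by_type(demo))
```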
If you don't want to train from scratch, you can download the pretrained base model from here (for ml_loss) and fine-tune it with our self-supervised loss as follows:
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/ --output saved_models_cp2/ --self_loss_weight 3 --ml_loss --checkpoint_path ml_pretrained.pth
A well-trained model (for ml_loss) can be found here. The test results file produced by it can be found here and its performance is as follows:
Overall score: 58.58
Yes/No: 87.47 Num: 40.3 Other: 48.45
If you find this code useful, please cite the following paper:
@inproceedings{ijcai2020-151,
title = {Overcoming Language Priors with Self-supervised Learning for Visual Question Answering},
author = {Zhu, Xi and Mao, Zhendong and Liu, Chunxiao and Zhang, Peng and Wang, Bin and Zhang, Yongdong},
booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
Artificial Intelligence, {IJCAI-20}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
editor = {Christian Bessiere},
pages = {1083--1089},
year = {2020},
month = {7},
note = {Main track},
doi = {10.24963/ijcai.2020/151},
url = {https://doi.org/10.24963/ijcai.2020/151},
}