DDG

We provide the PyTorch implementation for Digging out Discrimination Information from Generated Samples for Robust Visual Question Answering (ACL 2023 Findings).


Dependencies

  • Python 3.6
  • PyTorch 1.1.0
  • dependencies in requirements.txt
  • All models are trained and evaluated on a single TITAN Xp GPU (see the environment check after this list)
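
A quick way to confirm that the environment matches these requirements; this snippet is ours, not part of the repository:

# Environment sanity check (not part of the repository).
import sys
import torch

print(f"Python: {sys.version.split()[0]}")             # expected: 3.6.x
print(f"PyTorch: {torch.__version__}")                 # expected: 1.1.0
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")     # e.g. TITAN Xp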

Getting Started

Installation

  1. Clone this repository:

     git clone https://github.com/Zhiquan-Wen/DDG
     cd DDG
    
  2. Install PyTorch and other dependencies:

     pip install -r requirements.txt
    

Download and preprocess the data

The pretrained UpDn model can be found here, and the pre-trained PPDB model can be downloaded from here. Preprocessed features are also available (Positive Images and Positive questions).

cd data 
bash download.sh
python preprocess_features.py --input_tsv_folder features.tsv --output_h5 features.h5
python feature_preprocess.py --input_h5 features.h5 --output_path trainval 
python create_dictionary.py --dataroot vqacp2/
python preprocess_text.py --dataroot vqacp2/ --version v2
python obtain_q_positive_samples.py --input vqacp2/ --output vqacp2/ 
python select_q_positive_samples.py --input vqacp2/ --output vqacp2/
python obtain_topk_object_index.py --dataroot vqacp2/ --checkpoint_path <pretrained_UpDn_dir> --output vqacp2/q_id_with_sorted_object_index.npy
cd ..
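
To spot-check the preprocessing output, the sketch below enumerates the contents of features.h5 with h5py. The dataset names inside the file depend on preprocess_features.py, so it only lists them rather than assuming a layout:

# Sketch: enumerate every dataset in the preprocessed feature file.
import h5py

def describe(name, obj):
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape, obj.dtype)

with h5py.File('features.h5', 'r') as f:
    f.visititems(describe)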

Training

  • Train our model
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --output saved_models_cp2/ --self_loss_weight 3 --self_loss_weight_q 0.7 --kl_loss_weight 1.0 --dis_loss_weight 0.05 --pos_k 10
  • Train the model with 20% of the original training set
CUDA_VISIBLE_DEVICES=0 python main.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --output saved_models_cp2/ --self_loss_weight 3 --self_loss_weight_q 0.7 --kl_loss_weight 1.0 --dis_loss_weight 0.05 --pos_k 10 --ratio 0.2 
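
For orientation, the four weight flags suggest a weighted sum of loss terms. The sketch below is a rough reading of that objective; every loss-term name is a placeholder, and the actual formulation lives in main.py:

# Hedged sketch: how the command-line weights might combine loss terms.
from argparse import Namespace

def total_loss(vqa_loss, self_loss_v, self_loss_q, kl_loss, dis_loss, args):
    # All term names here are placeholders, not the repository's identifiers.
    return (vqa_loss
            + args.self_loss_weight * self_loss_v      # --self_loss_weight 3
            + args.self_loss_weight_q * self_loss_q    # --self_loss_weight_q 0.7
            + args.kl_loss_weight * kl_loss            # --kl_loss_weight 1.0
            + args.dis_loss_weight * dis_loss)         # --dis_loss_weight 0.05

args = Namespace(self_loss_weight=3, self_loss_weight_q=0.7,
                 kl_loss_weight=1.0, dis_loss_weight=0.05)
print(total_loss(1.0, 0.2, 0.3, 0.1, 0.4, args))       # toy scalar values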

Evaluation

  • A JSON file of results on the test set can be produced with:
CUDA_VISIBLE_DEVICES=0 python test.py --dataroot data/vqacp2/ --img_root data/coco/trainval_features --checkpoint_path saved_models_cp2/best_model.pth --output saved_models_cp2/result/
  • Compute detailed accuracy for each answer type:
python comput_score.py --input saved_models_cp2/result/XX.json --dataroot data/vqacp2/
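
To inspect the results file before scoring, a minimal sketch, assuming the standard VQA results schema (a list of {"question_id", "answer"} records); comput_score.py remains the authoritative scorer:

# Sketch: peek at the predictions written by test.py. The schema is an
# assumption based on the standard VQA results format; "XX.json" is the
# placeholder name used above.
import json

with open('saved_models_cp2/result/XX.json') as f:
    results = json.load(f)

print(f"{len(results)} predictions")
print(results[0])  # inspect one record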

Pretrained model

A well-trained model, together with its raw training log, can be found here. The test results file it produces can be found here, and its performance is as follows:

Overall score: 61.22
Yes/No: 89.47  Num: 48.70  Other: 49.86
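
To peek inside the released checkpoint before evaluating it, a minimal sketch; whether best_model.pth stores a bare state_dict or a wrapper dict with extra keys is an assumption, so it only prints the top-level keys:

# Sketch: inspect the checkpoint layout without loading it onto a GPU.
import torch

ckpt = torch.load('saved_models_cp2/best_model.pth', map_location='cpu')
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])   # top-level keys (or parameter names)
else:
    print(type(ckpt))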

Quick Reproduce

  1. Preparing environments: we provide a Docker image (built from the Dockerfile) that includes the above dependencies. You can pull it from Docker Hub or the Aliyun registry:
docker pull zhiquanwen/debias_vqa:v1
docker pull registry.cn-shenzhen.aliyuncs.com/wenzhiquan/debias_vqa:v1
docker tag registry.cn-shenzhen.aliyuncs.com/wenzhiquan/debias_vqa:v1 zhiquanwen/debias_vqa:v1
  2. Start the Docker container: launch the container, mapping the dataset into it:
docker run --gpus all -it --ipc=host --network=host --shm-size 32g -v /host/path/to/data:/xxx:ro zhiquanwen/debias_vqa:v1
  3. Running: follow the Training and Evaluation steps in Getting Started.

Reference

If you find this code useful, please cite the following paper:

@inproceedings{DDG,
  title     = {Digging out Discrimination Information from Generated Samples for Robust Visual Question Answering},
  author    = {Wen, Zhiquan and
               Wang, Yaowei and
               Tan, Mingkui and
               Wu, Qingyao and
               Wu, Qi},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
  year      = {2023}
}