
PyTorch code for Reasoning Visual Dialogs with Structural and Partial Observations

Primary language: Python. License: MIT.

Reasoning Visual Dialogs with Structural and Partial Observations

PyTorch implementation of the paper:

Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng*, Wenguan Wang*, Siyuan Qi*, Song-Chun Zhu (* equal contributions)
In CVPR 2019 (Oral)

Getting Started

This codebase was tested on Ubuntu 16.04 with Python 3.5 and a single NVIDIA TITAN Xp GPU. Similar configurations are recommended.


  • Clone this repo:
git clone https://github.com/zilongzheng/visdial-gnn.git
cd visdial-gnn
  • Install requirements
    • PyTorch 0.4.1
    • For other Python dependencies, run:
      pip install -r requirements.txt
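After installing, it can be useful to compare the local environment against the tested configuration listed above. The following is a minimal sketch, not part of this repository: the function name environment_notes is hypothetical, and it only reports differences rather than enforcing anything, since other versions may still work.

```python
import sys

# Versions this README reports as tested; other versions may or may not work.
TESTED_PYTHON = (3, 5)
TESTED_TORCH = "0.4.1"


def environment_notes(python_version, torch_version):
    """Return human-readable notes on differences from the tested setup.

    python_version: (major, minor) tuple, e.g. sys.version_info[:2].
    torch_version: version string such as torch.__version__, or None
                   if PyTorch is not installed.
    """
    notes = []
    if tuple(python_version) != TESTED_PYTHON:
        notes.append("Python %d.%d differs from the tested 3.5"
                     % tuple(python_version))
    if torch_version is None:
        notes.append("PyTorch is not installed")
    elif not torch_version.startswith(TESTED_TORCH):
        notes.append("PyTorch %s differs from the tested %s"
                     % (torch_version, TESTED_TORCH))
    return notes


if __name__ == "__main__":
    try:
        import torch
        found = torch.__version__
    except ImportError:
        found = None
    for note in environment_notes(sys.version_info[:2], found):
        print("note:", note)
```

An empty result means the interpreter and PyTorch versions match the tested ones; it says nothing about the GPU or the packages in requirements.txt.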

Train/Evaluate VisDial v1.0

  • We use pre-extracted image features as specified here for VisDial v1.0.

  • We use preprocessed dialog data as specified here

  • To reproduce our results, download the preprocessed data and save it to $PROJECT_DIR/data/v1.0/ by running:

bash ./scripts/download_data_v1.sh faster_rcnn
  • To train a discriminative model, run:
python train.py --dataroot ./data/v1.0/
  • To evaluate the model on the val split, run:
python evaluate.py --dataroot ./data/v1.0/ --split val --ckpt /path/to/checkpoint
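When evaluating several checkpoints, it may be convenient to assemble the evaluation command from a script. The sketch below is an illustration only: build_evaluate_command is a hypothetical helper, and it uses just the flags shown above (--dataroot, --split, --ckpt). It prints the command rather than running it.

```python
import shlex
import sys


def build_evaluate_command(dataroot, ckpt, split="val", extra_args=()):
    """Assemble the evaluate.py invocation as an argument list."""
    cmd = [sys.executable, "evaluate.py",
           "--dataroot", dataroot,
           "--split", split,
           "--ckpt", ckpt]
    # Any further flags (e.g. the v0.9 options) are appended unchanged.
    cmd.extend(extra_args)
    return cmd


if __name__ == "__main__":
    cmd = build_evaluate_command("./data/v1.0/", "/path/to/checkpoint")
    # Print instead of executing; pass `cmd` to subprocess.run(cmd, check=True)
    # from the repository root to actually launch the evaluation.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list avoids shell quoting problems when a checkpoint path contains spaces.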

Train/Evaluate VisDial v0.9

  • We use pre-extracted image features from VGG-16 and VGG-19 as specified here
  • To download the preprocessed data (e.g. vgg19) and save it to $PROJECT_DIR/data/v0.9/, run:
bash ./scripts/download_data_v09.sh vgg19
  • To train a discriminative model using vgg19 pretrained image features, run:
python train.py --dataroot ./data/v0.9/ \
                --version 0.9 \
                --img_train data_img_vgg19_pool5.h5 \
                --visdial_data visdial_data.h5 \
                --visdial_params visdial_params.json \
                --img_feat_size 512
  • To evaluate the model on the val split, run:
python evaluate.py --dataroot ./data/v0.9/ \
                   --version 0.9 \
                   --split val \
                   --ckpt /path/to/checkpoint \
                   --img_val data_img_vgg19_pool5.h5 \
                   --visdial_data visdial_data.h5 \
                   --visdial_params visdial_params.json \
                   --img_feat_size 512
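Before launching a v0.9 run, one can check that the files named in the commands above are actually in place. This is a minimal sketch under one assumption: that the three file names are resolved directly under the --dataroot directory. The helper missing_files is hypothetical and not part of this repository.

```python
import os

# File names taken from the v0.9 train/evaluate commands above.
V09_FILES = (
    "data_img_vgg19_pool5.h5",
    "visdial_data.h5",
    "visdial_params.json",
)


def missing_files(dataroot, filenames=V09_FILES):
    """Return the names in `filenames` that are not regular files in `dataroot`.

    Assumes the files sit directly under `dataroot` (not in subfolders).
    """
    return [name for name in filenames
            if not os.path.isfile(os.path.join(dataroot, name))]


if __name__ == "__main__":
    missing = missing_files("./data/v0.9/")
    if missing:
        print("Missing from ./data/v0.9/:", ", ".join(missing))
    else:
        print("All expected v0.9 files are present.")
```

If anything is reported missing, re-running the download script for the chosen feature type is the likely fix.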


If you use this code for your research, please cite our paper.

    title={Reasoning Visual Dialogs with Structural and Partial Observations},
    author={Zheng, Zilong and Wang, Wenguan and Qi, Siyuan and Zhu, Song-Chun},
    booktitle={Computer Vision and Pattern Recognition (CVPR), 2019 IEEE Conference on},


Our utility code builds on the Visual Dialog Challenge Starter Code and GPNN.