Recurrent Multimodal Interaction for Referring Image Segmentation

This repository contains code for Recurrent Multimodal Interaction for Referring Image Segmentation, ICCV 2017.

If you use the code, please cite:

@inproceedings{liu2017recurrent,
  title={Recurrent Multimodal Interaction for Referring Image Segmentation},
  author={Liu, Chenxi and Lin, Zhe and Shen, Xiaohui and Yang, Jimei and Lu, Xin and Yuille, Alan},
  booktitle={{ICCV}},
  year={2017}
}

Setup

  • Install TensorFlow 1.2.1.
  • Download the MS COCO images, or symlink them, so that they are under data/coco/images/train2014/
  • Download the ReferItGame data, or symlink it, so that it is under data/referit/images and data/referit/mask
  • Run mkdir external. Download, git clone, or symlink TF-resnet and TF-deeplab so that both are under external. Then follow the Example Usage section of each README exactly
  • Download, git clone, or symlink refer so that it is under external. Then follow the Setup and Download sections of its README exactly. Also add the refer folder to PYTHONPATH
  • Download, git clone, or symlink the MS COCO API so that it is under external (i.e. external/coco/PythonAPI/pycocotools)
  • Install pydensecrf (for the optional Dense CRF post-processing, see the -c flag below)
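
Before building batches, it can help to confirm that the layout described above is in place. The following is a minimal sketch and not part of the repository: the `missing_paths` helper is hypothetical, and `external/refer` assumes the refer checkout keeps its default folder name. TF-resnet and TF-deeplab are left out because their folder names under external depend on how they were cloned.

```python
import os

# Directories named in the setup list, relative to the repository root.
# "external/refer" is an assumption: it presumes the default folder name.
EXPECTED_DIRS = [
    "data/coco/images/train2014",
    "data/referit/images",
    "data/referit/mask",
    "external/refer",
    "external/coco/PythonAPI/pycocotools",
]


def missing_paths(root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under root.

    os.path.isdir follows symlinks, so symlinked data counts as present.
    """
    return [p for p in expected if not os.path.isdir(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_paths()
    for path in missing:
        print("missing: " + path)
    if not missing:
        print("all expected directories found")
```

Run it from the repository root. It checks only that the directories exist, not that their contents are complete.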

Data Preparation

Run build_batches.py for each dataset and split you plan to use:

python build_batches.py -d Gref -t train
python build_batches.py -d Gref -t val
python build_batches.py -d unc -t train
python build_batches.py -d unc -t val
python build_batches.py -d unc -t testA
python build_batches.py -d unc -t testB
python build_batches.py -d unc+ -t train
python build_batches.py -d unc+ -t val
python build_batches.py -d unc+ -t testA
python build_batches.py -d unc+ -t testB
python build_batches.py -d referit -t trainval
python build_batches.py -d referit -t test
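
The twelve commands above follow one pattern, so they can also be generated from a table of datasets and splits. This is a sketch only: the `build_batches_commands` and `run_all` helpers are hypothetical names, and the script assumes it is started from the repository root with `python` resolving to the interpreter used for the rest of the setup.

```python
import subprocess

# (dataset, splits) pairs, copied from the commands listed above.
SPLITS = [
    ("Gref", ["train", "val"]),
    ("unc", ["train", "val", "testA", "testB"]),
    ("unc+", ["train", "val", "testA", "testB"]),
    ("referit", ["trainval", "test"]),
]


def build_batches_commands(splits=SPLITS):
    """Return one argument list per (dataset, split) pair, in listed order."""
    return [
        ["python", "build_batches.py", "-d", dataset, "-t", split]
        for dataset, dataset_splits in splits
        for split in dataset_splits
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.check_call(cmd)


if __name__ == "__main__":
    # Print the commands only; call run_all(...) to execute them.
    for cmd in build_batches_commands():
        print(" ".join(cmd))
```

Printing first makes it easy to drop datasets you do not need before calling `run_all(build_batches_commands())`.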

Training and Testing

Run main.py with the following flags:

  • -g: GPU to use. Default is 0.
  • -m: Mode, either train or test.
  • -w: Pre-trained weights, either resnet or deeplab.
  • -n: Model name, either LSTM or RMI.
  • -d: Dataset: Gref, unc, unc+, or referit.
  • -t: Split to train or test on: train, trainval, val, test, testA, or testB. The available splits depend on the dataset (see Data Preparation).
  • -i: In training mode, the number of training iterations. In testing mode, the iteration number of the snapshot to test.
  • -s: Training mode only. Number of iterations between snapshots.
  • -v: Testing mode only. Whether to visualize the prediction. Default is False.
  • -c: Testing mode only. Whether to also apply Dense CRF. Default is False.

For example, to train the ResNet + LSTM model on Google-Ref using GPU 2, run

python main.py -m train -w resnet -n LSTM -d Gref -t train -g 2 -i 750000 -s 50000

To test the 650000-iteration snapshot of the DeepLab + RMI model on the UNC testA set using GPU 1 (with visualization and Dense CRF), run

python main.py -m test -w deeplab -n RMI -d unc -t testA -g 1 -i 650000 -v -c
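
When launching many runs, a small wrapper can catch mismatched options before main.py starts. The sketch below is hypothetical and does not reflect how main.py parses its arguments: the `main_command` helper is an illustrative name, the allowed values come from the flag list above, and the dataset-to-split table is inferred from the Data Preparation commands. Rejecting -s in testing mode and -v/-c in training mode is a choice made for this sketch; main.py itself may simply ignore them.

```python
# Values listed for each flag in this section.
MODES = ("train", "test")
WEIGHTS = ("resnet", "deeplab")
MODELS = ("LSTM", "RMI")
# Splits per dataset, inferred from the Data Preparation commands.
SPLITS = {
    "Gref": ("train", "val"),
    "unc": ("train", "val", "testA", "testB"),
    "unc+": ("train", "val", "testA", "testB"),
    "referit": ("trainval", "test"),
}


def _check(flag, value, allowed):
    """Raise ValueError if value is not one of the allowed choices."""
    if value not in allowed:
        raise ValueError(
            "%s must be one of %s, got %r" % (flag, ", ".join(allowed), value)
        )


def main_command(mode, weights, model, dataset, split, iters,
                 gpu=0, snapshot=None, visualize=False, dcrf=False):
    """Build the argument list for main.py after basic consistency checks."""
    _check("-m", mode, MODES)
    _check("-w", weights, WEIGHTS)
    _check("-n", model, MODELS)
    _check("-d", dataset, tuple(SPLITS))
    _check("-t", split, SPLITS[dataset])
    if mode == "test" and snapshot is not None:
        raise ValueError("-s is used only in training mode")
    if mode == "train" and (visualize or dcrf):
        raise ValueError("-v and -c are used only in testing mode")

    cmd = ["python", "main.py", "-m", mode, "-w", weights, "-n", model,
           "-d", dataset, "-t", split, "-g", str(gpu), "-i", str(iters)]
    if snapshot is not None:
        cmd += ["-s", str(snapshot)]
    if visualize:
        cmd.append("-v")
    if dcrf:
        cmd.append("-c")
    return cmd


if __name__ == "__main__":
    # Rebuild the two example commands from this section.
    print(" ".join(main_command("train", "resnet", "LSTM", "Gref", "train",
                                750000, gpu=2, snapshot=50000)))
    print(" ".join(main_command("test", "deeplab", "RMI", "unc", "testA",
                                650000, gpu=1, visualize=True, dcrf=True)))
```

The returned list can be passed to `subprocess.check_call`. A request such as testA on Gref raises a ValueError here because no such batches are built in Data Preparation.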

Miscellaneous

Code and data under util/ and data/referit/ are borrowed from text_objseg and slightly modified for compatibility with TensorFlow 1.2.1.

TODO

Add TensorBoard support.