Iterative-Visual-Reasoning.pytorch

Reimplementation of Iterative Visual Reasoning Beyond Convolutions (CVPR 2018). I've reimplemented it in PyTorch following endernewton/iter-reason.

Note

This is a reimplementation of the system described in the paper, following the author's code: endernewton/iter-reason.
The author (endernewton) has published code only for spatial reasoning, so this repository likewise contains only the baseline model and the spatial reasoning model. Global reasoning with the knowledge graph has not been added.
I've tried to reimplement the project strictly according to the author's code. The crop_and_resize function is built on top of the roi_align function in ruotianluo/pytorch-faster-rcnn. The weight initialization for all modules (normal, xavier) is also kept the same as in the original code.
The pretrained backbone models come from the PyTorch pretrained models, but you can use Caffe pretrained models as well.
For now, the results of this reimplementation are 2%~3% lower than those reported in the paper. If you want to reproduce the results of the original paper, please use the official TensorFlow code.
Feel free to contact me if you encounter any issues.
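For readers unfamiliar with the operation, here is a slow NumPy reference sketch of what crop_and_resize computes for a single box: bilinear sampling of a normalized (y1, x1, y2, x2) box into a fixed crop_h x crop_w grid. It assumes the semantics of TensorFlow's tf.image.crop_and_resize, which the original TensorFlow code presumably relies on; it is only an illustration, not the compiled roi_align-based kernel built in ./lib, and the function name and arguments are hypothetical.

```python
import numpy as np

def crop_and_resize(image, box, crop_h, crop_w, extrapolation_value=0.0):
    """Bilinearly sample `box` from `image` (H, W, C) into (crop_h, crop_w, C).

    Hypothetical reference helper. `box` is (y1, x1, y2, x2) in normalized
    [0, 1] coordinates, following the tf.image.crop_and_resize convention.
    """
    H, W = image.shape[:2]
    y1, x1, y2, x2 = box
    out = np.full((crop_h, crop_w) + image.shape[2:], extrapolation_value,
                  dtype=np.float64)
    for i in range(crop_h):
        # Map output row i to a (possibly fractional) source row.
        if crop_h > 1:
            y = y1 * (H - 1) + i * (y2 - y1) * (H - 1) / (crop_h - 1)
        else:
            y = 0.5 * (y1 + y2) * (H - 1)
        if y < 0 or y > H - 1:
            continue  # row lies outside the image: keep extrapolation_value
        top, bottom = int(np.floor(y)), int(np.ceil(y))
        dy = y - top
        for j in range(crop_w):
            # Map output column j to a (possibly fractional) source column.
            if crop_w > 1:
                x = x1 * (W - 1) + j * (x2 - x1) * (W - 1) / (crop_w - 1)
            else:
                x = 0.5 * (x1 + x2) * (W - 1)
            if x < 0 or x > W - 1:
                continue
            left, right = int(np.floor(x)), int(np.ceil(x))
            dx = x - left
            # Bilinear interpolation between the four neighbouring pixels.
            top_val = image[top, left] * (1 - dx) + image[top, right] * dx
            bot_val = image[bottom, left] * (1 - dx) + image[bottom, right] * dx
            out[i, j] = top_val * (1 - dy) + bot_val * dy
    return out
```

For example, sampling a 3x3 crop of the whole image places the centre sample at source coordinate (1.5, 1.5) of a 4x4 image, halfway between four pixels, which is exactly where the bilinear weights matter. A GPU kernel vectorizes the same arithmetic over all boxes and channels.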

Main Dependencies

PyTorch 0.3
Python 2.7
TensorBoard (optional)
Cython
opencv-python

Data preparation

Set up the data; here we use ADE20K as an example.

mkdir -p data/ADE
cd data/ADE
wget -v http://groups.csail.mit.edu/vision/datasets/ADE20K/ADE20K_2016_07_26.zip
unzip ADE20K_2016_07_26.zip
mv ADE20K_2016_07_26/* ./
rmdir ADE20K_2016_07_26
# then get the train/val/test split
wget -v http://xinleic.xyz/data/ADE_split.tar.gz
tar -xzvf ADE_split.tar.gz
rm -vf ADE_split.tar.gz
cd ../..

Compilation (for computing roi crop_and_resize)

cd ./lib
sh make.sh
cd ..

The default build targets Python 2.7; if you are using a different Python version, please compile it yourself. If you encounter any issues, please refer to faster-rcnn.pytorch.

Scripts for training/validation and testing

Note that you need to set the argument DATA_DIR in opts.py to match your dataset. If you want to use the Caffe pretrained model, you need to set the argument caffe.
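Switching between PyTorch and Caffe pretrained backbones usually also changes the expected input normalization. The sketch below assumes the common conventions: torchvision ImageNet models take RGB images scaled to [0, 1] and normalized with per-channel mean and std, while py-faster-rcnn-style Caffe models take BGR images in the 0-255 range with only the pixel means subtracted. Whether this repository's caffe option does exactly this is an assumption (check its data-loading code), and preprocess is a hypothetical helper.

```python
import numpy as np

# Assumed torchvision ImageNet statistics (RGB order, images scaled to [0, 1]).
TORCH_MEAN_RGB = np.array([0.485, 0.456, 0.406])
TORCH_STD_RGB = np.array([0.229, 0.224, 0.225])
# Assumed py-faster-rcnn style Caffe pixel means (BGR order, 0-255 range).
CAFFE_PIXEL_MEANS_BGR = np.array([102.9801, 115.9465, 122.7717])

def preprocess(image_bgr, caffe=False):
    """Hypothetical helper. `image_bgr` is an (H, W, 3) uint8 array in BGR
    order, as returned by cv2.imread."""
    img = np.asarray(image_bgr, dtype=np.float64)
    if caffe:
        # Caffe models: stay in BGR / 0-255 and subtract the pixel means only.
        return img - CAFFE_PIXEL_MEANS_BGR
    # PyTorch models: flip BGR -> RGB, scale to [0, 1], then standardize.
    rgb = img[:, :, ::-1] / 255.0
    return (rgb - TORCH_MEAN_RGB) / TORCH_STD_RGB
```

Feeding a backbone with the wrong convention typically still trains, but it wastes the benefit of pretraining, so this is worth checking when comparing PyTorch and Caffe results.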

For the baseline model:

# Train_val:
CUDA_VISIBLE_DEVICES=0 python trainval_net.py --net res50 --cuda True --train_id 0.1 --iters 320000 --lr_decay_step 280000 --lr 0.0005

# Test:
CUDA_VISIBLE_DEVICES=0 python test.py --net res50 --cuda True --train_id 0.1 --model_name your-model-name.pth

For the spatial reasoning model:

# Train_val:
CUDA_VISIBLE_DEVICES=0 python trainval_memory_net.py --net memory_res50 --cuda True --train_id 1.1 --iters 320000 --lr_decay_step 280000 --lr 0.0005

# Test:
CUDA_VISIBLE_DEVICES=0 python test_memory.py --net memory_res50 --cuda True --train_id 1.1 --model_name your-model-name.pth
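The training commands above use a step learning-rate schedule: --lr sets the initial rate and --lr_decay_step the iteration at which it drops. A minimal sketch, assuming a single drop by a factor of 0.1 (gamma is an assumed value, a common default in Faster R-CNN codebases, not a flag shown in these commands; learning_rate is a hypothetical helper):

```python
def learning_rate(step, base_lr=0.0005, decay_steps=(280000,), gamma=0.1):
    """Hypothetical step schedule: the rate is multiplied by `gamma` once for
    every entry in `decay_steps` that training has reached."""
    num_decays = sum(1 for s in decay_steps if step >= s)
    return base_lr * gamma ** num_decays
```

With --iters 320000, this means the last 40k iterations run at the reduced rate.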

Benchmarking

Using the PyTorch pretrained model. The backbone is ResNet-50.

model	per-instance AP	per-instance AC	per-class AP	per-class AC	lr decay at (step)	total iters (steps)
baseline	0.657	0.657	0.391	0.330	280k	363k
local-iter=2	0.674	0.673	0.407	0.320	280k	363k
local-iter=3	0.675	0.676	0.411	0.314	280k	363k

Using the Caffe pretrained model.

model	per-instance AP	per-instance AC	per-class AP	per-class AC	lr decay at (step)	total iters (steps)
baseline	0.648	0.648	0.381	0.326	280k	320k
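The tables report AP and AC (taken here to be classification accuracy, as in the paper) aggregated in two ways. The sketch below illustrates the distinction under an assumption: "per-instance" pools all regions (or weights each class by its number of instances), while "per-class" is an unweighted mean over classes. The actual evaluation code in this repository and in iter-reason may differ in details such as AP interpolation, and all function names here are hypothetical.

```python
from collections import defaultdict

def accuracies(labels, preds):
    """Return (per-instance, per-class) accuracy for ground-truth `labels`."""
    correct = [l == p for l, p in zip(labels, preds)]
    per_instance = float(sum(correct)) / len(correct)  # pooled over regions
    by_class = defaultdict(list)
    for l, c in zip(labels, correct):
        by_class[l].append(c)
    # Accuracy within each ground-truth class, then an unweighted mean.
    per_class = sum(float(sum(v)) / len(v) for v in by_class.values()) / len(by_class)
    return per_instance, per_class

def average_precision(scores, positives):
    """Non-interpolated AP for one class: mean precision at each true
    positive when regions are ranked by descending score."""
    ranked = sorted(zip(scores, positives), key=lambda t: -t[0])
    hits, precisions = 0, []
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            precisions.append(float(hits) / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def aggregate_ap(ap_by_class, count_by_class):
    """Return (instance-weighted, unweighted) means of per-class APs."""
    per_class = sum(ap_by_class.values()) / len(ap_by_class)
    total = float(sum(count_by_class.values()))
    per_instance = sum(ap_by_class[c] * count_by_class[c] for c in ap_by_class) / total
    return per_instance, per_class
```

For instance, a model that predicts the frequent class for every region can score high per-instance accuracy while its per-class accuracy stays low, since every rare class counts equally in the unweighted mean.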

References

@inproceedings{chen18iterative,
    author = {Xinlei Chen and Li-Jia Li and Li Fei-Fei and Abhinav Gupta},
    title = {Iterative Visual Reasoning Beyond Convolutions},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    Year = {2018}
}

coderSkyChen/Iterative-Visual-Reasoning.pytorch