PyTorch reimplementation of Iterative Visual Reasoning Beyond Convolutions (CVPR 2018), based on endernewton/iter-reason.
- This is a reimplementation of the system described in the paper, following the author's code: endernewton/iter-reason.
- The author (endernewton) has published only the code for spatial reasoning, so this repository contains only the baseline model and the spatial reasoning model. Global reasoning with the knowledge graph has not been added.
- I've tried to reimplement the project strictly following the author's code. The `crop_and_resize` function is built on top of the `roi_align` function in ruotianluo/pytorch-faster-rcnn. The weight initialization of all modules is also kept the same as in the original code (normal, Xavier).
- The pretrained backbone models come from the PyTorch pretrained models, but you can use Caffe pretrained models as well.
- For now, the results of this reimplementation are 2% to 3% lower than those reported in the paper. If you want to reproduce the results of the original paper, please use the official TensorFlow code.
- Feel free to contact me if you encounter any issues.
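Since the official code is TensorFlow-based, a natural reference for what `crop_and_resize` should compute is TensorFlow's `tf.image.crop_and_resize`. The following is a minimal NumPy sketch of those sampling semantics for a single channels-last image and a single box, assuming normalized `(y1, x1, y2, x2)` box coordinates. It only illustrates the expected behaviour; it is not this repository's `roi_align`-based implementation.

```python
import numpy as np


def crop_and_resize(image, box, crop_size, extrapolation_value=0.0):
    """Bilinearly sample a crop_size grid from image (H, W, C) inside box.

    Sketch of tf.image.crop_and_resize semantics (assumed target behaviour):
    box is (y1, x1, y2, x2) in normalized [0, 1] image coordinates.
    """
    height, width, channels = image.shape
    crop_h, crop_w = crop_size
    y1, x1, y2, x2 = box

    def sample_coords(lo, hi, n, size):
        # n evenly spaced sample positions from lo to hi, in pixel units.
        if n > 1:
            return lo * (size - 1) + np.arange(n) * (hi - lo) * (size - 1) / (n - 1)
        # A single sample is taken at the centre of the box.
        return np.array([0.5 * (lo + hi) * (size - 1)])

    ys = sample_coords(y1, y2, crop_h, height)
    xs = sample_coords(x1, x2, crop_w, width)

    # Samples that fall outside the image keep the extrapolation value.
    out = np.full((crop_h, crop_w, channels), extrapolation_value, dtype=np.float64)
    for i, y in enumerate(ys):
        if y < 0 or y > height - 1:
            continue
        top, bottom = int(np.floor(y)), int(np.ceil(y))
        y_lerp = y - top
        for j, x in enumerate(xs):
            if x < 0 or x > width - 1:
                continue
            left, right = int(np.floor(x)), int(np.ceil(x))
            x_lerp = x - left
            # Interpolate along x on the top and bottom rows, then along y.
            top_val = image[top, left] + (image[top, right] - image[top, left]) * x_lerp
            bottom_val = image[bottom, left] + (image[bottom, right] - image[bottom, left]) * x_lerp
            out[i, j] = top_val + (bottom_val - top_val) * y_lerp
    return out
```

For example, a `(1, 1)` crop of the full box `(0, 0, 1, 1)` samples the image centre bilinearly, and any sample outside the image takes `extrapolation_value`.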
Requirements:
- PyTorch 0.3
- Python 2.7
- TensorBoard (optional)
- Cython
- opencv-python
Set up the data (here we use ADE20K as an example):
mkdir -p data/ADE
cd data/ADE
wget -v http://groups.csail.mit.edu/vision/datasets/ADE20K/ADE20K_2016_07_26.zip
unzip ADE20K_2016_07_26.zip
mv ADE20K_2016_07_26/* ./
rmdir ADE20K_2016_07_26
# then get the train/val/test split
wget -v http://xinleic.xyz/data/ADE_split.tar.gz
tar -xzvf ADE_split.tar.gz
rm -vf ADE_split.tar.gz
cd ../..
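The ADE20K download is a `.zip` archive while the split is a `.tar.gz`, so the two need different extractors. If you prefer to script these steps, here is a minimal sketch using only the Python standard library, written to run on both Python 2.7 and 3. The function name `extract_archive` is hypothetical and not part of this repository.

```python
import os
import tarfile
import zipfile


def extract_archive(archive_path, dest_dir):
    """Extract a .zip or a tarball (e.g. .tar.gz) into dest_dir (hypothetical helper)."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)  # no exist_ok, to stay compatible with Python 2.7
    if zipfile.is_zipfile(archive_path):
        # ADE20K_2016_07_26.zip is a zip archive, not a gzip-compressed tarball.
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(dest_dir)
    elif tarfile.is_tarfile(archive_path):
        # ADE_split.tar.gz is a tarball; "r:*" lets tarfile detect the compression.
        with tarfile.open(archive_path, "r:*") as archive:
            archive.extractall(dest_dir)
    else:
        raise ValueError("unsupported archive: %s" % archive_path)
    return dest_dir
```

Run from inside `data/ADE`, `extract_archive('ADE20K_2016_07_26.zip', '.')` would still leave the files under `ADE20K_2016_07_26/`, so they need to be moved up one level as in the shell commands.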
Compile the modules in `lib`:
cd ./lib
sh make.sh
cd ..
The default version is compiled with Python 2.7; if you are using a different Python version, please compile it yourself. If you encounter any issues, please refer to faster-rcnn.pytorch.
Note that you need to set the argument `DATA_DIR` in `opts.py` according to your dataset. If you want to use a Caffe pretrained model, you need to set the argument `caffe`.
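One likely reason the `caffe` setting matters is that the two families of ImageNet-pretrained ResNets usually expect different inputs: Caffe models take BGR pixels in [0, 255] with per-channel means subtracted, while torchvision models take RGB pixels scaled to [0, 1] and normalized by mean and standard deviation. The sketch below illustrates that difference for an image loaded with OpenCV (which returns BGR). The helper `preprocess` and its constants (the Caffe means commonly used in Faster R-CNN codebases and the standard torchvision statistics) are assumptions for illustration and were not checked against this repository's data loader.

```python
import numpy as np

# Assumed constants: Caffe BGR means as used in Faster R-CNN codebases,
# and the standard torchvision ImageNet RGB mean/std.
CAFFE_BGR_MEANS = np.array([102.9801, 115.9465, 122.7717])
TORCH_RGB_MEANS = np.array([0.485, 0.456, 0.406])
TORCH_RGB_STDS = np.array([0.229, 0.224, 0.225])


def preprocess(image_bgr_uint8, caffe):
    """Hypothetical helper: normalize an (H, W, 3) BGR uint8 image for a backbone."""
    image = image_bgr_uint8.astype(np.float64)
    if caffe:
        # Caffe-style: keep BGR order and the [0, 255] range, subtract means.
        return image - CAFFE_BGR_MEANS
    # torchvision-style: flip BGR -> RGB, scale to [0, 1], standardize.
    image_rgb = image[..., ::-1] / 255.0
    return (image_rgb - TORCH_RGB_MEANS) / TORCH_RGB_STDS
```

Feeding weights the wrong preprocessing generally still trains, but usually costs accuracy, so the flag should match the checkpoint you load.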
For the baseline model:
# Train_val:
CUDA_VISIBLE_DEVICES=0 python trainval_net.py --net res50 --cuda True --train_id 0.1 --iters 320000 --lr_decay_step 280000 --lr 0.0005
# Test:
CUDA_VISIBLE_DEVICES=0 python test.py --net res50 --cuda True --train_id 0.1 --model_name your-model-name.pth
For the spatial reasoning model:
# Train_val:
CUDA_VISIBLE_DEVICES=0 python trainval_memory_net.py --net memory_res50 --cuda True --train_id 1.1 --iters 320000 --lr_decay_step 280000 --lr 0.0005
# Test:
CUDA_VISIBLE_DEVICES=0 python test_memory.py --net memory_res50 --cuda True --train_id 1.1 --model_name your-model-name.pth
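The training commands use `--lr 0.0005` with `--lr_decay_step 280000`, so with `--iters 320000` a step schedule decays the learning rate once, near the end of training. A minimal sketch of such a schedule, assuming a decay factor of 0.1 (the actual default should be checked in `opts.py`); `learning_rate` is a hypothetical helper, not a function from this repository:

```python
def learning_rate(step, base_lr=0.0005, decay_step=280000, gamma=0.1):
    """Step schedule: multiply base_lr by gamma every decay_step iterations.

    Assumption: gamma=0.1 is illustrative; check the repository's default.
    """
    return base_lr * (gamma ** (step // decay_step))
```

Under this assumption, iterations 0 to 279,999 run at 0.0005 and the remaining iterations up to 320,000 run at 0.00005.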
Results with the PyTorch pretrained model (backbone: ResNet-50):
model | per-instance AP | per-instance AC | per-class AP | per-class AC | lr decay at (step) | iters (step) |
---|---|---|---|---|---|---|
baseline | 0.657 | 0.657 | 0.391 | 0.330 | 280k | 363k |
local-iter=2 | 0.674 | 0.673 | 0.407 | 0.320 | 280k | 363k |
local-iter=3 | 0.675 | 0.676 | 0.411 | 0.314 | 280k | 363k |
Results with the Caffe pretrained model:
model | per-instance AP | per-instance AC | per-class AP | per-class AC | lr decay at (step) | iters (step) |
---|---|---|---|---|---|---|
baseline | 0.648 | 0.648 | 0.381 | 0.326 | 280k | 320k |
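The tables report both per-instance and per-class scores. Assuming the usual definitions (not checked against `test.py` or `test_memory.py`), per-instance accuracy averages over all test instances, while per-class accuracy first computes the accuracy of each class and then averages those, so rare classes weigh as much as frequent ones. That is one plausible reason the per-class columns are much lower on a long-tailed dataset like ADE20K. A minimal sketch with hypothetical helper names:

```python
from collections import defaultdict


def per_instance_accuracy(labels, preds):
    """Fraction of all instances whose predicted class is correct."""
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return correct / float(len(labels))


def per_class_accuracy(labels, preds):
    """Mean over classes of each class's accuracy (every class weighs the same)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for y, p in zip(labels, preds):
        totals[y] += 1
        hits[y] += int(y == p)
    return sum(hits[c] / float(totals[c]) for c in totals) / len(totals)
```

For instance, a classifier that always predicts the majority class on labels `[0, 0, 0, 1]` scores 0.75 per instance but only 0.5 per class.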
Citation:
@inproceedings{chen18iterative,
author = {Xinlei Chen and Li-Jia Li and Li Fei-Fei and Abhinav Gupta},
title = {Iterative Visual Reasoning Beyond Convolutions},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
Year = {2018}
}