Collaborative Learning for Weakly Supervised Object Detection

If you use this code in your research, please cite:

@inproceedings{ijcai2018-135,
  title     = {Collaborative Learning for Weakly Supervised Object Detection},
  author    = {Jiajie Wang and Jiangchao Yao and Ya Zhang and Rui Zhang},
  booktitle = {Proceedings of the Twenty-Seventh International Joint Conference on
               Artificial Intelligence, {IJCAI-18}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},             
  pages     = {971--977},
  year      = {2018},
  month     = {7},
  doi       = {10.24963/ijcai.2018/135},
  url       = {https://doi.org/10.24963/ijcai.2018/135},
}

Prerequisites

  • A basic PyTorch installation; this code targets version 0.2. If you are using the older 0.1.12 release, you can check out the 0.1.12 branch.
  • Python packages you might not have: cffi, opencv-python, easydict (similar to py-faster-rcnn). For easydict, make sure you have the right version; Xinlei uses 1.6 (a pip one-liner sketch follows this list).
  • tensorboard-pytorch to visualize the training and validation curves. Please build from source to use the latest tensorflow-tensorboard.
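
For convenience, the pip-installable packages can be set up in one line. This is only a sketch: the easydict pin follows the note above, and tensorboardX is the name tensorboard-pytorch was later published under (build tensorboard-pytorch from source instead if you need the latest tensorflow-tensorboard):

pip install cffi opencv-python easydict==1.6 tensorboardX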

Installation

  1. Clone the repository
git clone https://github.com/ruotianluo/pytorch-faster-rcnn.git
  2. Choose your -arch option to match your GPU for steps 3 and 4 (see the table below; a quick capability check follows it).
GPU model                   Architecture
TitanX (Maxwell/Pascal)     sm_52
GTX 960M                    sm_50
GTX 1080 (Ti)               sm_61
Grid K520 (AWS g2.2xlarge)  sm_30
Tesla K80 (AWS p2.xlarge)   sm_37

Note: You are welcome to contribute the settings on your end if you have made the code work properly on other GPUs.
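
If you are unsure which architecture your GPU is, and your PyTorch build exposes the query, you can print the compute capability directly; a result of (6, 1) corresponds to sm_61, and so on:

python -c "import torch; print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))"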

  3. Build RoiPooling module
cd pytorch-faster-rcnn/lib/layer_utils/roi_pooling/src/cuda
echo "Compiling roi_pooling kernels by nvcc..."
nvcc -c -o roi_pooling_kernel.cu.o roi_pooling_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_52
cd ../../
python build.py
cd ../../../
  4. Build NMS
cd lib/nms/src/cuda
echo "Compiling nms kernels by nvcc..."
nvcc -c -o nms_kernel.cu.o nms_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_52
cd ../../
python build.py
cd ../../
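
After both builds succeed, a quick import check from the repository root can confirm that the compiled extensions load. The _ext module paths below follow the usual cffi layout of pytorch-faster-rcnn forks and are an assumption; adjust them if your build.py writes the extensions elsewhere:

python -c "from lib.layer_utils.roi_pooling._ext import roi_pooling"  # assumed _ext path
python -c "from lib.nms._ext import nms"  # assumed _ext path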

Setup data

Please follow the instructions of py-faster-rcnn here to set up VOC. The steps involve downloading data and optionally creating soft links in the data folder. Since Faster R-CNN does not rely on pre-computed proposals, it is safe to ignore the steps that set up proposals.
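
For VOC 2007, the py-faster-rcnn convention is a soft link named VOCdevkit2007 inside the data folder pointing at the extracted devkit. A minimal sketch, assuming the devkit was extracted to /path/to/VOCdevkit:

cd data
ln -s /path/to/VOCdevkit VOCdevkit2007
# expected layout: data/VOCdevkit2007/VOC2007/{Annotations,ImageSets,JPEGImages}
cd ..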

If you find it useful, the data/cache folder created on Xinlei's side is also shared here.

Train your own model

  1. Download pre-trained models and weights. The download link for the pretrained WSDDN model can be found here. Other pre-trained models, such as the VGG16 and ResNet V1 models, are provided by pytorch-vgg and pytorch-resnet (use the ones with caffe in the name). Download them into the data/imagenet_weights folder. For example, for the VGG16 model, you can set it up like this:

    mkdir -p data/imagenet_weights
    cd data/imagenet_weights
    python  # open a Python shell in the terminal and run the following code
    import torch
    from torch.utils.model_zoo import load_url

    # Download the Caffe-style VGG16 weights and remap the classifier
    # keys (1 -> 0, 4 -> 3) to match the layer names this code expects.
    sd = load_url("https://s3-us-west-2.amazonaws.com/jcjohns-models/vgg16-00b39a1b.pth")
    sd['classifier.0.weight'] = sd['classifier.1.weight']
    sd['classifier.0.bias'] = sd['classifier.1.bias']
    del sd['classifier.1.weight']
    del sd['classifier.1.bias']

    sd['classifier.3.weight'] = sd['classifier.4.weight']
    sd['classifier.3.bias'] = sd['classifier.4.bias']
    del sd['classifier.4.weight']
    del sd['classifier.4.bias']

    torch.save(sd, "vgg16.pth")
    exit()  # leave the Python shell, then, back in the terminal:
    cd ../..
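
    To sanity-check the conversion (optional; this one-liner is a sketch, not part of the original steps), reload the saved file and confirm the remapped keys exist:

    python -c "import torch; sd = torch.load('data/imagenet_weights/vgg16.pth'); print(sd['classifier.0.weight'].shape, sd['classifier.3.weight'].shape)"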

    For ResNet101, you can set it up like this:

    mkdir -p data/imagenet_weights
    cd data/imagenet_weights
    # download resnet101-caffe.pth from the Google Drive link in pytorch-resnet
    mv resnet101-caffe.pth res101.pth
    cd ../..
  2. Train (the script also tests and evaluates)

./experiments/scripts/train.sh [GPU_ID] [DATASET] [NET] [WSDDN_PRETRAINED]
# Examples:
./experiments/scripts/train.sh 0 pascal_voc vgg16 path_to_wsddn_pretrained_model
  3. Visualization with Tensorboard
tensorboard --logdir=tensorboard/vgg16/voc_2007_trainval/ --port=7001 &
  4. Test and evaluate
./experiments/scripts/test.sh [GPU_ID] [DATASET] [NET] [WSDDN_PRETRAINED]
# Examples:
./experiments/scripts/test.sh 0 pascal_voc vgg16 path_to_wsddn_pretrained_model

By default, trained networks are saved under:

output/[NET]/[DATASET]/default/

Test outputs are saved under:

output/[NET]/[DATASET]/default/[SNAPSHOT]/

Tensorboard information for training and validation is saved under:

tensorboard/[NET]/[DATASET]/default/
tensorboard/[NET]/[DATASET]/default_val/

Our results can be found here.