/refinedet.pytorch

A PyTorch inplementation of RefineDet

Primary LanguagePythonMIT LicenseMIT

RefineDet in PyTorch

This is a PyTorch implementation of Single-Shot Refinement Neural Network for Object Detection that is a work by Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei and Stan Z. Li in CVPR2018. The official and original Caffe code can be found here. This implementation mainly refers the official RefineDet in Caffe sfzhang15/RefineDet and a PyTorch implementation of SSD amdegroot/ssd.pytorch. Its sublibrary(libs/datasets) to process datasets is obtained from jwyang/faster-rcnn.pytorch with minor modification. A simple post in Chinese about this project is here.

Repuisites

  • Python 3.6.1
  • PyTorch 1.2.0
  • CUDA 9.0 or higher

Preparation

First of all, clone the project and create two folders. The "data" is used for pretrained models and datasets. The "output" is used for output models

git clone https://github.com/dd604/refinedet.pytorch.git
cd refinedet.pytorch
mkdir data
mkdir output

Compilation

Install all the python dependencies if necessary:

pip install -r requirements.txt

Compile the dependencies as following:

cd libs
sh make.sh

It will complie the COCOAPI. The default version is complied with python 3.6.4. They should be re-compiled if you use a different python version.

Data preparation

  • PASCAL_VOC 07+12: You can follow the instructions in py-faster-rcnn or jwyang/faster-rcnn.pytorch to prepare VOC datasets, i.e., putting the data or creating soft links in the folder data/.

  • COCO: You can use COCO2014 to train your model by the same setting as PASCAL_VOC07+12.

The directory trees in data/ in my projects are as following:

├── coco -> /root/dataset/coco
├── VOCdevkit2007
│   └── VOC2007 -> /root/dataset/voc/VOCdevkit/VOC2007
└── VOCdevkit2012
    └── VOC2012 -> /root/dataset/voc/VOCdevkit/VOC2012

In details, the VOC datasets are as following:

VOCdevkit2007
|__ VOC2007
    |_ JPEGImages
    |_ Annotations
    |_ ImageSets
    |_ SegmentationClass
    
VOCdevkit2012
|__ VOC2012
    |_ JPEGImages
    |_ Annotations
    |_ ImageSets
    |_ SegmentationClass

The coco dataset is as following:

coco
|__ annotations
|   |_ instances_valminusminival2014.json
|   |_ instances_minival2014.json
|   |_ instances_train2014.json
|   |_ instances_val2014.json
|   |_ ...
|__ images
    |__ train2014
    |   |_ <im-1-name>.jpg
    |   |_ ...
    |   |_ <im-N-name>.jpg
    |__ val2014
        |_ <im-1-name>.jpg
        |_ ...
        |_ <im-N-name>.jpg

Pre-trained model

You can train a RefineDet detector with VGG16 or ResNet101 as a base network. The pretrained models can be downloaded from vgg16_reducedfc.pth and resnet101.pth. By default, the directories for VGG16 weights is assumed as following:

mkdir data/pretrained_model
cd data/pretrained_model
wget https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth

For ResNet101:

mkdir data/pretrained_model
cd data/pretrained_model
wget https://download.pytorch.org/models/resnet101-5d3b4d8f.pth -O resnet101.pth

Train

To train a RefineDet model with PASCAL VOC07+12, run as:

python -u train_refinedet.py --dataset voc --input_size 320 --batch_size 32 --network vgg16 --base_model vgg16_reducedfc.pth 

Change dataset to "coco", if you want to use COCO2014.

Performance

The project trys to reproduce the performance of RefineDet in Caffe, but there are some gaps for VGG16. For resnet101, the results are comparable. If you have any suggetion to promote this reproduction, please leave a message in the issues.

1). PASCAL VOC07+12 (Train/Test: VOC07+12/VOC07)

Method Backbone Input Size mAP
SSD VGG16 300 x 300 77.2
SSD VGG16 512 x 512 79.8
RefineDet(Official) VGG16 320 x 320 80.0
RefineDet(Official) VGG16 512 x 512 81.8
RefineDet(Our) VGG16 320 x 320 78.6
RefineDet(Our) VGG16 512 x 512 79.1

The trained models producing the above performance can be downloaded from Dropbox at vgg16_refinedet320_voc and vgg16_refinedet512_voc, or from BaiduPan at vgg16_refinedet320_voc(password: d236) and vgg16_refinedet512_voc(password: iejy).

2). COCO2014 (Train/Test: trainval115k/minval5k)

Method Backbone Input Size mAP
SSD321 ResNet101 321 x 321 28.0
RefineDet(Official) ResNet101 320 x 320 32.0
RefineDet(Official) ResNet101 512 x 512 36.4
RefineDet(Our) ResNet101 320 x 320 31.7
RefineDet(Our) ResNet101 512 x 512 36.6

The trained model can be download from Dropbox at resnet101_refinedet320_coco and resnet101_refinedet512_coco or from BaiduPan at resnet101_refinedet320_coco(password: jgjt) and resnet101_refinedet512_coco(password: pk2f). Training is failed with NAN loss when input size is 512x512, and I am seeking reasons. Now the problem of NAN loss is solved.

Evaluation

To evaluate the trained model, you can run "eval_refinedet.py". For example, download the trained vgg16_refinedet320_voc model (named vgg16_refinedet320_voc_120000.pth), and put it to "output". Assign parameters as following:

python -u eval_refinedet.py --input_size 320 --dataset voc --network vgg16 --model_path "./output/vgg16_refinedet320_voc_120000.pth"

Demo

Using the above trained vgg16_refinedet320_voc model (vgg16_refinedet320_voc_120000.pth) and put it to "output", you can run demos at the folder "demo".

cd demo
python demo_simple.py

It detects objects on a local image "000004.jpg" and draw results with opencv on the image "000004_result.jpg". You can also run demo.ipynb with jupyter notebook to visualize detection results.

Miscellaneous

There are some major changes compared with the previous project version. Firstly, the problem of NAN loss is solved. Secondly, the construction of the project is modified for readability.

Authors

References