This is a PyTorch implementation of Single-Shot Refinement Neural Network for Object Detection, a CVPR 2018 paper by Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei and Stan Z. Li. The official Caffe code can be found here. This implementation is mainly based on the official Caffe RefineDet (sfzhang15/RefineDet) and a PyTorch implementation of SSD (amdegroot/ssd.pytorch). Its dataset-processing sublibrary (libs/datasets) is taken from jwyang/faster-rcnn.pytorch with minor modifications. A short post in Chinese about this project is here.
- Python 3.6.1
- PyTorch 1.2.0
- CUDA 9.0 or higher
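The versions listed above can be compared against the local environment with a short script. This is a minimal sketch and not part of the repository; the helper name `check_environment` is hypothetical, and it only reports versions without enforcing them.

```python
import importlib.util
import platform


def check_environment():
    """Report the local Python, PyTorch, and CUDA versions (hypothetical helper)."""
    report = {
        "python": platform.python_version(),
        "torch": None,           # stays None when PyTorch is not installed
        "cuda": None,            # CUDA version PyTorch was built against, if any
        "cuda_available": None,  # whether a usable GPU is visible to PyTorch
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    return report


print(check_environment())
```

Compare the printed values with the list above before compiling the dependencies.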
First, clone the project and create two folders: "data" holds the pretrained models and datasets, and "output" holds the trained models.
git clone https://github.com/dd604/refinedet.pytorch.git
cd refinedet.pytorch
mkdir data
mkdir output
Install the Python dependencies if necessary:
pip install -r requirements.txt
Compile the dependencies as follows:
cd libs
sh make.sh
This compiles the COCO API. The bundled build was compiled with Python 3.6.4; recompile it if you use a different Python version.
- PASCAL_VOC 07+12: You can follow the instructions in py-faster-rcnn or jwyang/faster-rcnn.pytorch to prepare the VOC datasets, i.e., put the data or create soft links in the folder data/.
- COCO: You can use COCO2014 to train your model with the same setup as PASCAL_VOC 07+12.
The directory tree under data/ in my setup is as follows:
├── coco -> /root/dataset/coco
├── VOCdevkit2007
│   └── VOC2007 -> /root/dataset/voc/VOCdevkit/VOC2007
└── VOCdevkit2012
    └── VOC2012 -> /root/dataset/voc/VOCdevkit/VOC2012
In detail, the VOC datasets are laid out as follows:
VOCdevkit2007
|__ VOC2007
    |_ JPEGImages
    |_ Annotations
    |_ ImageSets
    |_ SegmentationClass
VOCdevkit2012
|__ VOC2012
    |_ JPEGImages
    |_ Annotations
    |_ ImageSets
    |_ SegmentationClass
The COCO dataset is laid out as follows:
coco
|__ annotations
|   |_ instances_valminusminival2014.json
|   |_ instances_minival2014.json
|   |_ instances_train2014.json
|   |_ instances_val2014.json
|   |_ ...
|__ images
    |__ train2014
    |   |_ <im-1-name>.jpg
    |   |_ ...
    |   |_ <im-N-name>.jpg
    |__ val2014
        |_ <im-1-name>.jpg
        |_ ...
        |_ <im-N-name>.jpg
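Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper `missing_paths` and the `EXPECTED` list are hypothetical and simply mirror the directory trees shown above. Soft links are followed, so a link to a missing target is reported as missing.

```python
from pathlib import Path

# Relative paths taken from the directory trees shown above.
EXPECTED = [
    "VOCdevkit2007/VOC2007/JPEGImages",
    "VOCdevkit2007/VOC2007/Annotations",
    "VOCdevkit2007/VOC2007/ImageSets",
    "VOCdevkit2012/VOC2012/JPEGImages",
    "VOCdevkit2012/VOC2012/Annotations",
    "VOCdevkit2012/VOC2012/ImageSets",
    "coco/annotations",
    "coco/images/train2014",
    "coco/images/val2014",
]


def missing_paths(data_root="data"):
    """Return the expected sub-paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


for rel in missing_paths("data"):
    print("missing:", rel)
```

If you only train on one dataset, the entries for the other one can be ignored.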
You can train a RefineDet detector with VGG16 or ResNet101 as the base network. The pretrained models can be downloaded from vgg16_reducedfc.pth and resnet101.pth. By default, the VGG16 weights are expected in the following directory:
mkdir -p data/pretrained_model
cd data/pretrained_model
wget https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
For ResNet101:
mkdir -p data/pretrained_model
cd data/pretrained_model
wget https://download.pytorch.org/models/resnet101-5d3b4d8f.pth -O resnet101.pth
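The same downloads can be scripted in Python where `wget` is not available. This is a sketch under assumptions: the helper names `target_path` and `fetch` are hypothetical, while the URLs and the `data/pretrained_model` directory are the ones given above. It uses only the standard library and skips files that already exist.

```python
import urllib.request
from pathlib import Path

# File name on disk -> download URL (both taken from the commands above).
PRETRAINED = {
    "vgg16_reducedfc.pth": "https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth",
    "resnet101.pth": "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth",
}


def target_path(name, root="data/pretrained_model"):
    """Return where a pretrained model should be stored."""
    if name not in PRETRAINED:
        raise KeyError(f"unknown pretrained model: {name}")
    return Path(root) / name


def fetch(name, root="data/pretrained_model"):
    """Download a pretrained model unless it is already present."""
    dest = target_path(name, root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urllib.request.urlretrieve(PRETRAINED[name], str(dest))
    return dest


# Example (performs a network download when uncommented):
# fetch("vgg16_reducedfc.pth")
```

Run it from the project root so that the relative `data/pretrained_model` path resolves correctly.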
To train a RefineDet model on PASCAL VOC07+12, run:
python -u train_refinedet.py --dataset voc --input_size 320 --batch_size 32 --network vgg16 --base_model vgg16_reducedfc.pth
To train on COCO2014, set --dataset to "coco".
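When launching several runs (for example VOC and COCO, or both input sizes), the command line can be assembled programmatically. The sketch below is an assumption-laden illustration: `build_train_command` is a hypothetical helper, it uses only the flags shown in the command above, and the restriction to input sizes 320 and 512 reflects the configurations reported in this README rather than a documented limit of the script.

```python
def build_train_command(dataset="voc", input_size=320, batch_size=32,
                        network="vgg16", base_model="vgg16_reducedfc.pth"):
    """Assemble the train_refinedet.py command line as an argument list."""
    if dataset not in ("voc", "coco"):
        raise ValueError(f"unsupported dataset: {dataset}")
    if input_size not in (320, 512):  # assumption: only sizes reported here
        raise ValueError(f"unsupported input size: {input_size}")
    return [
        "python", "-u", "train_refinedet.py",
        "--dataset", dataset,
        "--input_size", str(input_size),
        "--batch_size", str(batch_size),
        "--network", network,
        "--base_model", base_model,
    ]


cmd = build_train_command(dataset="coco")
print(" ".join(cmd))
# To actually start training: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues if the command is later handed to `subprocess.run`.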
This project tries to reproduce the performance of the Caffe RefineDet. Some gaps remain for VGG16, while the ResNet101 results are comparable. If you have any suggestions for improving this reproduction, please leave a message in the issues.
1). PASCAL VOC07+12 (Train/Test: VOC07+12/VOC07)
Method | Backbone | Input Size | mAP |
---|---|---|---|
SSD | VGG16 | 300 x 300 | 77.2 |
SSD | VGG16 | 512 x 512 | 79.8 |
RefineDet(Official) | VGG16 | 320 x 320 | 80.0 |
RefineDet(Official) | VGG16 | 512 x 512 | 81.8 |
RefineDet(Ours) | VGG16 | 320 x 320 | 78.6 |
RefineDet(Ours) | VGG16 | 512 x 512 | 79.1 |
The trained models that produce the above results can be downloaded from Dropbox at vgg16_refinedet320_voc and vgg16_refinedet512_voc, or from BaiduPan at vgg16_refinedet320_voc (password: d236) and vgg16_refinedet512_voc (password: iejy).
2). COCO2014 (Train/Test: trainval115k/minival5k)
Method | Backbone | Input Size | mAP |
---|---|---|---|
SSD321 | ResNet101 | 321 x 321 | 28.0 |
RefineDet(Official) | ResNet101 | 320 x 320 | 32.0 |
RefineDet(Official) | ResNet101 | 512 x 512 | 36.4 |
RefineDet(Ours) | ResNet101 | 320 x 320 | 31.7 |
RefineDet(Ours) | ResNet101 | 512 x 512 | 36.6 |
The trained models can be downloaded from Dropbox at resnet101_refinedet320_coco and resnet101_refinedet512_coco, or from BaiduPan at resnet101_refinedet320_coco (password: jgjt) and resnet101_refinedet512_coco (password: pk2f).
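The size of the reproduction gap can be read directly from the two tables. The short calculation below uses only the mAP values listed above; the helper `gap` is hypothetical, and a positive value means the official Caffe model scores higher.

```python
# (dataset, backbone, input size, official mAP, this project's mAP)
RESULTS = [
    ("VOC07+12", "VGG16", 320, 80.0, 78.6),
    ("VOC07+12", "VGG16", 512, 81.8, 79.1),
    ("COCO2014", "ResNet101", 320, 32.0, 31.7),
    ("COCO2014", "ResNet101", 512, 36.4, 36.6),
]


def gap(official, ours):
    """Official mAP minus this project's mAP, rounded to one decimal."""
    return round(official - ours, 1)


for dataset, backbone, size, official, ours in RESULTS:
    print(f"{dataset} {backbone} {size}: gap {gap(official, ours):+.1f}")
```

The VGG16 gaps are 1.4 and 2.7 points, while the ResNet101 results differ from the official ones by 0.3 points or less.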
Earlier, training failed with NaN loss when the input size was 512x512. This problem is now solved.
To evaluate a trained model, run "eval_refinedet.py". For example, download the trained vgg16_refinedet320_voc model (named vgg16_refinedet320_voc_120000.pth), put it in "output", and pass the parameters as follows:
python -u eval_refinedet.py --input_size 320 --dataset voc --network vgg16 --model_path "./output/vgg16_refinedet320_voc_120000.pth"
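The evaluation flags mirror the checkpoint file name, so they can be derived from it. This sketch assumes that checkpoints follow the pattern of the example above, `<network>_refinedet<input_size>_<dataset>_<iteration>.pth`; the helpers `parse_checkpoint_name` and `build_eval_command` are hypothetical, and the repository may name other checkpoints differently.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from vgg16_refinedet320_voc_120000.pth.
CHECKPOINT_RE = re.compile(
    r"^(?P<network>[a-z0-9]+)_refinedet(?P<input_size>\d+)"
    r"_(?P<dataset>[a-z]+)_(?P<iteration>\d+)\.pth$"
)


def parse_checkpoint_name(path):
    """Split a checkpoint file name into network, input size, dataset, iteration."""
    match = CHECKPOINT_RE.match(Path(path).name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {path}")
    info = match.groupdict()
    info["input_size"] = int(info["input_size"])
    info["iteration"] = int(info["iteration"])
    return info


def build_eval_command(model_path):
    """Assemble the eval_refinedet.py command line for a checkpoint."""
    info = parse_checkpoint_name(model_path)
    return [
        "python", "-u", "eval_refinedet.py",
        "--input_size", str(info["input_size"]),
        "--dataset", info["dataset"],
        "--network", info["network"],
        "--model_path", str(model_path),
    ]


print(" ".join(build_eval_command("./output/vgg16_refinedet320_voc_120000.pth")))
```

For the example checkpoint this yields the same flags as the command above.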
With the trained vgg16_refinedet320_voc model (vgg16_refinedet320_voc_120000.pth) placed in "output", you can run the demos in the "demo" folder:
cd demo
python demo_simple.py
It detects objects in the local image "000004.jpg" and draws the results with OpenCV in the output image "000004_result.jpg". You can also run demo.ipynb in Jupyter Notebook to visualize detection results.
There are some major changes compared with the previous version of the project. First, the NaN loss problem is solved. Second, the project structure has been reorganized for readability.