DeRPN is a novel region proposal network which concentrates on improving the adaptivity of current detectors. The paper is available here.
Most current detection methods have adopted anchor boxes as regression references. However, the detection performance is sensitive to the setting of the anchor boxes. A proper setting of anchor boxes may vary significantly across different datasets, which severely limits the universality of the detectors. To improve the adaptivity of the detectors, in this paper, we present a novel dimension-decomposition region proposal network (DeRPN) that can perfectly displace the traditional Region Proposal Network (RPN). DeRPN utilizes an anchor string mechanism to independently match object widths and heights, which is conducive to treating variant object shapes. In addition, a novel scale-sensitive loss is designed to address the imbalanced loss computations of different scaled objects, which can avoid the small objects being overwhelmed by larger ones. Comprehensive experiments conducted on both general object detection datasets (Pascal VOC 2007, 2012 and MS COCO) and scene text detection datasets (ICDAR 2013 and COCO-Text) all prove that our DeRPN can significantly outperform RPN. It is worth mentioning that the proposed DeRPN can be employed directly on different models, tasks, and datasets without any modifications of hyperparameters or specialized optimization, which further demonstrates its adaptivity
· Jan. 25, 2019: The code is released.
Welcome to improve DeRPN together. For any questions, please feel free to contact Lele Xie (xie.lele@mail.scut.edu.cn) or Prof. Jin (eelwjin@scut.edu.cn).
If you find DeRPN useful to your research, please consider citing our paper as follow:
@article{xie2019DeRPN,
title = {DeRPN: Taking a further step toward more general object detection},
author = {Lele Xie, Yuliang Liu, Lianwen Jin*, Zecheng Xie}
joural = {AAAI}
year = {2019}
}
Note: The reimplemented results are slightly different from those presented in the paper for different training settings, but the conclusions are still consistent. For example, this code doesn't use multi-scale training which should boost the results for both DeRPN and RPN.
training data: COCO-Text train
test data: COCO-Text test
network | AP@0.5 | recall@0.5 | AP@0.75 | recall@0.75 | |
---|---|---|---|---|---|
RPN+Faster R-CNN | VGG16 | 32.48 | 52.54 | 7.40 | 17.59 |
DeRPN+Faster R-CNN | VGG16 | 47.39 | 70.46 | 11.05 | 25.12 |
RPN+R-FCN | ResNet-101 | 37.71 | 54.35 | 13.17 | 22.21 |
DeRPN+R-FCN | ResNet-101 | 48.62 | 71.30 | 13.37 | 27.57 |
training data: VOC 07+12 trainval
test data: VOC 07 test
Inference time is evaluated on one TITAN XP GPU.
network | inference time | AP@0.5 | AP@0.75 | AP | |
---|---|---|---|---|---|
RPN+Faster R-CNN | VGG16 | 64 ms | 75.53 | 42.08 | 42.60 |
DeRPN+Faster R-CNN | VGG16 | 65 ms | 76.17 | 44.97 | 43.84 |
RPN+R-FCN | ResNet-101 | 85 ms | 78.87 | 54.30 | 50.04 |
DeRPN+R-FCN (900) * | ResNet-101 | 84 ms | 79.21 | 54.43 | 50.28 |
( "*": On Pascal VOC dataset, we found that it is more suitable to train the DeRPN+R-FCN model with 900 proposals. For other experiments, we use the default proposal number to train the models, i.e., 2000 proposals fro Faster R-CNN, 300 proposals for R-FCN. )
training data: COCO 2017 train
test data: COCO 2017 test/val
test set | network | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|
RPN+Faster R-CNN | VGG16 | 24.2 | 45.4 | 23.7 | 7.6 | 26.6 | 37.3 |
DeRPN+Faster R-CNN | VGG16 | 25.5 | 47.2 | 25.2 | 10.3 | 27.9 | 36.7 |
RPN+R-FCN | ResNet-101 | 27.7 | 47.9 | 29.0 | 10.1 | 30.2 | 40.1 |
DeRPN+R-FCN | ResNet-101 | 28.4 | 49.0 | 29.5 | 11.1 | 31.7 | 40.5 |
val set | network | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|
RPN+Faster R-CNN | VGG16 | 24.1 | 45.0 | 23.8 | 7.6 | 27.8 | 37.8 |
DeRPN+Faster R-CNN | VGG16 | 25.5 | 47.3 | 25.0 | 9.9 | 28.8 | 37.8 |
RPN+R-FCN | ResNet-101 | 27.8 | 48.1 | 28.8 | 10.4 | 31.2 | 42.5 |
DeRPN+R-FCN | ResNet-101 | 28.4 | 48.5 | 29.5 | 11.5 | 32.9 | 42.0 |
- Requirements
- Installation
- Preparation for Training & Testing
- Usage
- Cuda 8.0 and cudnn 5.1.
- Some python packages: cython, opencv-python, easydict et. al. Simply install them if your system misses these packages.
- Configure the caffe according to your environment (Caffe installation instructions). As the code requires pycaffe, caffe should be built with python layers. In Makefile.config, make sure to uncomment this line:
WITH_PYTHON_LAYER := 1
- An NVIDIA GPU with more than 6GB is required for ResNet-101.
-
Clone the DeRPN repository
git clone https://github.com/HCIILAB/DeRPN.git
-
Build the Cython modules
cd $DeRPN_ROOT/lib make
-
Build caffe and pycaffe
cd $DeRPN_ROOT/caffe make -j8 && make pycaffe
-
Download the datasets of Pascal VOC 2007 & 2012, MS COCO 2017 and COCO-Text.
-
You need to put these datasets under the $DeRPN_ROOT/data folder (with symlinks).
-
For COCO-Text, the folder structure is as follow:
$DeRPN_ROOT/data/coco_text/images/train2014 $DeRPN_ROOT/data/coco_text/images/val2014 $DeRPN_ROOT/data/coco_text/annotations # train2014, val2014, and annotations are symlinks from /pth_to_coco2014/train2014, /pth_to_coco2014/val2014 and /pth_to_coco2014/annotations2014/, respectively.
-
For COCO, the folder structure is as follow:
$DeRPN_ROOT/data/coco/images/train2017 $DeRPN_ROOT/data/coco/images/val2017 $DeRPN_ROOT/data/coco/images/test-dev2017 $DeRPN_ROOT/data/coco/annotations # the symlinks are similar to COCO-Text
-
For Pascal VOC, the folder structure is as follow:
$DeRPN_ROOT/data/VOCdevkit2007 $DeRPN_ROOT/data/VOCdevkit2012 #VOCdevkit2007 and VOCdevkit2012 are symlinks from $VOCdevkit whcich contains VOC2007 and VOC2012.
Please download the ImageNet pretrained models (VGG16 and ResNet-101, password: k4z1), and put them under
$DeRPN_ROOT/data/imagenet_models
cd $DeRPN_ROOT
./experiments/scripts/faster_rcnn_derpn_end2end.sh [GPU_ID] [NET] [DATASET]
# e.g., ./experiments/scripts/faster_rcnn_derpn_end2end.sh 0 VGG16 coco_text
This code is free to the academic community for research purpose only.