Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training

By Hongkai Zhang, Hong Chang, Bingpeng Ma, Naiyan Wang, Xilin Chen.

This project is based on maskrcnn-benchmark.

[2020.7] Dynamic R-CNN is officially included in MMDetection V2.2. Many thanks to @xvjiarui and @hellock for migrating the code.

Abstract

Although two-stage object detectors have continuously advanced the state-of-the-art performance in recent years, the training process itself is far from crystal clear. In this work, we first point out the inconsistency problem between the fixed network settings and the dynamic training procedure, which greatly affects the performance. For example, the fixed label assignment strategy and regression loss function cannot fit the distribution change of proposals and are harmful to training high quality detectors. Then, we propose Dynamic R-CNN to adjust the label assignment criteria (IoU threshold) and the shape of the regression loss function (parameters of SmoothL1 Loss) automatically based on the statistics of proposals during training. This dynamic design makes better use of the training samples and pushes the detector to fit more high quality samples. Specifically, our method improves upon the ResNet-50-FPN baseline by 1.9% AP and 5.5% AP90 on the MS COCO dataset with no extra overhead. For more details, please refer to our paper.
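
To make the idea of adjusting the IoU threshold and the SmoothL1 parameter from proposal statistics more concrete, here is a minimal pure-Python sketch. It is not the code from this repository: the helper names (`RunningStatistic`, `kth_largest`), the chosen statistics (the k-th largest IoU and the median regression error per iteration), the starting values, and the update interval are all illustrative assumptions. Please refer to the paper for the actual update rules.

```python
import statistics


def smooth_l1(x, beta):
    """Standard SmoothL1: quadratic for |x| < beta, linear otherwise."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


class RunningStatistic:
    """Hypothetical helper: averages a per-iteration statistic and
    publishes the average as the current value every `interval` steps."""

    def __init__(self, initial, interval):
        self.value = initial
        self.interval = interval
        self._buffer = []

    def update(self, stat):
        self._buffer.append(stat)
        if len(self._buffer) == self.interval:
            self.value = statistics.mean(self._buffer)
            self._buffer.clear()
        return self.value


def kth_largest(values, k):
    """k-th largest element (1-indexed), clamped to the list length."""
    ordered = sorted(values, reverse=True)
    return ordered[min(k, len(ordered)) - 1]


# Assumed starting points (illustrative only).
iou_threshold = RunningStatistic(initial=0.4, interval=2)
beta = RunningStatistic(initial=1.0, interval=2)

# Two fake iterations: (proposal IoUs, regression errors).
batches = [
    ([0.9, 0.7, 0.5, 0.3], [0.2, 0.4, 0.6]),
    ([0.8, 0.6, 0.6, 0.2], [0.1, 0.2, 0.3]),
]
for ious, errors in batches:
    iou_threshold.update(kth_largest(ious, k=2))
    beta.update(statistics.median(errors))

print(iou_threshold.value, beta.value)
```

In this toy run the threshold rises as the proposals improve and `beta` shrinks as the regression errors get smaller, which is the direction of change the abstract describes.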

Models

Model	Multi-scale training	AP (minival)	AP (test-dev)	Trained model
Dynamic_RCNN_r50_fpn_1x	No	38.9	39.1	Google Drive
Dynamic_RCNN_r50_fpn_2x	No	39.9	39.9	Google Drive
Dynamic_RCNN_r101_fpn_1x	No	41.0	41.2	Google Drive
Dynamic_RCNN_r101_fpn_2x	No	41.8	42.0	Google Drive
Dynamic_RCNN_r101_fpn_2x	Yes	44.4	44.7	Google Drive
Dynamic_RCNN_r101_dcnv2_fpn_2x	Yes	46.7	46.9	Google Drive

1x and 2x mean the model is trained for 90K and 180K iterations, respectively.
For multi-scale training, the shorter side of each image is randomly chosen from (400, 600, 800, 1000, 1200), and the longer side is 1400. We also extend the training time by 1.5x under this setting.
dcnv2 denotes deformable convolutional networks v2. We follow the same setting as maskrcnn-benchmark. Note that the result of this version is slightly lower than that of MMDetection.
All results in the table are obtained using a single model with no extra testing tricks. Additionally, adopting multi-scale testing on model Dynamic_RCNN_r101_dcnv2_fpn_2x achieves 49.2% AP on COCO test-dev. Please set TEST.BBOX_AUG.ENABLED = True in config.py to enable multi-scale testing. Here we use five scales with shorter sides (800, 1000, 1200, 1400, 1600) and a longer side of 2000 pixels. Note that Dynamic R-CNN* (50.1% AP) in Table 9 is implemented using MMDetection v1.1; please refer to this link.
If you want to test the model provided by us, please refer to Testing.
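
The multi-scale training setting in the notes above can be sketched as follows. This is only an illustration: it assumes the resize keeps the aspect ratio and treats 1400 as an upper bound on the longer side, which is a common convention in detection codebases but is not confirmed by this README. The function names are hypothetical.

```python
import random

TRAIN_SHORT_SIDES = (400, 600, 800, 1000, 1200)
TRAIN_MAX_LONG_SIDE = 1400


def resized_shape(height, width, short_side, max_long_side):
    """Scale so the shorter side equals short_side, unless that would push
    the longer side past max_long_side; in that case cap the longer side."""
    scale = short_side / min(height, width)
    if max(height, width) * scale > max_long_side:
        scale = max_long_side / max(height, width)
    return round(height * scale), round(width * scale)


def random_train_shape(height, width, rng=random):
    """Pick one of the training short sides at random for this image."""
    short_side = rng.choice(TRAIN_SHORT_SIDES)
    return resized_shape(height, width, short_side, TRAIN_MAX_LONG_SIDE)


print(resized_shape(480, 640, 800, 1400))    # shorter side reaches 800
print(resized_shape(500, 2000, 800, 1400))   # longer side is capped first
print(random_train_shape(480, 640))
```

The same `resized_shape` helper would cover the multi-scale testing setting by passing each of the five test short sides with a longer side of 2000.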

Getting started

Installation

0. Requirements

pytorch (v1.0.1.post2, other versions have not been tested)
torchvision (v0.2.2.post3, other versions have not been tested)
cocoapi
matplotlib
tqdm
cython
easydict
opencv

Anaconda3 is recommended here since it integrates many useful packages. Please make sure that your conda is set up properly with the right environment. Then install pytorch and torchvision manually as follows:

pip install torch==1.0.1.post2
pip install torchvision==0.2.2.post3

Other dependencies will be installed during setup.

1. Clone this repo

git clone https://github.com/hkzhang95/DynamicRCNN.git

2. Compile kernels

Please make sure that CUDA is installed and has been added to the PATH. I have only tested CUDA 9.0 in my experiments.

cd ${DynamicRCNN_ROOT}
python setup.py build develop

3. Prepare data and output directory

cd ${DynamicRCNN_ROOT}
mkdir data
mkdir output

Prepare the data and pretrained models, then organize them as follows:

DynamicRCNN
├── dynamic_rcnn
├── models
├── output
├── data
│   ├── basemodels/R-50.pkl
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017(2014)
│   │   ├── val2017(2014)
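
Before training, the layout above can be checked with a short script like the one below. It is a convenience sketch rather than part of the repository: `REQUIRED` and `missing_paths` are hypothetical names, and the list assumes the 2017 COCO split (swap in train2014/val2014 if you use that one).

```python
from pathlib import Path

# Paths taken from the directory tree above (2017 split assumed).
REQUIRED = [
    "data/basemodels/R-50.pkl",
    "data/coco/annotations",
    "data/coco/train2017",
    "data/coco/val2017",
    "output",
]


def missing_paths(root, required=REQUIRED):
    """Return the entries of `required` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from ${DynamicRCNN_ROOT}; prints nothing when everything is present.
    for rel in missing_paths("."):
        print("missing:", rel)
```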

Training

We use torch.distributed.launch to launch multi-GPU training.

cd models/zhanghongkai/dynamic_rcnn/coco/dynamic_rcnn_r50_fpn_1x
python -m torch.distributed.launch --nproc_per_node=8 train.py
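
For readers unfamiliar with the launcher: torch.distributed.launch starts one process per GPU, passes each one a `--local_rank` argument, and sets environment variables such as `WORLD_SIZE` and `RANK`. The sketch below shows how a script could read that context. It is not the repository's train.py, and `parse_launch_context` is a hypothetical helper name.

```python
import argparse
import os


def parse_launch_context(argv=None, environ=os.environ):
    """Collect what torch.distributed.launch hands to each worker process:
    a --local_rank argument plus rendezvous environment variables."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    # Ignore any other arguments the real script might define.
    args, _ = parser.parse_known_args(argv)
    world_size = int(environ.get("WORLD_SIZE", 1))
    return {
        "local_rank": args.local_rank,
        "rank": int(environ.get("RANK", 0)),
        "world_size": world_size,
        "distributed": world_size > 1,
    }


if __name__ == "__main__":
    # Without the launcher this falls back to a single-process context.
    print(parse_launch_context())
```

With `--nproc_per_node=8` on a single machine, each of the eight processes would see a different `local_rank` from 0 to 7 and a `world_size` of 8.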

Outputs

Training and testing logs will be saved automatically in the output directory following the same path as in models.

For example, the experiment directory and log directory are formed as follows:

models/zhanghongkai/dynamic_rcnn/coco/dynamic_rcnn_r50_fpn_1x
output/zhanghongkai/dynamic_rcnn/coco/dynamic_rcnn_r50_fpn_1x
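
The mapping between the two directories above amounts to swapping the leading `models` component for `output`. A minimal sketch (the helper name `output_dir_for` is hypothetical, and the repository may compute this differently):

```python
from pathlib import PurePosixPath


def output_dir_for(experiment_dir):
    """Swap the leading 'models' path component for 'output'."""
    parts = PurePosixPath(experiment_dir).parts
    if not parts or parts[0] != "models":
        raise ValueError("expected a path starting with 'models/'")
    return str(PurePosixPath("output", *parts[1:]))


print(output_dir_for("models/zhanghongkai/dynamic_rcnn/coco/dynamic_rcnn_r50_fpn_1x"))
```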

You can link the log directory into your experiment directory by running this script from the experiment directory:

python config.py -log

Testing

Use -i to specify the iteration to test; the default is the latest model.

# for regular testing and evaluation
python -m torch.distributed.launch --nproc_per_node=8 test.py
# for specified iteration
python -m torch.distributed.launch --nproc_per_node=8 test.py -i $iteration_number
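
The "specified iteration, otherwise latest" behaviour can be illustrated with the sketch below. How test.py actually resolves checkpoints is not shown in this README, so treat this as an assumption: the `model_<iteration>.pth` naming is modeled on the checkpoint filename used elsewhere in this README, and `pick_checkpoint` is a hypothetical helper.

```python
import re

# Assumed naming: any filename ending in model_<iteration>.pth
CHECKPOINT_RE = re.compile(r"model_(\d+)\.pth$")


def pick_checkpoint(filenames, iteration=None):
    """Return the checkpoint for `iteration`, or the one with the highest
    iteration number when `iteration` is None."""
    by_iter = {}
    for name in filenames:
        match = CHECKPOINT_RE.search(name)
        if match:
            by_iter[int(match.group(1))] = name
    if not by_iter:
        raise FileNotFoundError("no checkpoint files found")
    key = max(by_iter) if iteration is None else iteration
    if key not in by_iter:
        raise FileNotFoundError("no checkpoint for iteration %d" % key)
    return by_iter[key]


names = ["model_0030000.pth", "model_0090000.pth", "notes.txt"]
print(pick_checkpoint(names))          # latest iteration
print(pick_checkpoint(names, 30000))   # explicitly requested iteration
```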

If you want to test our provided model, download it, move it to the corresponding log directory, and create a symbolic link as follows:

# example for Dynamic_RCNN_r50_fpn_1x
cd models/zhanghongkai/dynamic_rcnn/coco/dynamic_rcnn_r50_fpn_1x
python config.py -log
realpath log | xargs mkdir
mkdir -p log/checkpoints
mv path/to/the/model log/checkpoints
realpath log/checkpoints/dynamic_rcnn_r50_fpn_1x_test_model_0090000.pth last_checkpoint | xargs ln -s

Then you can follow the regular testing and evaluation process.

Third-party resources

MxNet implementation: SimpleDet

Acknowledgement

Citations

Please consider citing our paper in your publications if it helps your research:

@article{DynamicRCNN,
    author = {Hongkai Zhang and Hong Chang and Bingpeng Ma and Naiyan Wang and Xilin Chen},
    title = {Dynamic {R-CNN}: Towards High Quality Object Detection via Dynamic Training},
    journal = {arXiv preprint arXiv:2004.06002},
    year = {2020}
}
