Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun at Microsoft Research
Introduction
Faster R-CNN is an object detection framework based on deep convolutional networks, which includes a Region Proposal Network (RPN) and an Object Detection Network. Both networks are trained to share convolutional layers for fast testing.
Faster R-CNN was initially described in an arXiv tech report.
This repo contains a MATLAB re-implementation of Fast R-CNN. Details about Fast R-CNN are in: rbgirshick/fast-rcnn.
This code has been tested on Windows 7/8 64-bit, Windows Server 2012 R2, and Linux, and on MATLAB 2014a.
A Python version is available at py-faster-rcnn.
License
Faster R-CNN is released under the MIT License (refer to the LICENSE file for details).
Citing Faster R-CNN
If you find Faster R-CNN useful in your research, please consider citing:
@article{ren15fasterrcnn,
Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
Title = {{Faster R-CNN}: Towards Real-Time Object Detection with Region Proposal Networks},
Journal = {arXiv preprint arXiv:1506.01497},
Year = {2015}
}
Main results
| model | training data | test data | mAP | time/img |
|:------------------------- |:--------------------------------------:|:--------------------:|:-----:|:-----:|
| Faster RCNN, VGG-16 | VOC 2007 trainval | VOC 2007 test | 69.9% | 198ms |
| Faster RCNN, VGG-16 | VOC 2007 trainval + 2012 trainval | VOC 2007 test | 73.2% | 198ms |
| Faster RCNN, VGG-16 | VOC 2012 trainval | VOC 2012 test | 67.0% | 198ms |
| Faster RCNN, VGG-16 | VOC 2007 trainval&test + 2012 trainval | VOC 2012 test | 70.4% | 198ms |
Note: The mAP results are subject to random variations. We ran the ZF net experiment 5 times independently, and the mAPs were 59.9 (as in the paper), 60.4, 59.5, 60.1, and 59.5, with a mean of 59.88 and a standard deviation of 0.39.
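The summary statistics in the note can be re-derived from the five quoted mAP values. The following is a minimal Python sketch (not part of this repository; the variable names are illustrative) that computes both the sample and the population standard deviation, so it is clear which one the reported figure corresponds to.

```python
import statistics

# mAP values (%) of the five independent ZF-net runs quoted in the note.
zf_map_runs = [59.9, 60.4, 59.5, 60.1, 59.5]

mean_map = statistics.mean(zf_map_runs)
# Sample standard deviation (divides by n - 1).
sample_std = statistics.stdev(zf_map_runs)
# Population standard deviation (divides by n), shown for comparison.
population_std = statistics.pstdev(zf_map_runs)

print(f"mean = {mean_map:.2f}")
print(f"sample std = {sample_std:.2f}")
print(f"population std = {population_std:.2f}")
```

The reported 0.39 matches the sample standard deviation; the population form gives a slightly smaller value.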
Contents
- Requirements: software
- Requirements: hardware
- Preparation for Testing
- Testing Demo
- Preparation for Training
- Training
- Resources
Requirements: software
- Caffe build for Faster R-CNN (included in this repository, see `external/caffe`)
  - If you are using Windows, you may download a compiled mex file by running `fetch_data/fetch_caffe_mex_windows_vs2013_cuda65.m`
  - If you are using Linux or you want to compile for Windows, please follow the instructions on our Caffe branch.
- MATLAB
Requirements: hardware
GPU: Titan, Titan Black, Titan X, K20, K40, K80.
- Region Proposal Network (RPN)
  - 2GB GPU memory for ZF net
  - 5GB GPU memory for VGG-16 net
- Object Detection Network (Fast R-CNN)
  - 3GB GPU memory for ZF net
  - 8GB GPU memory for VGG-16 net
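As a quick way to read the memory figures above, the following Python sketch encodes them as a lookup table and reports which stages fit on a card of a given size. It is an illustration only: `REQUIRED_GB`, `stages_that_fit`, and `fits_all_stages` are hypothetical names, not part of this repository, and the sketch assumes the listed figures are hard minimums for each stage.

```python
# GPU memory figures (GB) as listed above, keyed by stage and network.
REQUIRED_GB = {
    "rpn": {"ZF": 2, "VGG-16": 5},
    "fast_rcnn": {"ZF": 3, "VGG-16": 8},
}


def stages_that_fit(gpu_memory_gb, net):
    """Return the stages whose listed requirement is within gpu_memory_gb."""
    return [stage for stage, req in REQUIRED_GB.items() if req[net] <= gpu_memory_gb]


def fits_all_stages(gpu_memory_gb, net):
    """True if every listed stage fits for the given network."""
    return len(stages_that_fit(gpu_memory_gb, net)) == len(REQUIRED_GB)


# Example with a hypothetical 6 GB card.
for net in ("ZF", "VGG-16"):
    print(net, stages_that_fit(6, net), fits_all_stages(6, net))
```

Under these assumptions, a 6 GB card covers both stages for ZF net but only the RPN stage for VGG-16.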
Preparation for Testing:
- Run `fetch_data/fetch_caffe_mex_windows_vs2013_cuda65.m` to download a compiled Caffe mex (for Windows only).
- Run `faster_rcnn_build.m`
- Run `startup.m`
Testing Demo:
- Run `fetch_data/fetch_faster_rcnn_final_model.m` to download our trained models.
- Run `experiments/script_faster_rcnn_demo.m` to test a single demo image.
  - You will see the timing information as below. We get the following running time on K40 @ 875 MHz and Intel Xeon CPU E5-2650 v2 @ 2.60GHz for the demo images with VGG-16:

        001763.jpg (500x375): time 0.201s (resize+conv+proposal: 0.150s, nms+regionwise: 0.052s)
        004545.jpg (500x375): time 0.201s (resize+conv+proposal: 0.151s, nms+regionwise: 0.050s)
        000542.jpg (500x375): time 0.192s (resize+conv+proposal: 0.151s, nms+regionwise: 0.041s)
        000456.jpg (500x375): time 0.202s (resize+conv+proposal: 0.152s, nms+regionwise: 0.050s)
        001150.jpg (500x375): time 0.194s (resize+conv+proposal: 0.151s, nms+regionwise: 0.043s)
        mean time: 0.198s

    and with ZF net:

        001763.jpg (500x375): time 0.061s (resize+conv+proposal: 0.032s, nms+regionwise: 0.029s)
        004545.jpg (500x375): time 0.063s (resize+conv+proposal: 0.034s, nms+regionwise: 0.029s)
        000542.jpg (500x375): time 0.052s (resize+conv+proposal: 0.034s, nms+regionwise: 0.018s)
        000456.jpg (500x375): time 0.062s (resize+conv+proposal: 0.034s, nms+regionwise: 0.028s)
        001150.jpg (500x375): time 0.058s (resize+conv+proposal: 0.034s, nms+regionwise: 0.023s)
        mean time: 0.059s

  - The visual results might be different from those in the paper due to numerical variations.
  - Running time on other GPUs

    | GPU / mean time | VGG-16 | ZF |
    |:---------------:|:------:|:----:|
    | Titan Black | 174ms | 56ms |
    | Titan X | 151ms | 59ms |
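To compare timings from your own hardware with the numbers above, the per-image lines printed by the demo can be parsed and averaged. The following Python sketch is not part of this repository: the `summarize` helper and the regular expression are assumptions based only on the line format quoted in this README, applied here to the VGG-16 lines (the ones with a 0.198s mean).

```python
import re

# Per-image timing lines copied from the VGG-16 demo output quoted in this README.
vgg16_log = """\
001763.jpg (500x375): time 0.201s (resize+conv+proposal: 0.150s, nms+regionwise: 0.052s)
004545.jpg (500x375): time 0.201s (resize+conv+proposal: 0.151s, nms+regionwise: 0.050s)
000542.jpg (500x375): time 0.192s (resize+conv+proposal: 0.151s, nms+regionwise: 0.041s)
000456.jpg (500x375): time 0.202s (resize+conv+proposal: 0.152s, nms+regionwise: 0.050s)
001150.jpg (500x375): time 0.194s (resize+conv+proposal: 0.151s, nms+regionwise: 0.043s)
"""

# Captures total, proposal-stage, and detection-stage times (all in seconds).
LINE_RE = re.compile(
    r"time (\d+\.\d+)s "
    r"\(resize\+conv\+proposal: (\d+\.\d+)s, nms\+regionwise: (\d+\.\d+)s\)"
)


def summarize(log):
    """Return per-stage mean times (s) and frames per second for a timing log."""
    rows = [tuple(float(x) for x in m.groups()) for m in LINE_RE.finditer(log)]
    if not rows:
        raise ValueError("no timing lines found")
    n = len(rows)
    total, proposal, detection = (sum(col) / n for col in zip(*rows))
    return {
        "images": n,
        "total_s": total,
        "proposal_s": proposal,
        "detection_s": detection,
        "fps": 1.0 / total,
    }


summary = summarize(vgg16_log)
print(f"mean time: {summary['total_s']:.3f}s ({summary['fps']:.1f} images/s)")
```

For these five lines the mean total time is 0.198s, matching the reported figure, which corresponds to roughly 5 images per second.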
Preparation for Training:
- Run `fetch_data/fetch_model_ZF.m` to download an ImageNet-pre-trained ZF net.
- Run `fetch_data/fetch_model_VGG16.m` to download an ImageNet-pre-trained VGG-16 net.
- Download VOC 2007 and 2012 data to `./datasets`
Training:
- Run `experiments/script_faster_rcnn_VOC2007_ZF.m` to train a model with ZF net. It runs four steps as follows:
  1. Train RPN with conv layers tuned; compute RPN results on the train/test sets.
  2. Train Fast R-CNN with conv layers tuned using step-1 RPN proposals; evaluate detection mAP.
  3. Train RPN with conv layers fixed; compute RPN results on the train/test sets.
  4. Train Fast R-CNN with conv layers fixed using step-3 RPN proposals; evaluate detection mAP.
  - Note: the entire training time is ~12 hours on K40.
- Run `experiments/script_faster_rcnn_VOC2007_VGG16.m` to train a model with VGG net.
  - Note: the entire training time is ~2 days on K40.
- Check other scripts in `./experiments` for more settings.
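The four training steps described above form a small dependency chain: each Fast R-CNN step consumes proposals from the RPN step just before it. The Python sketch below only models that schedule as data and checks the dependencies. It does not train anything, and the `Stage` class, `SCHEDULE` list, and `check_schedule` helper are hypothetical names used for illustration, not the repository's MATLAB API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    step: int
    network: str                   # "RPN" or "Fast R-CNN"
    conv_layers: str               # "tuned" or "fixed"
    proposals_from: Optional[int]  # step whose RPN proposals are used, if any


# The four steps as listed in the training description.
SCHEDULE = [
    Stage(1, "RPN", "tuned", None),
    Stage(2, "Fast R-CNN", "tuned", 1),
    Stage(3, "RPN", "fixed", None),
    Stage(4, "Fast R-CNN", "fixed", 3),
]


def check_schedule(schedule):
    """Check that every proposal source is an earlier RPN step."""
    seen = {}
    for stage in schedule:
        if stage.proposals_from is not None:
            source = seen.get(stage.proposals_from)
            if source is None or source.network != "RPN":
                raise ValueError(f"step {stage.step} has no earlier RPN source")
        seen[stage.step] = stage
    return True


check_schedule(SCHEDULE)
for s in SCHEDULE:
    source = f", proposals from step {s.proposals_from}" if s.proposals_from else ""
    print(f"step {s.step}: train {s.network} (conv layers {s.conv_layers}){source}")
```

Writing the schedule as data makes the alternation explicit: the conv layers are tuned in the first two steps and held fixed in the last two, which is how the two networks end up sharing them.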
Resources
Note: This documentation may contain links to third party websites, which are provided for your convenience only. Such third party websites are not under Microsoft’s control. Microsoft does not endorse or make any representation, guarantee or assurance regarding any third party website, content, service or product. Third party websites may be subject to the third party’s terms, conditions, and privacy statements.
If the automatic "fetch_data" fails, you may manually download resources from: