Pedestrian-Detection-Faster-R-CNN

Faster R-CNN in Pytorch


Faster R-CNN (Pytorch) on Pedestrian Detection

Introduction

This code is largely based on jwyang/faster-rcnn.pytorch, modified to apply it to pedestrian detection.

The jwyang/faster-rcnn.pytorch implementation has several unique and new features compared with other public implementations:

  • It is pure Pytorch code. We convert all the numpy implementations to pytorch!

  • It supports multi-image batch training. We revise all the layers, including dataloader, rpn, roi-pooling, etc., to support multiple images in each minibatch.

  • It supports multi-GPU training. We use a multi-GPU wrapper (nn.DataParallel here) to make it flexible to use one or more GPUs, as a merit of the above two features.

  • It supports three pooling methods. We integrate three pooling methods: roi pooling, roi align and roi crop. More importantly, we modify all of them to support multi-image batch training.

  • It is memory efficient. We limit the image aspect ratio and group images with similar aspect ratios into a minibatch (see the sketch after this list). As such, we can train resnet101 and VGG16 with batchsize = 4 (4 images) on a single Titan X (12 GB). When training with 8 GPUs, the maximum batch size for each GPU is 3 (Res101), 24 in total.

  • It is faster. Based on the above modifications, training is much faster. Training speed on an NVIDIA TITAN Xp is reported in the tables in the original repository.
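
As a loose illustration of the aspect-ratio grouping idea above, here is a minimal sketch (an assumption about the approach, not the repository's actual sampler): sort images by aspect ratio, then cut the sorted order into minibatches, so each batch contains similarly-shaped images and padding waste stays small.

import numpy as np

def group_by_aspect_ratio(ratios, batch_size):
    # Sort image indices by aspect ratio, then take consecutive chunks,
    # so every minibatch holds images with similar shapes.
    order = np.argsort(ratios)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

# Example: 6 images, batch size 2 -> 3 batches of near-equal aspect ratios.
batches = group_by_aspect_ratio(np.array([1.33, 0.75, 1.50, 0.80, 1.45, 0.78]), 2)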

Environment

  • Python 3.6
  • Pytorch 1.0
  • CUDA 9.2
  • Opencv 3.4

Sorry, the Dockerfile is missing.

Preparation

Download codes

Firstly, clone the code

git clone https://github.com/dynotw/Pedestrian-Detection-Faster-R-CNN.git

Then, create a folder:

cd Pedestrian-Detection-Faster-R-CNN && mkdir data

DataSet

Due to the size of the dataset, I can't store it in my Google Drive, so I can't provide the processed dataset for you. If you meet any problem, you can contact me.

Because we only care about the pedestrian detection scenario, we only use the pedestrian dataset provided by Caltech. To simplify this application, I only do binary classification, person or background, so I only keep the 'person' label in the Caltech Pedestrian Dataset.

We need to pre-process the Caltech Pedestrian Dataset with the following steps:

  • Convert .seq files into .jpg images
  • Convert .vbb files into .xml files. Please note that I only keep the 'person' label
  • Make the train, valid and test splits by generating train.txt, valid.txt and test.txt, which indicate which .jpg and .xml files belong to each split (steps 1 and 3 are sketched after this list)
  • Put the above files into their respective directories
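
Here is a minimal sketch of steps 1 and 3 (an assumption, not the repository's actual conversion scripts). Caltech .seq files embed raw JPEG frames, so a simple way to extract them is to split the raw bytes on the JPEG start-of-image marker; the split files then just list image IDs, one per line.

import os, random

def seq_to_jpgs(seq_path, out_dir):
    # Caltech .seq files contain concatenated JPEG frames; split the raw
    # bytes on the JPEG SOI marker and write each frame back out.
    os.makedirs(out_dir, exist_ok=True)
    with open(seq_path, 'rb') as f:
        data = f.read()
    frames = data.split(b'\xff\xd8\xff')[1:]  # [0] is the .seq header
    for i, frame in enumerate(frames):
        with open(os.path.join(out_dir, '%05d.jpg' % i), 'wb') as out:
            out.write(b'\xff\xd8\xff' + frame)

def make_splits(image_dir, out_dir, ratios=(0.8, 0.1, 0.1)):
    # Shuffle the image IDs and write train.txt / valid.txt / test.txt.
    ids = sorted(os.path.splitext(f)[0] for f in os.listdir(image_dir)
                 if f.endswith('.jpg'))
    random.shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_valid = int(len(ids) * ratios[1])
    splits = {'train.txt': ids[:n_train],
              'valid.txt': ids[n_train:n_train + n_valid],
              'test.txt': ids[n_train + n_valid:]}
    for name, subset in splits.items():
        with open(os.path.join(out_dir, name), 'w') as f:
            f.write('\n'.join(subset))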

File Structure

|VOCdevkit                        <- Contains the whole dataset
|   |Annotations                  <- Contains .xml files 
|   |JPEGImages                   <- Contains .jpg files 
|   |ImageSets
|       |Main                     <- Contains .txt files 
  • Please make sure the .jpg, .xml and .txt files match each other (a quick check is sketched below).
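
A quick way to check this consistency (a hypothetical helper, not part of the repository): verify that every ID listed in a split file has both a .jpg and a .xml file.

import os

def check_split(voc_root, split_file):
    # Every ID in the split file should have a matching image and annotation.
    with open(os.path.join(voc_root, 'ImageSets', 'Main', split_file)) as f:
        ids = [line.strip() for line in f if line.strip()]
    for img_id in ids:
        assert os.path.exists(os.path.join(voc_root, 'JPEGImages', img_id + '.jpg')), img_id
        assert os.path.exists(os.path.join(voc_root, 'Annotations', img_id + '.xml')), img_id
    print('%s: all %d entries match' % (split_file, len(ids)))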

Pretrained Model

I used the two pretrained models from the jwyang/faster-rcnn.pytorch experiments, VGG16 and ResNet101. You can download them via the links in that repository.

Download them and put them into data/pretrained_model/.
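
For example, from the repository root (the checkpoint file names below are illustrative; use whatever names the downloaded files have):

mkdir -p data/pretrained_model
mv vgg16_caffe.pth resnet101_caffe.pth data/pretrained_model/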

Compilation

Install all the python dependencies using pip:

pip install -r requirements.txt

Compile the CUDA dependencies using the following simple commands:

cd lib
python setup.py build develop

It will compile all the modules you need, including NMS, ROI_Pooling, ROI_Align and ROI_Crop. The default version is compiled with Python 2.7; please compile it yourself if you are using a different Python version.

As pointed out in this issue, if you encounter errors during compilation, you may have forgotten to export the CUDA paths to your environment.
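
For example, assuming CUDA 9.2 is installed under /usr/local/cuda-9.2 (adjust the path to your installation):

export CUDA_HOME=/usr/local/cuda-9.2
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH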

Train

Before training, set the right directories for saving and loading the trained models. Change the arguments "save_dir" and "load_dir" in trainval_net.py and test_net.py to fit your environment.

To train a Faster R-CNN model with vgg16 on pascal_voc, simply run (please note the dataset name in the command):

CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py \
                   --dataset pascal_voc --net vgg16 \
                   --bs $BATCH_SIZE --nw $WORKER_NUMBER \
                   --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                   --cuda

where 'bs' is the batch size, with a default of 1. Alternatively, to train with resnet101 on pascal_voc, simply run:

CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py \
                   --dataset pascal_voc --net res101 \
                   --bs $BATCH_SIZE --nw $WORKER_NUMBER \
                   --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                   --cuda

Above, BATCH_SIZE and WORKER_NUMBER can be set according to your GPU memory size. On a Titan Xp with 12 GB memory, the batch size can be up to 4.
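
For example, a plausible single-GPU run (the learning rate and decay step here are illustrative placeholders, not tuned values):

CUDA_VISIBLE_DEVICES=0 python trainval_net.py \
                   --dataset pascal_voc --net vgg16 \
                   --bs 4 --nw 4 \
                   --lr 1e-3 --lr_decay_step 5 \
                   --cuda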

If you have multiple (say 8) Titan Xp GPUs, then just use them all! Try:

python trainval_net.py --dataset pascal_voc --net vgg16 \
                       --bs 24 --nw 8 \
                       --lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
                       --cuda --mGPUs

Test

If you want to evaluate the detection performance of a pre-trained vgg16 model on the pascal_voc test set, simply run

python test_net.py --dataset pascal_voc --net vgg16 \
                   --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
                   --cuda

Specify the model session, checkepoch and checkpoint, e.g., SESSION=1, EPOCH=6, CHECKPOINT=416.
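
With those example values, the command becomes:

python test_net.py --dataset pascal_voc --net vgg16 \
                   --checksession 1 --checkepoch 6 --checkpoint 416 \
                   --cuda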

Demo

Image

If you want to run detection on your own images with a pre-trained model, download a pretrained model as described above or train your own model first, then add images to the folder $ROOT/images and run

python demo.py --net vgg16 \
               --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
               --cuda --load_dir path/to/model/directory

Then you will find the detection results in folder $ROOT/images.

**If you need to apply it to a different dataset, please change demo.py accordingly, especially the category list (see the sketch below).**
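
For the binary pedestrian setup used here, the class list shrinks to two entries. A minimal sketch, assuming the class array in demo.py is named pascal_classes as in jwyang/faster-rcnn.pytorch:

import numpy as np

# Background plus 'person' only, matching the binary labels kept
# in the pre-processed Caltech annotations (variable name assumed).
pascal_classes = np.asarray(['__background__', 'person'])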

Below are some detection results:

Webcam or Video

You can use a webcam in a real-time demo by running

python demo.py --net vgg16 \
               --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT \
               --cuda --load_dir path/to/model/directory \
               --webcam $WEBCAM_ID

To stop the demo, click the image window and then press the 'q' key. Here is my result: