R-FCN with PVANet
This repo contains an integrated framework of R-FCN and PVANet for object (clothing) detection. For details of these methods, please refer to the official GitHub repos (R-FCN, PVANet) and the corresponding papers.
Main features
- OHEM (CVPR 2016) implementation from R-FCN (for mining hard examples during training)
- Plateau learning-rate policy and bounding box voting implementations from PVANet
- BN/Scale layer merging code from PVANet (for faster inference)
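The core idea of OHEM can be sketched in a few lines: compute a loss for every candidate RoI, then backpropagate only through the ones with the highest loss. The snippet below is a minimal illustration, not the code from the R-FCN repo; the function name `select_hard_examples` and the sample losses are hypothetical, and the de-duplication of heavily overlapping RoIs that a full implementation would perform is left out.

```python
def select_hard_examples(losses, batch_size):
    """Return the indices of the `batch_size` RoIs with the highest loss.

    Hypothetical helper for illustration only; a full OHEM layer would
    also suppress near-duplicate RoIs before picking the top ones.
    """
    k = min(batch_size, len(losses))
    # Sort RoI indices by loss, largest first (ties keep original order).
    order = sorted(range(len(losses)), key=lambda i: -losses[i])
    return order[:k]


# Made-up per-RoI losses from a forward pass.
roi_losses = [0.1, 2.3, 0.05, 1.7, 0.9]
hard = select_hard_examples(roi_losses, batch_size=2)
print(hard)
```

Only the selected RoIs would contribute gradients in the backward pass, which focuses training on the examples the current model gets most wrong.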
Installation
Follow step 0 to step 3 of the installation instructions in the PVANet repo.
How to run the demo
- Download PVANet_comp or PVANet_lite detection model (scp or ?!)
- Run the demo script (TBM)
- Models and deploy OOXXXXXXXXXXXXXXXX
How to train PVANet_comp model
This model is trained and tested with the following features: Plateau, OHEM, and bbox voting. I also apply SVD to the FC layers to reduce the model size (each original FC layer is approximated by two smaller FC layers).
The simple training log is in OOOXXXXXXXXXXXXXXXXXXXXXXXXXX
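To make the SVD compression of FC layers concrete, here is a minimal NumPy sketch of how one FC layer can be approximated by two smaller ones. It is an illustration under assumptions, not the code in tools/compress_net.py: the function name `compress_fc`, the layer sizes, and the rank are all hypothetical.

```python
import numpy as np


def compress_fc(weight, bias, rank):
    """Approximate y = W x + b by two smaller FC layers via truncated SVD.

    weight has shape (out_dim, in_dim). Returns (w1, w2, bias) where
    w1 (rank, in_dim) is a bias-free first layer and
    w2 (out_dim, rank) is the second layer that keeps the original bias.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    w1 = s[:rank, None] * vt[:rank]   # first FC: in_dim -> rank
    w2 = u[:, :rank]                  # second FC: rank -> out_dim
    return w1, w2, bias


rng = np.random.default_rng(0)
# Build a weight matrix of exact rank 8 so that rank-8 truncation is lossless.
weight = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 128))
bias = rng.standard_normal(64)

w1, w2, b = compress_fc(weight, bias, rank=8)
x = rng.standard_normal(128)
y_full = weight @ x + bias
y_comp = w2 @ (w1 @ x) + b

print(np.allclose(y_full, y_comp))        # outputs agree for this low-rank W
print(weight.size, w1.size + w2.size)     # parameter count before and after
```

With real trained weights the matrix is not exactly low rank, so the chosen rank trades accuracy against model size.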
Run the script for training the PVANet_comp model:
./tools/train_net.py \
--gpu 0 \
--solver models/pascal_voc/PVANet/solver_plateau_comp_ohem.prototxt \
--weights models/pascal_voc/PVANet/pva_comp_original.model \
--imdb FlickrClothingTrain_0926 \
--iters 500000 \
--cfg experiments/cfgs/rfcn_pvanet_ohem.yml
Note:
- This model is fine-tuned from the VOC model (pva_comp_original.model). The snapshot function in train_net.py has been modified to snapshot the bbox_pred_new layer instead of the bbox_pred layer.
- My comp version was trained as a new model (no conversion). The more suitable way to generate a comp version is to convert an existing PVANet model with:
tools/compress_net.py
Training data
- The annotation format is similar to VOC (XML files).
- An example imdb is in lib/dataset/largeflickr.py.
- A cache file is generated in data/cache. If you want to add more data under the same imdb name, delete the previous cache file (.pkl) first.
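For readers unfamiliar with VOC-style annotations, the sketch below parses one such XML file into a list of (class name, bounding box) pairs using only the standard library. The sample XML, the class names `dress` and `bag`, and the function name `parse_voc_annotation` are made up for illustration; the actual fields used by this repo's imdb may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the usual VOC layout.
SAMPLE_XML = """<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>dress</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>bag</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return [(class_name, (xmin, ymin, xmax, ymax)), ...] for one image."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = tuple(
            int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.find("name").text, coords))
    return objects


annotations = parse_voc_annotation(SAMPLE_XML)
for name, coords in annotations:
    print(name, coords)
```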
How to train PVANet_lite model
This model is trained and tested with the following features: Plateau, OHEM, BN layer (RC), Conv_rpn/Convf, and bbox voting.
The simple training log is in OOOXXXXXXXXXXXXXXXXXXXXXXXXXX
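As a rough illustration of bounding box voting, the sketch below refines a box that survived NMS by replacing it with the score-weighted average of all detections that overlap it strongly. This is a simplified sketch, not the PVANet implementation: the helper names `iou` and `vote_box`, the IoU threshold of 0.5, and the sample boxes are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def vote_box(kept_box, candidates, scores, iou_thresh=0.5):
    """Score-weighted average of the candidates overlapping `kept_box`."""
    total = 0.0
    acc = [0.0, 0.0, 0.0, 0.0]
    for box, score in zip(candidates, scores):
        if iou(kept_box, box) >= iou_thresh:
            total += score
            for i in range(4):
                acc[i] += score * box[i]
    if total == 0.0:
        return list(kept_box)  # nothing overlaps: keep the box unchanged
    return [v / total for v in acc]


# Made-up detections: two overlapping boxes and one far away.
candidates = [(10, 10, 50, 50), (12, 12, 52, 52), (200, 200, 240, 240)]
scores = [0.9, 0.6, 0.8]
voted = vote_box(candidates[0], candidates, scores)
print(voted)
```

The distant third box does not pass the overlap threshold, so only the first two detections contribute to the refined coordinates.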
Run the script for training PVANet_lite model:
./tools/train_net.py \
--gpu 0 \
--solver models/pascal_voc/PVANet_lite/solver_plateau_rc.prototxt \
--weights models/pascal_voc/PVANet_lite/original.model \
--imdb FlickrClothingTrain_0926 \
--iters 500000 \
--cfg experiments/cfgs/rfcn_pvanet_ohem.yml
Training data: same as for PVANet_comp (FlickrClothingTrain_0926)
Results
Model | Inference Time | Model Size | mAP |
---|---|---|---|
PVANet_org | 48 / X ms | 313 MB | 0.835/0.816 |
PVANet_comp_480000.caffemodel | 36 / 28 ms | 79 MB | 0.855/0.822 |
PVANet_lite_390000.caffemodel | 16 / 12 ms | 36 MB | 0.835/0.797 |
R-FCN with PVANet | 35 / 28 ms | 22 MB | 0.837/0.797 |
Note:
- Inference time was measured on Titan X GPUs (Maxwell and Pascal).
- BN/Scale conversion has been applied to these models.
- mAP is evaluated on two datasets covering street and shop (VIPs) scenarios.
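The BN/Scale conversion mentioned in the notes folds the normalization and scaling that follow a convolution into the convolution's own weights, so the extra layers disappear at inference time. The NumPy sketch below shows the arithmetic under stated assumptions: the function name `merge_bn_scale`, the epsilon value, and the layer shapes are hypothetical, and the mean and variance are assumed to be the final (already averaged) statistics. It is not the PVANet conversion script.

```python
import numpy as np


def merge_bn_scale(conv_w, conv_b, mean, var, gamma, beta, eps=1e-5):
    """Fold gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the conv.

    conv_w has shape (out_ch, in_ch, kh, kw); all other arrays have
    shape (out_ch,). Returns the merged weight and bias.
    """
    factor = gamma / np.sqrt(var + eps)               # one factor per channel
    merged_w = conv_w * factor[:, None, None, None]
    merged_b = (conv_b - mean) * factor + beta
    return merged_w, merged_b


rng = np.random.default_rng(0)
out_ch, in_ch = 4, 3
conv_w = rng.standard_normal((out_ch, in_ch, 1, 1))   # 1x1 conv for a simple check
conv_b = rng.standard_normal(out_ch)
mean = rng.standard_normal(out_ch)
var = rng.uniform(0.5, 2.0, out_ch)
gamma = rng.standard_normal(out_ch)
beta = rng.standard_normal(out_ch)

x = rng.standard_normal(in_ch)                        # a single pixel's channels

# Reference: conv, then BatchNorm, then Scale.
y = conv_w[:, :, 0, 0] @ x + conv_b
y_ref = gamma * (y - mean) / np.sqrt(var + 1e-5) + beta

# Merged: a single conv with folded parameters.
merged_w, merged_b = merge_bn_scale(conv_w, conv_b, mean, var, gamma, beta)
y_merged = merged_w[:, :, 0, 0] @ x + merged_b

print(np.allclose(y_ref, y_merged))
```

Because the merged layer computes the same output with one operation instead of three, the conversion changes inference speed but not the predictions.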