This repo implements SSD (Single Shot MultiBox Detector). The implementation is heavily influenced by the projects pytorch-ssd , NVIDIA Apex and YOLOF. The design goal is modularity and extensibility.
Currently, it has MobileNetV2, EfficientNet, MobileDet_gpu and VGG based SSD/SSD-Lite implementations.
- Python 3.6+
- OpenCV
- Pytorch 1.0 or Pytorch 0.4+
- Caffe2
- Pandas
- Boto3 if you want to train models on the Google OpenImages Dataset.
- Apex
- Jstat
- torchstat
- Mish-cuda
- jetson_stats
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
wget -P models https://storage.googleapis.com/models-hao/mb2-ssd-lite-mp-0_686.pth
wget -P models https://storage.googleapis.com/models-hao/voc-model-labels.txt
python run_ssd_live_demo.py mb2-ssd-lite models/mb2-ssd-lite-mp-0_686.pth models/voc-model-labels.txt
The above MobileNetV2 SSD-Lite model is not ONNX-Compatible, as it uses Relu6 which is not supported by ONNX. The code supports the ONNX-Compatible version. Once I have trained a good enough MobileNetV2 model with Relu, I will upload the corresponding Pytorch and Caffe2 models.
You may notice MobileNetV2 SSD/SSD-Lite is slower than MobileNetV1 SSD/Lite on PC. However, MobileNetV2 is faster on mobile devices.
python run_live_demo.py ef-yolof path/to/your/model.pth models/voc-model-labels.txt path/to/your/video.mp4
backbone | mAP(%) | Param(M) | MAdds(G) | FLOPS | Size(MB) |
---|---|---|---|---|---|
mobilenet2-SSDlite | 55.8 | 3.09 | 1.3 | 663.42M | 144.21 |
mobilenet2-YOLOF | 8.07 | 2.34 | 1.19G | 150.38 | |
mobilenet3-small-SSDlite | |||||
mobilenet3-small-YOLOF | |||||
mobilenet3-large-SSDlite | 54.3 | 2.16 | 1.04 | 525.5M | 86.34 |
mobilenet3-large-YOLOF | 5.98 | 1.33 | 666.67M | 42.58 | |
efficientnet-SSDlite-B0 | 51.7 | 4.84 | 1.6 | 804.8M | 143.32 |
efficientnet-YOLOF-B0 | 9.86 | 2.66 | 1.34G | 153.58 | |
mobiledet-SSDlite | 3.07 | 2.22 | 1,12G | 89.72 | |
mobiledet-YOLOF | 8.1 | 3.28 | 1.65G | 102.35 |
The code to re-produce the model:
python train_ssd.py --dataset_type voc --datasets ~/data/VOC0712/VOC2007 ~/data/VOC0712/VOC2012 --validation_dataset ~/data/VOC0712/test/VOC2007/ --net mb2-ssd-lite --base_net models/mb2-imagenet-71_8.pth --scheduler cosine --lr 0.01 --t_max 200 --validation_epochs 5 --num_epochs 300
python train_ssd.py --datasets ~/data/VOCkitti/ --validation_dataset ~/data/VOCkitti/ --net mbd-ssd-lite --base_net /model/to/your/model.pth --batch_size 32 --num_epochs 300 --scheduler "multi-step” —-milestones “120,160”
python train_ssd.py --datasets ~/data/VOC0712/VOC2007/ ~/data/VOC0712/VOC2012/ --validation_dataset ~/data/VOC0712/test/VOC2007/ --net mb1-ssd --base_net models/mobilenet_v1_with_relu_69_5.pth --batch_size 24 --num_epochs 200 --scheduler cosine --lr 0.01 --t_max 200
python train_ssd.py --datasets /home/mnt/CenterNet/data/kitti/training/ --validation_dataset /home/mnt/CenterNet/data/kitti/training/ --net mb1-ssd --base_net models/mobilenet_v1_with_relu_69_5.pth --batch_size 24 --num_epochs 200 --scheduler cosine --lr 0.01 --t_max 200
The dataset path is the parent directory of the folders: Annotations, ImageSets, JPEGImages, SegmentationClass and SegmentationObject. You can use multiple datasets to train.
python eval_ssd.py --net mb1-ssd --dataset ~/data/VOC0712/test/VOC2007/ --trained_model models/mobilenet-v1-ssd-mp-0_675.pth --label_file models/voc-model-labels.txt
'''rqp python eval_ssd.py --net sq-ssd-lite --dataset /home/sues/data/kitti/VOCdevkit/VOC2007 --trained_model models/sq-ssd-lite/sq-ssd-lite-Epoch-215-Loss-2.6122971734692975.pth --label_file models/voc-model-labels.txt '''
python convert_to_caffe2_models.py mb1-ssd models/mobilenet-v1-ssd-mp-0_675.pth models/voc-model-labels.txt
The converted models are models/mobilenet-v1-ssd.onnx, models/mobilenet-v1-ssd_init_net.pb and models/mobilenet-v1-ssd_predict_net.pb. The models in the format of pbtxt are also saved for reference.
Let's we are building a model to detect guns for security purpose.
Before you start you can try the demo.
wget -P models https://storage.googleapis.com/models-hao/gun_model_2.21.pth
wget -P models https://storage.googleapis.com/models-hao/open-images-model-labels.txt
python run_ssd_example.py mb1-ssd models/gun_model_2.21.pth models/open-images-model-labels.txt ~/Downloads/big.JPG
If you manage to get more annotated data, the accuracy could become much higher.
python open_images_downloader.py --root ~/data/open_images --class_names "Handgun,Shotgun" --num_workers 20
Dataset | Train/Val | Test | Annotations |
---|---|---|---|
VOC2007 | Train/Validation Data (439 MB) | Test Data With Annotations (431 MB) | Development Kit |
VOC2012 | Train/Validation Data (1.9 GB) | Test Data (1.8 GB) | Development Kit |
KITTI | |||
COCO2014 | [2014 Train images 83K/13GB] [2014 Val images 41K/6GB] |
[2014 Test images 41K/6GB] | [2014 Train/Val annotations 241MB] |
COCO2017 | [2017 Train images 118K/18GB] [2017 Val images 5K/1GB] |
[2017 Test images 41K/6GB] | [2017 Train/Val annotations 241MB] |
python train_yolof.py --dataset_type voc --datasets /mnt/Datasets/voc/voc2007/VOCdevkit/VOC2007/ --net mb1-yolof --scheduler cosine --batch_size 24 --lr 0.01 --t_max 200 --num_epochs 200 --base_net_lr 0.001 --validation_epochs 5
python train_ssd.py --datasets ~/data/VOC0712/VOC2007/ ~/data/VOC0712/VOC2012/ --validation_dataset ~/data/VOC0712/test/VOC2007/ --net "mb1-ssd" --base_net models/vgg16_reducedfc.pth --batch_size 24 --num_epochs 150 --scheduler cosine --lr 0.0012 --t_max 150 --validation_epochs 5
You can freeze the base net, or all the layers except the prediction heads.
--freeze_base_net Freeze base net layers.
--freeze_net Freeze all the layers except the prediction head.
You can also use different learning rates for the base net, the extra layers and the prediction heads.
--lr LR, --learning-rate LR
--base_net_lr BASE_NET_LR
initial learning rate for base net.
--extra_layers_lr EXTRA_LAYERS_LR
As subsets of open images data can be very unbalanced, it also provides a handy option to roughly balance the data.
--balance_data Balance training data by down-sampling more frequent
labels.
python run_ssd_example.py mb1-ssd models/mobilenet-v1-ssd-Epoch-99-Loss-2.2184619531035423.pth models/open-images-model-labels.txt ~/Downloads/gun.JPG
! The model is not really ONNX-Friendly due the issue mentioned here "qfgaohao/pytorch-ssd#33 (comment)"
The Scaled L2 Norm Layer has been replaced with BatchNorm to make the net ONNX compatible.
The pretrained based is borrowed from https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth .
python train_ssd.py --datasets ~/data/VOC0712/VOC2007/ ~/data/VOC0712/VOC2012/ --validation_dataset ~/data/VOC0712/test/VOC2007/ --net "vgg16-ssd" --base_net models/vgg16_reducedfc.pth --batch_size 24 --num_epochs 150 --scheduler cosine --lr 0.0012 --t_max 150 --validation_epochs 5
python eval_ssd.py --net vgg16-ssd --dataset ~/data/VOC0712/test/VOC2007/ --trained_model models/vgg16-ssd-Epoch-115-Loss-2.819455094383535.pth --label_file models/voc-model-labels.txt
- NAS
- AnchorFree.
A simple, fast, and efficient object detector without FPN.
- The
cvpods
version can be found in https://github.com/megvii-model/YOLOF. - The neat and re-organized
Detectron2
version of YOLOF is available at https://github. com/chensnathan/YOLOF.
@inproceedings{chen2021you,
title={You Only Look One-level Feature},
author={Chen, Qiang and Wang, Yingming and Yang, Tong and Zhang, Xiangyu and Cheng, Jian and Sun, Jian},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
year={2021}
}
@article{Xu2021TEYOLOFTA,
title={TE-YOLOF: Tiny and efficient YOLOF for blood cell detection},
author={Fanxin Xu and Xiangkui Li and Hang Yang and Yali Wang and Wei Xiang},
journal={ArXiv},
year={2021},
volume={abs/2108.12313}
}