YOLOv5-Lite is the result of a series of ablation experiments on YOLOv5 that make it lighter (fewer FLOPs, lower memory use, and fewer parameters), faster (channel shuffle is added and the YOLOv5 head uses fewer channels; it runs at 10+ FPS on the Raspberry Pi 4B with 320×320 input frames), and easier to deploy (the Focus layer and its four slice operations are removed, and the accuracy loss from model quantization is kept within an acceptable range).
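The channel shuffle mentioned above is the ShuffleNet operation that interleaves channels across groups so that information can flow between them. The sketch below shows only the reordering, in plain Python on a list of channel labels rather than on a tensor. The function name `channel_shuffle` and the list-based representation are illustrative assumptions, not code from this repository.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style reordering).

    Equivalent to viewing the channels as a (groups, per_group) grid,
    transposing it to (per_group, groups), and flattening it again.
    """
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)   # position inside each group
        for g in range(groups)      # visit every group at that position
    ]


# Six channels in two groups: [0, 1, 2] and [3, 4, 5]
print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))  # [0, 3, 1, 4, 2, 5]
```

On a real feature map the same permutation is applied along the channel axis, usually with a reshape, a transpose, and a second reshape.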
ID | Model | Input_size | FLOPs | Params | Size (M) | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|---|---|---|---|---|
001 | yolo-fastest | 320×320 | 0.25G | 0.35M | 1.4 | 24.4 | - |
002 | YOLOv5-Lite-e (ours) | 320×320 | 0.73G | 0.78M | 1.7 | 35.1 | - |
003 | NanoDet-m | 320×320 | 0.72G | 0.95M | 1.8 | - | 20.6 |
004 | yolo-fastest-xl | 320×320 | 0.72G | 0.92M | 3.5 | 34.3 | - |
005 | YOLOX-Nano | 416×416 | 1.08G | 0.91M | 7.3(fp32) | - | 25.8 |
006 | yolov3-tiny | 416×416 | 6.96G | 6.06M | 23.0 | 33.1 | 16.6 |
007 | yolov4-tiny | 416×416 | 5.62G | 8.86M | 33.7 | 40.2 | 21.7 |
008 | YOLOv5-Lite-s (ours) | 416×416 | 1.66G | 1.64M | 3.4 | 42.0 | 25.2 |
009 | YOLOv5-Lite-c (ours) | 512×512 | 5.92G | 4.57M | 9.2 | 50.9 | 32.5 |
010 | NanoDet-EfficientLite2 | 512×512 | 7.12G | 4.71M | 18.3 | - | 32.6 |
011 | YOLOv5s(6.0) | 640×640 | 16.5G | 7.23M | 14.0 | 56.0 | 37.2 |
012 | YOLOv5-Lite-g (ours) | 640×640 | 15.6G | 5.39M | 10.9 | 57.6 | 39.1 |
See the wiki: https://github.com/ppogg/YOLOv5-Lite/wiki/Test-the-map-of-models-about-coco
Equipment | Computing backend | System | Input | Framework | v5lite-e | v5lite-s | v5lite-c | v5lite-g | YOLOv5s |
---|---|---|---|---|---|---|---|---|---|
Intel | @i5-10210U | Windows(x86) | 640×640 | openvino | - | - | 46ms | - | 131ms |
Nvidia | @RTX 2080Ti | Linux(x86) | 640×640 | torch | - | - | - | 15ms | 14ms |
Redmi K30 | @Snapdragon 730G | Android(armv8) | 320×320 | ncnn | 27ms | 38ms | - | - | 163ms |
Xiaomi 10 | @Snapdragon 865 | Android(armv8) | 320×320 | ncnn | 10ms | 14ms | - | - | 163ms |
Raspberry Pi 4B | @ARM Cortex-A72 | Linux(arm64) | 320×320 | ncnn | - | 84ms | - | - | 371ms |
Raspberry Pi 4B | @ARM Cortex-A72 | Linux(arm64) | 320×320 | mnn | - | 76ms | - | - | 356ms |
- All results above were measured with 4 threads.
- The Raspberry Pi 4B tests enable bf16s optimization and run on the 64-bit Raspberry Pi OS.
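To relate the per-frame latencies in the benchmark table to the "10+ FPS" figure for the Raspberry Pi 4B, latency can be converted to throughput. The sketch below assumes that the listed latency is the full cost per frame (no extra pre- or post-processing time); the helper name `fps_from_latency_ms` is hypothetical.

```python
def fps_from_latency_ms(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# v5lite-s latencies on the Raspberry Pi 4B, taken from the table above
raspberry_pi_4b_v5lite_s = {"ncnn": 84, "mnn": 76}

for backend, latency in raspberry_pi_4b_v5lite_s.items():
    print(f"{backend}: {latency} ms -> {fps_from_latency_ms(latency):.1f} FPS")
# ncnn: 84 ms -> 11.9 FPS
# mnn: 76 ms -> 13.2 FPS
```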
Model | Size | Backbone | Head | Framework | Design for |
---|---|---|---|---|---|
v5Lite-e.pt | 1.7m | shufflenetv2 (Megvii) | v5Litee-head | Pytorch | Arm-cpu |
v5Lite-e.bin v5Lite-e.param | 1.7m | shufflenetv2 | v5Litee-head | ncnn | Arm-cpu |
v5Lite-e-int8.bin v5Lite-e-int8.param | 0.9m | shufflenetv2 | v5Litee-head | ncnn | Arm-cpu |
v5Lite-e-fp32.mnn | 3.0m | shufflenetv2 | v5Litee-head | mnn | Arm-cpu |
v5Lite-e-fp32.tnnmodel v5Lite-e-fp32.tnnproto | 2.9m | shufflenetv2 | v5Litee-head | tnn | arm-cpu |
v5Lite-e-320.onnx | 3.1m | shufflenetv2 | v5Litee-head | onnxruntime | x86-cpu |
Model | Size | Backbone | Head | Framework | Design for |
---|---|---|---|---|---|
v5Lite-s.pt | 3.4m | shufflenetv2 (Megvii) | v5Lites-head | Pytorch | Arm-cpu |
v5Lite-s.bin v5Lite-s.param | 3.3m | shufflenetv2 | v5Lites-head | ncnn | Arm-cpu |
v5Lite-s-int8.bin v5Lite-s-int8.param | 1.7m | shufflenetv2 | v5Lites-head | ncnn | Arm-cpu |
v5Lite-s.mnn | 3.3m | shufflenetv2 | v5Lites-head | mnn | Arm-cpu |
v5Lite-s-int4.mnn | 987k | shufflenetv2 | v5Lites-head | mnn | Arm-cpu |
v5Lite-s-fp16.bin v5Lite-s-fp16.xml | 3.4m | shufflenetv2 | v5Lites-head | openvino | x86-cpu |
v5Lite-s-fp32.bin v5Lite-s-fp32.xml | 6.8m | shufflenetv2 | v5Lites-head | openvino | x86-cpu |
v5Lite-s-fp16.tflite | 3.3m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |
v5Lite-s-fp32.tflite | 6.7m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |
v5Lite-s-int8.tflite | 1.8m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |
v5Lite-s-416.onnx | 6.4m | shufflenetv2 | v5Lites-head | onnxruntime | x86-cpu |
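The file sizes in this table track the numeric precision of the weights fairly closely. As a rough check, the sketch below multiplies the v5lite-s parameter count (1.64M, from the comparison table at the top) by the bytes per parameter. It assumes the "m" in the Size column means megabytes and ignores file-format overhead, so the estimates are lower bounds; the names `BYTES_PER_PARAM` and `estimated_size_mb` are hypothetical.

```python
# Bytes needed to store one weight at each precision
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def estimated_size_mb(num_params, precision):
    """Estimate the weight storage in MB (10^6 bytes), ignoring file overhead."""
    return num_params * BYTES_PER_PARAM[precision] / 1e6


V5LITE_S_PARAMS = 1.64e6  # parameter count listed for YOLOv5-Lite-s

for precision in ("fp32", "fp16", "int8"):
    size = estimated_size_mb(V5LITE_S_PARAMS, precision)
    print(f"{precision}: about {size:.2f} MB")
# fp32: about 6.56 MB
# fp16: about 3.28 MB
# int8: about 1.64 MB
```

These estimates sit just below the listed tflite sizes (6.7m, 3.3m, and 1.8m), which is consistent with a small amount of graph and metadata overhead in each file.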
Model | Size | Backbone | Head | Framework | Design for |
---|---|---|---|---|---|
v5Lite-c.pt | 9m | PPLcnet (Baidu) | v5s-head | Pytorch | x86-cpu / x86-vpu |
v5Lite-c.bin v5Lite-c.xml | 8.7m | PPLcnet | v5s-head | openvino | x86-cpu / x86-vpu |
v5Lite-c-512.onnx | 18m | PPLcnet | v5s-head | onnxruntime | x86-cpu |
Model | Size | Backbone | Head | Framework | Design for |
---|---|---|---|---|---|
v5Lite-g.pt | 10.9m | Repvgg (Tsinghua) | v5Liteg-head | Pytorch | x86-gpu / arm-gpu / arm-npu |
v5Lite-g-int8.engine | 8.5m | Repvgg-yolov5 | v5Liteg-head | Tensorrt | x86-gpu / arm-gpu / arm-npu |
v5lite-g-int8.tmfile | 8.7m | Repvgg-yolov5 | v5Liteg-head | Tengine | arm-npu |
v5Lite-g-640.onnx | 21m | Repvgg-yolov5 | yolov5-head | onnxruntime | x86-cpu |
v5lite-e.pt: | Baidu Drive | Google Drive |
|──────ncnn-fp16: | Baidu Drive | Google Drive |
|──────ncnn-int8: | Baidu Drive | Google Drive |
└──────onnx-fp32: | Baidu Drive | Google Drive |
v5lite-s.pt: | Baidu Drive | Google Drive |
|──────ncnn-fp16: | Baidu Drive | Google Drive |
|──────ncnn-int8: | Baidu Drive | Google Drive |
|──────mnn-fp16: | Baidu Drive | Google Drive |
|──────mnn-int4: | Baidu Drive | Google Drive |
|──────onnx-fp32: | Baidu Drive | Google Drive |
└──────tengine-fp32: | Baidu Drive | Google Drive |
v5lite-c.pt: | Baidu Drive | Google Drive |
|──────onnx-fp32: | Baidu Drive | Google Drive |
└──────openvino-fp16: | Baidu Drive | Google Drive |
v5lite-g.pt: | Baidu Drive | Google Drive |
└──────onnx-fp32: | Baidu Drive | Google Drive |
Baidu Drive Password: pogg
v5lite-s model: TFLite Float32, Float16, INT8, Dynamic range quantization, ONNX, TFJS, TensorRT, OpenVINO IR FP32/FP16, Myriad Inference Engine Blob, CoreML
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/180_YOLOv5-Lite
Thanks to PINTO0309.
Install
Python>=3.6.0 is required, along with all dependencies in requirements.txt, including PyTorch>=1.7:
$ git clone https://github.com/ppogg/YOLOv5-Lite
$ cd YOLOv5-Lite
$ pip install -r requirements.txt
Inference with detect.py
detect.py runs inference on a variety of sources, downloading models automatically from the latest YOLOv5-Lite release and saving results to runs/detect.
$ python detect.py --source 0  # webcam
                            file.jpg  # image
                            file.mp4  # video
                            path/  # directory
                            path/*.jpg  # glob
                            'https://youtu.be/NUsoVlDFqZg'  # YouTube
                            'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
Training
$ python train.py --data coco.yaml --cfg v5lite-e.yaml --weights v5lite-e.pt --batch-size 128
                                         v5lite-s.yaml           v5lite-s.pt              128
                                         v5lite-c.yaml           v5lite-c.pt               96
                                         v5lite-g.yaml           v5lite-g.pt               64
If you use multiple GPUs, training is several times faster:
$ python -m torch.distributed.launch --nproc_per_node 2 train.py
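The four single-GPU training commands above differ only in the model variant and the batch size. As a convenience, the sketch below builds each command from a small lookup table. It uses only the flags shown above; the names `BATCH_SIZES` and `train_command` are hypothetical, and the script only prints the commands rather than running them.

```python
# Batch size used for each model variant in the training commands above
BATCH_SIZES = {"e": 128, "s": 128, "c": 96, "g": 64}


def train_command(variant, data="coco.yaml"):
    """Return the train.py argument list for one YOLOv5-Lite variant."""
    if variant not in BATCH_SIZES:
        raise ValueError(f"unknown variant: {variant!r}")
    return [
        "python", "train.py",
        "--data", data,
        "--cfg", f"v5lite-{variant}.yaml",
        "--weights", f"v5lite-{variant}.pt",
        "--batch-size", str(BATCH_SIZES[variant]),
    ]


for variant in BATCH_SIZES:
    print(" ".join(train_command(variant)))
```

The returned list can be passed directly to `subprocess.run` from the repository root if the commands should be executed rather than printed.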
DataSet
Training set and test set distribution (the path with xx.jpg)
train: ../coco/images/train2017/
val: ../coco/images/val2017/
├── images            # xx.jpg example
│   ├── train2017
│   │   ├── 000001.jpg
│   │   ├── 000002.jpg
│   │   └── 000003.jpg
│   └── val2017
│       ├── 100001.jpg
│       ├── 100002.jpg
│       └── 100003.jpg
└── labels            # xx.txt example
    ├── train2017
    │   ├── 000001.txt
    │   ├── 000002.txt
    │   └── 000003.txt
    └── val2017
        ├── 100001.txt
        ├── 100002.txt
        └── 100003.txt
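The layout above pairs every image under `images/<split>/` with a label file of the same stem under `labels/<split>/`. Before training, it can help to check that no image is missing its label. The sketch below is one way to do that with `pathlib`; the function names are hypothetical, and the mapping rule (swap the `images` directory for `labels` and `.jpg` for `.txt`) is inferred from the tree shown here.

```python
from pathlib import Path


def label_path_for(image_path):
    """Map images/<split>/<name>.jpg to labels/<split>/<name>.txt."""
    parts = list(Path(image_path).parts)
    # Replace the last "images" component; raises ValueError if there is none.
    index = len(parts) - 1 - parts[::-1].index("images")
    parts[index] = "labels"
    return Path(*parts).with_suffix(".txt")


def images_missing_labels(dataset_root, split):
    """Return the sorted .jpg files in a split that have no matching label file."""
    image_dir = Path(dataset_root) / "images" / split
    return sorted(
        image for image in image_dir.glob("*.jpg")
        if not label_path_for(image).is_file()
    )
```

For the paths listed above, `images_missing_labels("../coco", "train2017")` would return an empty list when every training image has a label file.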
model hub
Here, the original components of YOLOv5 and the reproduced components of YOLOv5-Lite are organized and stored in the model hub:
Updating ...
ncnn for arm-cpu
mnn for arm-cpu
openvino for x86-cpu or x86-vpu
tensorrt(C++) for arm-gpu or arm-npu or x86-gpu
tensorrt(Python) for arm-gpu or arm-npu or x86-gpu
Android for arm-cpu
The demo runs YOLOv5-Lite detection on a Redmi phone with a Snapdragon 730G processor. The performance is as follows:
link: https://github.com/ppogg/YOLOv5-Lite/tree/master/android_demo/ncnn-android-v5lite
Android_v5Lite-s: https://drive.google.com/file/d/1CtohY68N2B9XYuqFLiTp-Nd2kuFWgAUR/view?usp=sharing
Android_v5Lite-g: https://drive.google.com/file/d/1FnvkWxxP_aZwhi000xjIuhJ_OhqOUJcj/view?usp=sharing
New Android app: https://pan.baidu.com/s/1PRhW4fI1jq8VboPyishcIQ (password: pogg)
What is YOLOv5-Lite S/E model: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/400545131
What is YOLOv5-Lite C model: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/420737659
What is YOLOv5-Lite G model: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/410874403
How to deploy on ncnn with fp16 or int8: csdn link (Chinese): https://blog.csdn.net/weixin_45829462/article/details/119787840
How to deploy on onnxruntime: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/476533259
How to deploy on tensorrt: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/478630138
How to optimize on tensorrt: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/463074494
https://github.com/ultralytics/yolov5
https://github.com/megvii-model/ShuffleNet-Series
https://github.com/Tencent/ncnn
If you use YOLOv5-Lite in your research, please cite our work and give the repository a star:
@misc{yolov5lite2021,
title = {YOLOv5-Lite: Lighter, faster and easier to deploy},
author = {Xiangrong Chen and Ziman Gong},
doi = {10.5281/zenodo.5241425},
year = {2021}
}