YOLOv5-Lite: lighter, faster and easier to deploy
A series of ablation experiments on YOLOv5 make it lighter (smaller FLOPs, lower memory use, fewer parameters), faster (channel shuffle and a channel-reduced YOLOv5 head; at least 10 FPS on a Raspberry Pi 4B with 320×320 input) and easier to deploy (the Focus layer and its four slice operations are removed, keeping the accuracy loss from model quantization within an acceptable range).
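The Focus layer mentioned above splits the input into four spatial slices before the first convolution; those slice operations are awkward for many inference backends, which is why YOLOv5-Lite removes them. A minimal pure-Python sketch (dummy data, shapes only; the helper name is ours) of what the four slices do:

```python
# Sketch of the Focus layer's four slice operations that YOLOv5-Lite removes:
# an HxWxC image is sampled at even/odd row/column offsets, halving the
# spatial size and quadrupling the channel count.
H, W, C = 320, 320, 3
image = [[[0] * C for _ in range(W)] for _ in range(H)]  # dummy HxWxC input

def pick(img, row_off, col_off):
    """Take every second pixel starting at (row_off, col_off)."""
    return [row[col_off::2] for row in img[row_off::2]]

slices = [pick(image, 0, 0), pick(image, 1, 0),
          pick(image, 0, 1), pick(image, 1, 1)]

out_h, out_w = len(slices[0]), len(slices[0][0])
out_c = len(slices) * C  # the slices are concatenated along the channel axis
print(out_h, out_w, out_c)  # 160 160 12
```

Replacing this with an ordinary convolution avoids the gather-heavy slicing that quantized and mobile runtimes handle poorly.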
Comparison of ablation experiment results
ID | Model | Input size | FLOPs | Params | Size (M) | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|---|---|---|---|---|
001 | yolo-fastest | 320×320 | 0.25G | 0.35M | 1.4 | 24.4 | - |
002 | nanodet-m | 320×320 | 0.72G | 0.95M | 1.8 | - | 20.6 |
003 | yolo-fastest-xl | 320×320 | 0.72G | 0.92M | 3.5 | 34.3 | - |
004 | YOLOv5-Lite-s (ours) | 320×320 | 0.97G | 1.54M | 3.3 | 36.1 | 20.9 |
005 | yolov3-tiny | 416×416 | 6.96G | 6.06M | 23.0 | 33.1 | 16.6 |
006 | yolov4-tiny | 416×416 | 5.62G | 8.86M | 33.7 | 40.2 | 21.7 |
007 | YOLOv5-Lite-s (ours) | 416×416 | 1.63G | 1.54M | 3.3 | 41.3 | 24.3 |
008 | YOLOv5-Lite-c (ours) | 640×640 | 8.6G | 4.37M | 9.2 | 52.5 | 33.0 |
009 | YOLOv5s | 640×640 | 17.0G | 7.3M | 14.2 | 55.8 | 35.9 |
010 | YOLOv5-Lite-g (ours) | 640×640 | 15.7G | 5.3M | 10.9 | 56.9 | 38.1 |
Comparison on different platforms
Equipment | Computing backend | System | Input | Framework | v5Lite-s | v5Lite-c | v5Lite-g | YOLOv5s |
---|---|---|---|---|---|---|---|---|
Intel | @i5-10210U | Windows (x86) | 640×640 | openvino | - | 46ms | - | 131ms |
Nvidia | @RTX 2080Ti | Linux (x86) | 640×640 | torch | - | - | 15ms | 14ms |
Redmi K30 | @Snapdragon 730G | Android (arm64) | 320×320 | ncnn | 28ms | - | - | 163ms |
Raspberry Pi 4B | @ARM Cortex-A72 | Linux (arm64) | 320×320 | ncnn | 84ms | - | - | 371ms |
Raspberry Pi 4B | @ARM Cortex-A72 | Linux (arm64) | 320×320 | mnn | 76ms | - | - | 356ms |
- All of the above benchmarks use 4 threads.
- The Raspberry Pi 4B tests enable bf16s optimization (64-bit Raspberry Pi OS).
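The single-image latencies above convert directly to throughput as FPS = 1000 / latency_ms; a quick sketch checking the "10+ FPS on a Raspberry Pi 4B" claim against the table:

```python
# Convert the per-image latencies from the benchmark table (ms) into FPS.
latencies_ms = {
    "v5Lite-s @ Raspberry Pi 4B (ncnn)": 84,
    "v5Lite-s @ Raspberry Pi 4B (mnn)": 76,
    "YOLOv5s  @ Raspberry Pi 4B (ncnn)": 371,
}

def fps(latency_ms):
    """Frames per second for a given single-frame latency in milliseconds."""
    return 1000.0 / latency_ms

for name, ms in latencies_ms.items():
    print(f"{name}: {fps(ms):.1f} FPS")
# v5Lite-s lands at roughly 12-13 FPS, consistent with the 10+ FPS claim,
# while the full YOLOv5s stays well under 3 FPS on the same board.
```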
Model Zoo
@YOLOv5-Lite-s:
Model | Size | Backbone | Head | Framework | Design for |
---|---|---|---|---|---|
v5Lite-s.pt | 3.3m | shufflenetv2 (Megvii) | v5Lites-head | Pytorch | Arm-cpu |
v5Lite-s.bin / v5Lite-s.param | 3.3m | shufflenetv2 | v5Lites-head | ncnn | Arm-cpu |
v5Lite-s-int8.bin / v5Lite-s-int8.param | 1.7m | shufflenetv2 | v5Lites-head | ncnn | Arm-cpu |
v5Lite-s.mnn | 3.3m | shufflenetv2 | v5Lites-head | mnn | Arm-cpu |
v5Lite-s-int4.mnn | 987k | shufflenetv2 | v5Lites-head | mnn | Arm-cpu |
@YOLOv5-Lite-c:
Model | Size | Backbone | Head | Framework | Design for |
---|---|---|---|---|---|
v5Lite-c.pt | 9m | PPLcnet (Baidu) | v5Litec-head | Pytorch | x86-cpu / x86-vpu |
v5Lite-c.bin / v5Lite-c.xml | 8.7m | PPLcnet | v5Litec-head | openvino | x86-cpu / x86-vpu |
@YOLOv5-Lite-g:
Model | Size | Backbone | Head | Framework | Design for |
---|---|---|---|---|---|
v5Lite-g.pt | 10.9m | Repvgg (Tsinghua) | v5Liteg-head | Pytorch | x86-gpu / arm-gpu / arm-npu |
v5Lite-g-int8.engine | 8.5m | Repvgg | v5Liteg-head | Tensorrt | x86-gpu / arm-gpu / arm-npu |
v5lite-g-int8.tmfile | 8.7m | Repvgg | v5Liteg-head | Tengine | arm-npu |
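For context on the g model's backbone: RepVGG trains with parallel 3×3, 1×1 and identity branches, then fuses them into a single 3×3 convolution at deploy time. Because convolution is linear, summing the branch outputs equals convolving with the sum of the (zero-padded) kernels. A minimal single-channel sketch, with batch-norm folding omitted and the helper name ours:

```python
def fuse_repvgg_kernels(k3, k1_weight, has_identity):
    """Fuse RepVGG-style branches into one 3x3 kernel (single channel,
    batch norm omitted). The 1x1 branch and the identity branch each act
    as a 3x3 kernel whose only non-zero tap is the centre."""
    fused = [row[:] for row in k3]   # copy the 3x3-branch kernel
    fused[1][1] += k1_weight         # 1x1 branch -> add to centre tap
    if has_identity:
        fused[1][1] += 1.0           # identity branch -> centre tap of 1
    return fused

k3 = [[0.1, 0.2, 0.1],
      [0.0, 0.5, 0.0],
      [0.1, 0.2, 0.1]]
fused = fuse_repvgg_kernels(k3, 0.3, True)
print(fused[1][1])  # centre tap = 0.5 + 0.3 + 1.0
```

The fused single-branch network is what ships in the .engine and .tmfile artifacts above, which is why the deployed model keeps training-time accuracy while running as a plain stack of 3×3 convolutions.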
How to use
Install
Python >= 3.6.0 is required, with all dependencies in requirements.txt installed, including PyTorch >= 1.7:
$ git clone https://github.com/ppogg/YOLOv5-Lite
$ cd YOLOv5-Lite
$ pip install -r requirements.txt
Inference with detect.py
`detect.py` runs inference on a variety of sources, downloading models automatically from the latest YOLOv5-Lite release and saving results to `runs/detect`.
$ python detect.py --source 0 # webcam
file.jpg # image
file.mp4 # video
path/ # directory
path/*.jpg # glob
'https://youtu.be/NUsoVlDFqZg' # YouTube
'rtsp://example.com/media.mp4' # RTSP, RTMP, HTTP stream
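Whatever the source, YOLOv5-style pipelines typically letterbox each frame before inference: scale it to fit the network input size while keeping the aspect ratio, then pad the remainder. A small sketch of just the geometry (the function name is ours; the real code also resizes pixels and fills the padding with grey):

```python
# Letterbox geometry: fit an (h, w) frame into a square new_size x new_size
# input while preserving aspect ratio, padding the leftover space evenly.
def letterbox_geometry(h, w, new_size=320):
    r = min(new_size / h, new_size / w)        # scale ratio (fit inside)
    new_h, new_w = round(h * r), round(w * r)  # resized, unpadded shape
    pad_h, pad_w = new_size - new_h, new_size - new_w
    return (new_h, new_w), (pad_h / 2, pad_w / 2)  # split padding evenly

# A 720p webcam frame mapped to the 320x320 input used on the Raspberry Pi:
print(letterbox_geometry(720, 1280))  # ((180, 320), (70.0, 0.0))
```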
Training
$ python train.py --data coco.yaml --cfg v5lite-s.yaml --weights v5lite-s.pt --batch-size 128
$ python train.py --data coco.yaml --cfg v5lite-c.yaml --weights v5lite-c.pt --batch-size 96
$ python train.py --data coco.yaml --cfg v5lite-g.yaml --weights v5lite-g.pt --batch-size 64
Multi-GPU training is several times faster:
$ python -m torch.distributed.launch --nproc_per_node 2 train.py
DataSet
Training and validation set layout (the image paths contain xx.jpg):
train: ../coco/images/train2017/
val: ../coco/images/val2017/
├── images  # xx.jpg example
│   ├── train2017
│   │   ├── 000001.jpg
│   │   ├── 000002.jpg
│   │   └── 000003.jpg
│   └── val2017
│       ├── 100001.jpg
│       ├── 100002.jpg
│       └── 100003.jpg
└── labels  # xx.txt example
    ├── train2017
    │   ├── 000001.txt
    │   ├── 000002.txt
    │   └── 000003.txt
    └── val2017
        ├── 100001.txt
        ├── 100002.txt
        └── 100003.txt
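This layout works because YOLOv5-style data loaders derive each label path from its image path by swapping the last `images` directory for `labels` and the file extension for `.txt`. A small sketch of that convention (the helper name is ours):

```python
from pathlib import Path

def img2label_path(img_path):
    """Map an image path to its YOLO label path: replace the last
    'images' directory component with 'labels' and the suffix with '.txt'."""
    parts = list(Path(img_path).parts)
    # Find the last occurrence of an 'images' directory component.
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return str(Path(*parts).with_suffix(".txt"))

print(img2label_path("../coco/images/train2017/000001.jpg"))
# ../coco/labels/train2017/000001.txt
```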
Model hub
The original components of YOLOv5 and the reproduced components of YOLOv5-Lite are organized and stored in the model hub:
Updating ...
How to deploy
- ncnn for arm-cpu
- mnn for arm-cpu
- openvino for x86-cpu or x86-vpu
- tensorrt for arm-gpu, arm-npu or x86-gpu
- Android for arm-cpu
Android_demo
The demo runs on a Redmi phone with a Snapdragon 730G processor, using YOLOv5-Lite for detection. The performance is as follows:
link: https://github.com/ppogg/YOLOv5-Lite/tree/master/ncnn_Android
Android_v5Lite-s: https://drive.google.com/file/d/1CtohY68N2B9XYuqFLiTp-Nd2kuFWgAUR/view?usp=sharing
Android_v5Lite-g: https://drive.google.com/file/d/1FnvkWxxP_aZwhi000xjIuhJ_OhqOUJcj/view?usp=sharing
More detailed explanation
Detailed articles on the models (in Chinese):
[1] https://zhuanlan.zhihu.com/p/400545131
[2] https://zhuanlan.zhihu.com/p/410874403
[3] https://blog.csdn.net/weixin_45829462/article/details/119787840
[4] https://zhuanlan.zhihu.com/p/420737659
Reference
https://github.com/ultralytics/yolov5