This is a PyTorch re-implementation of the YOLOv4 architecture, based on the official darknet implementation AlexeyAB/darknet, with support for PASCAL VOC, COCO, and custom datasets.
Name | Train dataset | Test dataset | Test size | mAP | Inference time (ms) | Params (M) | Model link |
---|---|---|---|---|---|---|---|
Mobilenetv2-YOLOv4 | VOC trainval(07+12) | VOC test(07) | 416 | 0.851 | 11.29 | 46.34 | model |
Mobilenetv3-YOLOv4 is coming soon! (You only need to change MODEL_TYPE in config/yolov4_config.py.)
This repo adds some useful attention methods to the backbone. The following pictures illustrate them:
- SENet (CVPR 2018)
- CBAM (ECCV 2018)
This repo is simple to use, easy to read, and straightforward to extend compared with others!
If it helps your research, please cite the article in your publications (MDPI link):
@article{
title = "GC-YOLO: You Only Look Once with Global Context Block",
journal = "Electronics",
pages = "9,1235",
year = "2020",
doi = "https://doi.org/10.3390/electronics9081235",
author = "Yang Yang and Hongmin Deng",
}
- Nvidia GeForce RTX 2080 Ti
- CUDA 10.0
- cuDNN 7.0
- Windows or Linux
- Python 3.6
- DO-Conv (arXiv 2020) (torch>=1.2)
- Attention
- FP16 training
- Mish
- Custom data
- Data augmentation (RandomHorizontalFlip, RandomCrop, RandomAffine, Resize)
- Multi-scale training (320 to 640)
- Focal loss
- CIoU
- Label smoothing
- Mixup
- Cosine learning-rate schedule
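To make the CIoU entry in the list above concrete, here is a minimal pure-Python sketch of the Complete IoU score for two axis-aligned boxes. It is an illustration of the published formula, not this repo's loss code; the function name ciou and the (xmin, ymin, xmax, ymax) tuple layout are assumptions made for the example.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU of two boxes given as (xmin, ymin, xmax, ymax).

    Hypothetical helper for illustration; not taken from this repo.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box that encloses both.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return iou - rho2 / c2 - alpha * v


if __name__ == "__main__":
    print(ciou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes, close to 1
    print(ciou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes, negative
```

A regression loss built on this score is usually written as 1 - CIoU, so well-aligned boxes give a loss near 0 and disjoint boxes are still penalized by the center-distance term.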
Run the installation script to install all the dependencies. You need to provide the conda install path (e.g. ~/anaconda3) and the name for the created conda environment (here YOLOv4-pytorch).
pip3 install -r requirements.txt --user
Note: The install script has been tested on Ubuntu 18.04 and Windows 10 systems. In case of issues, check the detailed installation instructions.
git clone https://github.com/argusswift/YOLOv4-pytorch.git
Update the "PROJECT_PATH" in config/yolov4_config.py.
# Download the data.
cd $HOME/data
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
# Extract the data.
tar -xvf VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_06-Nov-2007.tar
tar -xvf VOCtest_06-Nov-2007.tar
- Download links: VOC 2012_trainval, VOC 2007_trainval, VOC 2007_test
# Step 1: download the following data and annotations
2017 Train images [118K/18GB]
2017 Val images [5K/1GB]
2017 Test images [41K/6GB]
2017 Train/Val annotations [241MB]
# Step 2: arrange the data in the following structure
COCO
---train
---test
---val
---annotations
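Before pointing the config at the data, it can help to confirm that the layout from step 2 is in place. The snippet below is a small sketch that is not part of this repo; the helper name missing_coco_dirs is hypothetical, and it only checks for the four subdirectories listed above, not their contents.

```python
from pathlib import Path

# Subdirectories named in the structure above.
EXPECTED_SUBDIRS = ("train", "test", "val", "annotations")


def missing_coco_dirs(coco_root):
    """Return the expected subdirectories that are absent under coco_root.

    Hypothetical helper for illustration; not taken from this repo.
    """
    root = Path(coco_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

Calling missing_coco_dirs("COCO") returns an empty list when all four directories exist, and otherwise the names that still need to be created.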
- Download links: train2017_img, train2017_ann, val2017_img, val2017_ann, test2017_img, test2017_list
- Put them in the data directory and update "DATA_PATH" in config/yolov4_config.py.
- (For COCO) Use coco_to_voc.py to convert the COCO data format to the VOC data format.
- Convert the data format: use utils/voc.py or utils/coco.py to convert the Pascal VOC *.xml format (or the COCO *.json format) to the *.txt format (Image_path xmin0,ymin0,xmax0,ymax0,class0 xmin1,ymin1,xmax1,ymax1,class1 ...).
- Darknet pre-trained weight: yolov4
- Mobilenet pre-trained weights: mobilenetv2 (code: args), mobilenetv3 (code: args)
- Create the directory weight/ in the YOLOv4 project directory and put the weight file in it.
- Set MODEL_TYPE in config/yolov4_config.py before you run the training program.
- Put pictures of your dataset into the JPEGImages folder, and Annotations files into the Annotations folder.
- Use the xml_to_txt.py file to write the list of training and test files to ImageSets/Main/*.txt.
- Convert the data format: use utils/voc.py or utils/coco.py to convert the Pascal VOC *.xml format (or the COCO *.json format) to the *.txt format (Image_path xmin0,ymin0,xmax0,ymax0,class0 xmin1,ymin1,xmax1,ymax1,class1 ...).
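The *.txt line format described above can be illustrated with a short standard-library sketch. This is not the repo's utils/voc.py; the function names voc_xml_to_line and parse_line, the sample annotation, and the choice of a 0-based index into a class list as the class field are all assumptions made for the example. It also assumes image paths contain no spaces, since fields are space-separated.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_line(xml_text, image_path, class_names):
    """Build 'Image_path xmin,ymin,xmax,ymax,class ...' from one VOC annotation.

    Hypothetical helper; the class field is the index of the object name
    in class_names (an assumption for this sketch).
    """
    root = ET.fromstring(xml_text)
    fields = [image_path]
    for obj in root.iter("object"):
        class_id = class_names.index(obj.findtext("name").strip())
        box = obj.find("bndbox")
        coords = [int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        fields.append(",".join(str(v) for v in coords + [class_id]))
    return " ".join(fields)


def parse_line(line):
    """Split a line back into (image_path, [[xmin, ymin, xmax, ymax, class], ...])."""
    image_path, *boxes = line.split()
    return image_path, [[int(v) for v in box.split(",")] for box in boxes]


# Made-up annotation used only to exercise the two helpers.
SAMPLE_XML = """<annotation>
  <object><name>dog</name><bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox></object>
  <object><name>person</name><bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox></object>
</annotation>"""

line = voc_xml_to_line(SAMPLE_XML, "JPEGImages/000001.jpg", ["dog", "person"])
print(line)  # JPEGImages/000001.jpg 48,240,195,371,0 8,12,352,498,1
print(parse_line(line))
```

Keeping one image per line with comma-joined boxes means a data loader can read annotations with a single split per line, as parse_line does.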
Run the following command to start training; see config/yolov4_config.py for the details. Set DATA_TYPE to VOC or COCO before you run the training program.
CUDA_VISIBLE_DEVICES=0 nohup python -u train.py --weight_path weight/yolov4.weights --gpu_id 0 > nohup.log 2>&1 &
Resuming training is also supported: add --resume and last.pt will be loaded automatically, using the command
CUDA_VISIBLE_DEVICES=0 nohup python -u train.py --weight_path weight/last.pt --gpu_id 0 --resume > nohup.log 2>&1 &
Set the path to the images you want to run detection on: DATA_TEST=/path/to/your/test_data # your own images
for VOC dataset:
CUDA_VISIBLE_DEVICES=0 python3 eval_voc.py --weight_path weight/best.pt --gpu_id 0 --visiual $DATA_TEST --eval --mode det
for COCO dataset:
CUDA_VISIBLE_DEVICES=0 python3 eval_coco.py --weight_path weight/best.pt --gpu_id 0 --visiual $DATA_TEST --eval --mode det
The resulting images are saved in output/. They look like the following:
Modify:
- video_path:/path/to/your/video
- weight_path:/path/to/your/weight
- output_dir:/path/to/save/dir
CUDA_VISIBLE_DEVICES=0 python3 video_test.py --weight_path best.pt --gpu_id 0 --video_path video.mp4 --output_dir /path/to/save/dir
Set the path to your evaluation dataset: DATA_PATH=/path/to/your/test_data # your own images
for VOC dataset:
CUDA_VISIBLE_DEVICES=0 python3 eval_voc.py --weight_path weight/best.pt --gpu_id 0 --visiual $DATA_TEST --eval --mode val
To produce the picture above, run the following commands:
# To get ground truths of your dataset
python3 utils/get_gt_txt.py
# To plot P-R curve and calculate mean average precision
python3 utils/get_map.py
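For readers who want to see how a precision-recall curve turns into an average precision value, here is a minimal sketch of the all-point interpolated AP for a single class. It is not the code in utils/get_map.py, and whether that script uses this interpolation or the 11-point variant is not stated here; the function name average_precision and its inputs (one confidence score and one true-positive flag per detection, plus the number of ground-truth boxes, assumed to be greater than zero) are assumptions for the example.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class.

    Hypothetical helper for illustration; assumes num_ground_truth > 0.
    """
    # Rank detections by descending confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Replace each precision by the maximum precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum the rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    # Three detections, the second one is a false positive, two ground-truth boxes.
    print(average_precision([0.9, 0.8, 0.7], [True, False, True], 2))
```

The mean average precision is then the mean of these per-class values over all classes in the dataset.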
Set the path to your evaluation dataset: DATA_PATH=/path/to/your/test_data # your own images
CUDA_VISIBLE_DEVICES=0 python3 eval_coco.py --weight_path weight/best.pt --gpu_id 0 --visiual $DATA_TEST --eval --mode val
type=bbox
Running per image evaluation... DONE (t=0.34s).
Accumulating evaluation results... DONE (t=0.08s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.438
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.607
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.469
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.253
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.486
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.567
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.571
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.632
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.458
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.691
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.790
python3 utils/modelsize.py
Set showatt=True in val_voc.py to see the heatmaps generated from the network's output
for VOC dataset:
CUDA_VISIBLE_DEVICES=0 python3 eval_voc.py --weight_path weight/best.pt --gpu_id 0 --visiual $DATA_TEST --eval
for COCO dataset:
CUDA_VISIBLE_DEVICES=0 python3 eval_coco.py --weight_path weight/best.pt --gpu_id 0 --visiual $DATA_TEST --eval
The heatmaps are saved in output/ and look like this: