A new and strong YOLO family

Recently, I rebuild my YOLO-Family project !!

Requirements

  • We recommend you to use Anaconda to create a conda environment:
conda create -n yolo python=3.6
  • Then, activate the environment:
conda activate yolo
  • Requirements:
pip install -r requirements.txt 

PyTorch >= 1.1.0 and Torchvision >= 0.3.0

Visualize positive samples

You can run following command to visualize positiva sample:

python train.py \
        -d voc \
        --root path/to/your/dataset \
        -m yolov2 \
        --batch_size 2 \
        --vis_targets

Come soon

My better YOLO family

This project

In this project, you can enjoy:

  • a new and stronger YOLOv1
  • a new and stronger YOLOv2
  • a stronger YOLOv3
  • a stronger YOLOv3 with SPP
  • a stronger YOLOv3 with DilatedEncoder
  • YOLOv4 (I'm trying to make it better)
  • YOLO-Tiny
  • YOLO-Nano

Future work

  • Try to make my YOLO-v4 better.
  • Train my YOLOv1/YOLOv2 with ViT-Base (pretrained by MaskAutoencoder)

Weights

You can download all weights including my DarkNet-53, CSPDarkNet-53, MAE-ViT and YOLO weights from the following links.

Backbone

YOLO

Experiments

Tricks

Tricks in this project:

  • Augmentations: Flip + Color jitter + RandomCrop
  • Model EMA
  • Mosaic Augmentation
  • Multi Scale training
  • Gradient accumulation
  • MixUp Augmentation
  • Cosine annealing learning schedule
  • AdamW
  • Scale loss by number of positive samples

Experiments

All experiment results are evaluated on COCO val. All FPS results except YOLO-Nano's are measured on a 2080ti GPU. We will measure the speed of YOLO-Nano on a CPU.

YOLOv1

FPS AP AP50 AP75 APs APm APl GFLOPs Params
YOLOv1-320 151 25.4 41.5 26.0 4.2 25.0 49.8 10.49 44.54M
YOLOv1-416 128 30.1 47.8 30.9 7.8 31.9 53.3 17.73 44.54M
YOLOv1-512 114 33.1 52.2 34.0 10.8 35.9 54.9 26.85 44.54M
YOLOv1-640 75 35.2 54.7 37.1 14.3 39.5 53.4 41.96 44.54M
YOLOv1-800 65.56 44.54M

YOLOv2

FPS AP AP50 AP75 APs APm APl GFLOPs Params
YOLOv2-320 147 26.8 44.1 27.1 4.7 27.6 50.8 10.53 44.89M
YOLOv2-416 123 31.6 50.3 32.4 9.1 33.8 54.0 17.79 44.89M
YOLOv2-512 108 34.3 54.0 35.4 12.3 37.8 55.2 26.94 44.89M
YOLOv2-640 73 36.3 56.6 37.7 15.1 41.1 54.0 42.10 44.89M
YOLOv2-800 65.78 44.89M

YOLOv3

FPS AP AP50 AP75 APs APm APl GFLOPs Params
YOLOv3-320 111 30.8 50.3 31.8 10.0 33.1 50.0 19.57 61.97M
YOLOv3-416 89 34.8 55.8 36.1 14.6 37.5 52.9 33.08 61.97M
YOLOv3-512 77 36.9 58.1 39.3 18.0 40.3 52.2 50.11 61.97M
YOLOv3-608 51 37.0 58.9 39.3 20.5 41.2 49.0 70.66 61.97M
YOLOv3-640 49 36.9 59.0 39.7 21.6 41.6 47.7 78.30 61.97M

YOLOv3 with SPP

FPS AP AP50 AP75 APs APm APl GFLOPs Params
YOLOv3-SPP-320 110 31.0 50.8 32.0 10.5 33.0 50.4 19.68 63.02M
YOLOv3-SPP-416 88 35.0 56.1 36.4 14.9 37.7 52.8 33.26 63.02M
YOLOv3-SPP-512 75 37.2 58.7 39.1 19.1 40.0 53.0 50.38 63.02M
YOLOv3-SPP-608 50 38.3 60.1 40.7 20.9 41.1 51.2 71.04 63.02M
YOLOv3-SPP-640 48 38.2 60.1 40.4 21.6 41.1 50.5 78.72 63.02M

YOLOv3 with Dilated Encoder

The DilatedEncoder is proposed by YOLOF.

FPS AP AP50 AP75 APs APm APl GFLOPs Params
YOLOv3-DE-320 109 31.1 51.1 31.7 10.2 32.6 51.2 19.10 57.25M
YOLOv3-DE-416 88 35.0 56.1 36.3 14.6 37.4 53.7 32.28 57.25M
YOLOv3-DE-512 74 37.7 59.3 39.6 17.9 40.4 54.4 48.90 57.25M
YOLOv3-DE-608 50 38.7 60.5 40.8 20.6 41.7 53.1 68.96 57.25M
YOLOv3-DE-640 48 38.7 60.2 40.7 21.3 41.7 51.7 76.41 57.25M

YOLOv4

I'm still trying to make it better.

FPS AP AP50 AP75 APs APm APl GFLOPs Params
YOLOv4-320 89 39.2 58.6 40.9 16.9 44.1 59.2 16.38 58.14M
YOLOv4-416 84 41.7 61.6 44.2 22.0 46.6 57.7 27.69 58.14M
YOLOv4-512 70 42.9 63.1 46.1 24.5 48.3 56.5 41.94 58.14M
YOLOv4-608 51 43.0 63.4 46.1 26.7 48.6 53.9 59.14 58.14M

YOLO-Tiny

FPS AP AP50 AP75 APs APm APl GFLOPs Params
YOLO-Tiny-320 143 26.4 44.5 26.8 8.8 28.2 42.4 2.17 7.66M
YOLO-Tiny-416 130 28.2 47.6 28.8 11.6 31.5 41.4 3.67 7.82M
YOLO-Tiny-512 118 28.8 48.6 29.4 13.3 33.4 38.3 5.57 7.82M

YOLO-Nano

The FPS is measured on i5-1135G& CPU. Any accelerated deployments that would help speed up detection are not done.

FPS AP AP50 AP75 APs APm APl GFLOPs Params
YOLO-Nano-320 25 17.2 32.9 15.8 3.5 15.7 31.4 0.64 1.86M
YOLO-Nano-416 15 20.2 37.7 19.3 5.5 19.7 33.5 0.99 1.86M
YOLO-Nano-512 10 21.6 40.0 20.5 7.4 22.7 32.3 1.65 1.86M

I am currently not very satisfied with this YOLO-Nano, and will continue to optimize it later.

YOLO-TR

FPS AP AP50 AP75 APs APm APl GFLOPs Params
YOLO-TR-224
YOLO-TR-320
YOLO-TR-384

Dataset

VOC Dataset

I copy the download files from the following excellent project: https://github.com/amdegroot/ssd.pytorch

I have uploaded the VOC2007 and VOC2012 to BaiDuYunDisk, so for researchers in China, you can download them from BaiDuYunDisk:

Link:https://pan.baidu.com/s/1tYPGCYGyC0wjpC97H-zzMQ

Password:4la9

You will get a VOCdevkit.zip, then what you need to do is just to unzip it and put it into data/. After that, the whole path to VOC dataset is data/VOCdevkit/VOC2007 and data/VOCdevkit/VOC2012.

Download VOC2007 trainval & test

# specify a directory for dataset to be downloaded into, else default is ~/data/
sh data/scripts/VOC2007.sh # <directory>

Download VOC2012 trainval

# specify a directory for dataset to be downloaded into, else default is ~/data/
sh data/scripts/VOC2012.sh # <directory>

MSCOCO Dataset

Download MSCOCO 2017 dataset

Just run sh data/scripts/COCO2017.sh. You will get COCO train2017, val2017, test2017.

Train

For example:

python train.py --cuda \
                -d coco \
                -m yolov1 \
                -ms \
                --ema \
                --batch_size 16 \
                --root path/to/dataset/

You can run python train.py -h to check all optional argument. Or you can just run the shell file, for example:

sh train_yolov1.sh

If you have multi gpus like 8, and you put 4 images on each gpu:

python -m torch.distributed.launch --nproc_per_node=8 train.py -d coco \
                                                               --cuda \
                                                               -m yolov1 \
                                                               -ms \
                                                               --ema \
                                                               -dist \
                                                               --sybn \
                                                               --num_gpu 8 \
                                                               --batch_size 4 \
                                                               --root path/to/dataset/

Attention, --batch_size is the number of batchsize on per GPU, not all GPUs.

I have upload all training log files. For example, 1-v1.txt contains all the output information during the training YOLOv1.

It is strongly recommended that you open the training shell file to check how I train each YOLO detector.

Test

For example:

python test.py -d coco \
               --cuda \
               -m yolov4 \
               --weight path/to/weight \
               --img_size 640 \
               --root path/to/dataset/ \
               --show

Evaluation

For example

python eval.py -d coco-val \
               --cuda \
               -m yolov1 \
               --weight path/to/weight \
               --img_size 640 \
               --root path/to/dataset/

Evaluation on COCO-test-dev

To run on COCO_test-dev(You must be sure that you have downloaded test2017):

python eval.py -d coco-test \
               --cuda \
               -m yolov1 \
               --weight path/to/weight \
               --img_size 640 \
                --root path/to/dataset/

You will get a coco_test-dev.json file. Then you should follow the official requirements to compress it into zip format and upload it the official evaluation server.