Update 2022-01-06

I used to scale the loss by the batch size:

loss = loss.sum() / batch_size

Recently, however, I realized this is not optimal. To address the issue, I now scale the loss by the total number of positive samples instead:

loss = loss.sum() / num_pos

With this change, some tricks that previously did not help now work, so I am applying them to make my YOLO models better. Once these optimizations are complete, I will upload the latest weight files.
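The difference between the two schemes can be sketched as below. This is a minimal, dependency-free illustration, not the repo's exact code: `loss_sum` stands for `loss.sum()` above, and the clamp on `num_pos` is my own addition to guard against batches with no positive samples.

```python
def scale_loss(loss_sum, batch_size, num_pos=None):
    """Scale a summed loss either by batch size or by positive-sample count."""
    if num_pos is not None:
        # Clamp to 1 to avoid dividing by zero when no object is matched.
        return loss_sum / max(num_pos, 1)
    return loss_sum / batch_size

# Toy numbers: summed loss 12.0 over a batch of 4 images with 6 positives.
print(scale_loss(12.0, batch_size=4))              # 3.0
print(scale_loss(12.0, batch_size=4, num_pos=6))   # 2.0
```

Scaling by `num_pos` keeps the per-positive gradient magnitude stable even when the number of objects per batch varies a lot, which is why some tricks become effective.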

A new and strong YOLO family

Recently, I rebuilt my YOLO-Family project!

Requirements

  • We recommend using Anaconda to create a conda environment:
conda create -n yolo python=3.6
  • Then, activate the environment:
conda activate yolo
  • Requirements:
pip install -r requirements.txt 

PyTorch >= 1.1.0 and Torchvision >= 0.3.0

Coming soon

My better YOLO family

This project

In this project, you can enjoy:

  • a new and stronger YOLOv1
  • a new and stronger YOLOv2
  • YOLOv3 with DilatedEncoder
  • YOLOv4 (I'm trying to make it better)
  • YOLO-Tiny
  • YOLO-Nano

Future work

  • Try to make my YOLOv4 better.
  • Train my YOLOv1/YOLOv2 with ViT-Base (pretrained with Masked Autoencoder (MAE))

Weights

You can download all weights, including my DarkNet-53, CSPDarkNet-53, MAE-ViT, and YOLO weights, from the following links.

Google Drive

Link: Hold on ...

BaiDuYun Disk

Link:https://pan.baidu.com/s/1Cin9R52wfubD4xZUHHCRjg

Password:aigz

Experiments

Tricks

Tricks in this project:

  • Augmentations: Flip + Color jitter + RandomCrop + Multi-scale
  • Model EMA
  • GIoU
  • Mosaic Augmentation for my YOLOv4
  • Multiple positive samples for my YOLOv4
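As one concrete example, the GIoU used for box regression can be computed as below for axis-aligned boxes in (x1, y1, x2, y2) format. This is a minimal, dependency-free sketch, not the exact implementation used in this project (which operates on batched tensors):

```python
def giou(box1, box2):
    """Generalized IoU for two boxes given as (x1, y1, x2, y2)."""
    # Intersection area.
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union area.
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    iou = inter / union
    # Smallest enclosing box C.
    cx1, cy1 = min(box1[0], box2[0]), min(box1[1], box2[1])
    cx2, cy2 = max(box1[2], box2[2]), max(box1[3], box2[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    # GIoU = IoU - (|C| - |union|) / |C|; the training loss is 1 - GIoU.
    return iou - (c_area - union) / c_area

print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 for identical boxes
```

Unlike plain IoU, GIoU stays informative (negative) for disjoint boxes, so it still provides a gradient when predictions do not overlap the target.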

On the COCO-val:

| Model     | Backbone          | Size | FPS | AP   | AP50 | AP75 | APs  | APm  | APl  | GFLOPs | Params |
|-----------|-------------------|------|-----|------|------|------|------|------|------|--------|--------|
| YOLO-Nano | ShuffleNetv2-1.0x | 512  | -   | 21.6 | 40.0 | 20.5 | 7.4  | 22.7 | 32.3 | 1.65   | 1.86M  |
| YOLO-Tiny | CSPDarkNet-Tiny   | 512  | -   | 26.6 | 46.1 | 26.9 | 13.5 | 30.0 | 35.0 | 5.52   | 7.66M  |
| YOLO-TR   | ViT-B             | 384  |     |      |      |      |      |      |      |        |        |
| YOLOv1    | ResNet50          | 640  | -   | 35.2 | 54.7 | 37.1 | 14.3 | 39.5 | 53.4 | 41.96  | 44.54M |
| YOLOv2    | ResNet50          | 640  | -   | 36.3 | 56.6 | 37.7 | 15.1 | 41.1 | 54.0 | 42.10  | 44.89M |
| YOLOv3-DE | DarkNet53         | 640  | -   | 38.7 | 60.2 | 40.7 | 21.3 | 41.7 | 51.7 | 76.41  | 57.25M |
| YOLOv4    | CSPDarkNet53      | 640  | -   | 40.5 | 60.4 | 43.5 | 24.2 | 44.8 | 52.0 | 60.55  | 52.00M |

The FPS of all YOLO detectors is measured on a single 2080 Ti GPU with 640 × 640 input size.

Visualization

I will upload some visualization results later.

YOLO-Nano

| Model         | FPS | AP   | AP50 | AP75 | APs | APm  | APl  | GFLOPs | Params |
|---------------|-----|------|------|------|-----|------|------|--------|--------|
| YOLO-Nano-320 | -   | 17.2 | 32.9 | 15.8 | 3.5 | 15.7 | 31.4 | 0.64   | 1.86M  |
| YOLO-Nano-416 | -   | 20.2 | 37.7 | 19.3 | 5.5 | 19.7 | 33.5 | 1.09   | 1.86M  |
| YOLO-Nano-512 | -   | 21.6 | 40.0 | 20.5 | 7.4 | 22.7 | 32.3 | 1.65   | 1.86M  |

YOLO-Tiny

| Model         | FPS | AP   | AP50 | AP75 | APs  | APm  | APl  | GFLOPs | Params |
|---------------|-----|------|------|------|------|------|------|--------|--------|
| YOLO-Tiny-320 | -   | 24.5 | 42.4 | 24.5 | 8.9  | 26.1 | 38.8 | 2.16   | 7.66M  |
| YOLO-Tiny-416 | -   | 25.7 | 44.4 | 25.9 | 11.7 | 27.8 | 36.7 | 3.64   | 7.66M  |
| YOLO-Tiny-512 | -   | 26.6 | 46.1 | 26.9 | 13.5 | 30.0 | 35.0 | 5.52   | 7.66M  |

YOLO-TR

| Model       | FPS | AP | AP50 | AP75 | APs | APm | APl |
|-------------|-----|----|------|------|-----|-----|-----|
| YOLO-TR-224 |     |    |      |      |     |     |     |
| YOLO-TR-320 |     |    |      |      |     |     |     |
| YOLO-TR-384 |     |    |      |      |     |     |     |

YOLOv1

| Model      | FPS | AP   | AP50 | AP75 | APs  | APm  | APl  |
|------------|-----|------|------|------|------|------|------|
| YOLOv1-320 | -   | 25.4 | 41.5 | 26.0 | 4.2  | 25.0 | 49.8 |
| YOLOv1-416 | -   | 30.1 | 47.8 | 30.9 | 7.8  | 31.9 | 53.3 |
| YOLOv1-512 | -   | 33.1 | 52.2 | 34.0 | 10.8 | 35.9 | 54.9 |
| YOLOv1-640 | -   | 35.2 | 54.7 | 37.1 | 14.3 | 39.5 | 53.4 |

YOLOv2

| Model      | FPS | AP   | AP50 | AP75 | APs  | APm  | APl  |
|------------|-----|------|------|------|------|------|------|
| YOLOv2-320 | -   | 26.8 | 44.1 | 27.1 | 4.7  | 27.6 | 50.8 |
| YOLOv2-416 | -   | 31.6 | 50.3 | 32.4 | 9.1  | 33.8 | 54.0 |
| YOLOv2-512 | -   | 34.3 | 54.0 | 35.4 | 12.3 | 37.8 | 55.2 |
| YOLOv2-640 | -   | 36.3 | 56.6 | 37.7 | 15.1 | 41.1 | 54.0 |

YOLOv3

Coming soon.

| Model      | FPS | AP | AP50 | AP75 | APs | APm | APl |
|------------|-----|----|------|------|-----|-----|-----|
| YOLOv3-320 |     |    |      |      |     |     |     |
| YOLOv3-416 |     |    |      |      |     |     |     |
| YOLOv3-512 |     |    |      |      |     |     |     |
| YOLOv3-608 |     |    |      |      |     |     |     |
| YOLOv3-640 |     |    |      |      |     |     |     |

YOLOv3 with SPP

Coming soon.

| Model          | FPS | AP | AP50 | AP75 | APs | APm | APl |
|----------------|-----|----|------|------|-----|-----|-----|
| YOLOv3-SPP-320 |     |    |      |      |     |     |     |
| YOLOv3-SPP-416 |     |    |      |      |     |     |     |
| YOLOv3-SPP-512 |     |    |      |      |     |     |     |
| YOLOv3-SPP-608 |     |    |      |      |     |     |     |
| YOLOv3-SPP-640 |     |    |      |      |     |     |     |

YOLOv3 with Dilated Encoder

The Dilated Encoder was proposed by YOLOF.
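The core idea is a stack of residual bottleneck blocks whose 3×3 convolutions use increasing dilation rates, enlarging the receptive field of a single feature level without changing its resolution. A minimal PyTorch sketch follows; the channel widths and dilation rates here are illustrative defaults, not necessarily those used in this repo:

```python
import torch
import torch.nn as nn

class DilatedBottleneck(nn.Module):
    """Residual bottleneck whose middle 3x3 conv is dilated."""
    def __init__(self, channels, dilation, mid=128):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            # padding = dilation keeps the spatial size for a 3x3 kernel
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.block(x)  # residual connection

class DilatedEncoder(nn.Module):
    """Stack of dilated bottlenecks, in the spirit of YOLOF's encoder."""
    def __init__(self, channels=512, dilations=(2, 4, 6, 8)):
        super().__init__()
        self.blocks = nn.Sequential(
            *[DilatedBottleneck(channels, d) for d in dilations])

    def forward(self, x):
        return self.blocks(x)  # spatial size is preserved

x = torch.randn(1, 512, 20, 20)
y = DilatedEncoder()(x)
print(y.shape)  # torch.Size([1, 512, 20, 20])
```

Stacking dilations (2, 4, 6, 8) gives the single-level map a receptive field covering objects of many scales, which is what lets it replace a multi-level FPN neck.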

| Model      | FPS | AP   | AP50 | AP75 | APs  | APm  | APl  |
|------------|-----|------|------|------|------|------|------|
| YOLOv3-320 | -   | 31.1 | 51.1 | 31.7 | 10.2 | 32.6 | 51.2 |
| YOLOv3-416 | -   | 35.0 | 56.1 | 36.3 | 14.6 | 37.4 | 53.7 |
| YOLOv3-512 | -   | 37.7 | 59.3 | 39.6 | 17.9 | 40.4 | 54.4 |
| YOLOv3-640 | -   | 38.7 | 60.2 | 40.7 | 21.3 | 41.7 | 51.7 |

YOLOv4

Coming soon.

| Model          | FPS | AP | AP50 | AP75 | APs | APm | APl |
|----------------|-----|----|------|------|-----|-----|-----|
| YOLOv4-SPP-320 |     |    |      |      |     |     |     |
| YOLOv4-SPP-416 |     |    |      |      |     |     |     |
| YOLOv4-SPP-512 |     |    |      |      |     |     |     |
| YOLOv4-SPP-608 |     |    |      |      |     |     |     |
| YOLOv4-SPP-640 |     |    |      |      |     |     |     |

YOLOv4-exp

This is an experimental model. I am currently optimizing my YOLOv4 further, using a better CSPDarkNet and better training strategies.

| Model      | FPS | AP   | AP50 | AP75 | APs  | APm  | APl  |
|------------|-----|------|------|------|------|------|------|
| YOLOv4-320 | -   | 36.7 | 55.4 | 38.2 | 15.7 | 39.9 | 57.5 |
| YOLOv4-416 | -   | 39.2 | 58.6 | 41.4 | 20.1 | 43.3 | 56.8 |
| YOLOv4-512 | -   | 40.5 | 60.1 | 43.1 | 22.8 | 44.5 | 56.1 |
| YOLOv4-640 | -   | 40.5 | 60.4 | 43.5 | 24.2 | 44.8 | 52.0 |

Dataset

VOC Dataset

I copied the download scripts from the following excellent project: https://github.com/amdegroot/ssd.pytorch

I have uploaded VOC2007 and VOC2012 to BaiDuYunDisk, so researchers in China can download them from:

Link:https://pan.baidu.com/s/1tYPGCYGyC0wjpC97H-zzMQ

Password:4la9

You will get a VOCdevkit.zip; just unzip it and put it into data/. The full paths to the VOC datasets are then data/VOCdevkit/VOC2007 and data/VOCdevkit/VOC2012.

Download VOC2007 trainval & test

# specify a directory for dataset to be downloaded into, else default is ~/data/
sh data/scripts/VOC2007.sh # <directory>

Download VOC2012 trainval

# specify a directory for dataset to be downloaded into, else default is ~/data/
sh data/scripts/VOC2012.sh # <directory>

MSCOCO Dataset

Download MSCOCO 2017 dataset

Just run sh data/scripts/COCO2017.sh. You will get the COCO train2017, val2017, and test2017 splits.

Train

For example:

python train.py --cuda \
                -d coco \
                -v yolov1 \
                -ms \
                --ema \
                --batch_size 16 \
                --root path/to/dataset/

You can run python train.py -h to check all optional arguments, or just run the shell file, for example:

sh train_yolov1.sh

If you have multiple GPUs, say 8, and put 4 images on each GPU:

python -m torch.distributed.launch --nproc_per_node=8 train.py -d coco \
                                                               --cuda \
                                                               -v yolov1 \
                                                               -ms \
                                                               --ema \
                                                               -dist \
                                                               --sybn \
                                                               --num_gpu 8 \
                                                               --batch_size 4 \
                                                               --root path/to/dataset/

Note that --batch_size is the batch size per GPU, not the total across all GPUs.
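So with the 8-GPU command above, the effective (global) batch size works out to:

```python
num_gpus = 8            # --num_gpu
batch_size_per_gpu = 4  # --batch_size (per GPU)
effective_batch_size = num_gpus * batch_size_per_gpu
print(effective_batch_size)  # 32
```

Keep this in mind when comparing learning-rate and batch-size settings against single-GPU runs.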

I have uploaded all training log files. For example, 1-v1.txt contains all the output produced during the training of YOLOv1.

It is strongly recommended that you open the training shell files to see how I train each YOLO detector.

Test

For example:

python test.py -d coco \
               --cuda \
               -v yolov1 \
               --weight path/to/weight \
               --img_size 640 \
               --root path/to/dataset/ \
               --show

Evaluation

For example

python eval.py -d coco-val \
               --cuda \
               -v yolov1 \
               --weight path/to/weight \
               --img_size 640 \
               --root path/to/dataset/

Evaluation on COCO-test-dev

To run on COCO test-dev (make sure you have downloaded test2017 first):

python eval.py -d coco-test \
               --cuda \
               -v yolov1 \
               --weight path/to/weight \
               --img_size 640 \
               --root path/to/dataset/

You will get a coco_test-dev.json file. Then follow the official requirements to compress it into zip format and upload it to the official evaluation server.