
⚗ YOLO v1 PyTorch Implementation

Primary language: Jupyter Notebook. License: MIT.


简体中文 Simplified Chinese

I wrote this repo as a learning exercise, aiming to reproduce YOLO v1 in PyTorch. Pretraining the original network on ImageNet is very hard, so for convenience I replaced the backbone with the PyTorch pretrained versions of ResNet18 and ResNet50. The original backbone is still defined in yolo.py and can be trained. The pretraining method is not finished (and may never be, since I have achieved reasonable results with the other backbones); it is marked TODO in the file.

In addition, I removed the Dropout layer and added Batch Normalization after every convolution layer, following YOLO v2.

The loss function is implemented exactly as in the original paper. I also adopted all the hyperparameters from the paper. The network is trained on VOC2007-trainval+test and VOC2012-train, and tested on VOC2012-val, using an RTX2070s.
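For reference, this is the loss as written in the YOLO v1 paper, where the grid has S x S cells, each cell predicts B boxes, and the indicator terms select the predictor responsible for an object. The paper uses S = 7, B = 2, 20 classes on VOC, \lambda_{coord} = 5 and \lambda_{noobj} = 0.5. How yolo.py organizes these terms internally is not shown here.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The square roots on width and height make the same absolute error count for more on small boxes than on large ones.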

Here is the structure of the project.

webcam.py                     # webcam demo
utils
├── data.py                   # data pipeline
├── init.py                   # weight initialization
├── metrics.py                # mAP calculation
├── utils.py                  # helper, e.g. Accumulator, Timer
└── visualize.py              # visualization
yolo
├── tests.py                  # test wrapping
└── yolo.py                   # YOLO module, loss, nms implementation

Performance

| Model | Backbone | mAP@VOC2012-val | COCOmAP@VOC2012-val | FPS |
|---|---|---|---|---|
| YOLOv1-ResNet18 (Ours) | ResNet18 | 48.10% | 23.18% | 97.88 |
| YOLOv1-ResNet50 (Ours) | ResNet50 | 49.87% | 23.95% | 58.40 |

| Model | Backbone | mAP@VOC2012-test | FPS |
|---|---|---|---|
| YOLOv1-ResNet18 (Ours) | ResNet18 | 44.54% | 97.88 |
| YOLOv1-ResNet50 (Ours) | ResNet50 | 47.28% | 58.40 |
| YOLOv1 | Darknet? | 57.9% | 45 |

Leaderboard Link:

More comparisons across categories:

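| Model | mean | aero plane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLO | 57.9 | 77.0 | 67.2 | 57.7 | 38.3 | 22.7 | 68.3 | 55.9 | 81.4 | 36.2 | 60.8 |
| YOLOv1-ResNet18 (Ours) | 44.5 | 64.3 | 54.2 | 47.4 | 26.8 | 16.6 | 55.4 | 44.3 | 66.5 | 23.1 | 38.1 |
| YOLOv1-ResNet50 (Ours) | 47.3 | 66.7 | 56.1 | 49.5 | 25.9 | 17.8 | 60.2 | 45.9 | 70.6 | 26.1 | 43.0 |

| Model | dining table | dog | horse | motor bike | person | potted plant | sheep | sofa | train | tv monitor |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLO | 48.5 | 77.2 | 72.3 | 71.3 | 63.5 | 28.9 | 52.2 | 54.8 | 73.9 | 50.8 |
| YOLOv1-ResNet18 (Ours) | 38.5 | 62.9 | 57.6 | 60.8 | 45.0 | 15.2 | 33.3 | 43.9 | 60.0 | 37.2 |
| YOLOv1-ResNet50 (Ours) | 41.1 | 67.5 | 59.2 | 62.4 | 47.6 | 17.6 | 35.6 | 45.7 | 64.6 | 42.4 |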

Honestly, the results are not ideal; I am now focusing on more modern architectures and tricks to improve them.

Note

When running a notebook for the first time, add the download=True param to the load_data_voc call to download the dataset. Remove the param once everything is set up, since unarchiving the data on every run is time-consuming.

Training

To train the models yourself, use resnet18-yolo-train.ipynb and resnet50-yolo-train.ipynb.

I trained the network on an RTX2070s with 8 GB of memory, so I implemented gradient accumulation to avoid OOM errors. The true batch size is the product of the DataLoader's batch_size and the accum_batch_num param of the train method. In resnet18-yolo-train.ipynb, for example, the true batch size is 16 (DataLoader batch_size) * 4 (accum_batch_num) = 64. Adjust these params to suit your hardware. DataParallel is also supported via the num_gpu param of train().

Here are the training loss plots:
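To illustrate why gradient accumulation reproduces a larger batch, here is a minimal framework-free sketch on a 1-D linear model. The function names and the toy data are hypothetical and are not taken from this repo. It assumes each micro-batch gradient is divided by accum_batch_num before summing; whether train() scales the loss this way is an assumption. In PyTorch the same pattern is usually written as calling loss.backward() on every micro-batch and optimizer.step() plus optimizer.zero_grad() once every accum_batch_num iterations.

```python
# Framework-free illustration of gradient accumulation for y ~ w * x
# with a mean squared error loss. All names here are illustrative only.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch_size, accum_batch_num):
    """Sum the gradients of `accum_batch_num` consecutive micro-batches.

    Each micro-batch gradient is divided by `accum_batch_num`, so the sum
    equals the gradient of one batch of size
    micro_batch_size * accum_batch_num.
    """
    total = 0.0
    for k in range(accum_batch_num):
        lo = k * micro_batch_size
        hi = lo + micro_batch_size
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accum_batch_num
    return total  # a single optimizer step would use this value


micro_batch_size, accum_batch_num = 16, 4
effective_batch_size = micro_batch_size * accum_batch_num  # 16 * 4 = 64

# Toy data: one "large batch" worth of samples.
xs = [0.1 * i for i in range(effective_batch_size)]
ys = [3.0 * x + 1.0 for x in xs]
w = 0.5

full = mse_grad(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch_size, accum_batch_num)
print(abs(full - accum) < 1e-9)  # the two gradients agree up to rounding
```

Only one micro-batch has to fit in GPU memory at a time, which is what makes the approach useful on an 8 GB card.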

ResNet18 (Backbone):

ResNet50 (Backbone):

Testing

Model weights are available in the repo releases. Place the weights in the ./model/ folder, and run resnet18-yolo-test.ipynb and resnet50-yolo-test.ipynb.

There is also a webcam demo (webcam.py).

2022/05/10 Update: According to the VOC postscripts, objects tagged "difficult" are excluded during evaluation, and detecting them is not penalized. I missed this statement before. The good news is that the mAP of both models increased by about 4% after excluding them.

2022/05/13 Update: voc2012test.py was added to generate VOC2012 test results, and the evaluation scores are published in this README. To run the test yourself, place the VOC2012 test data in the current folder as shown below.
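The "difficult" rule can be sketched as follows for one class in one image. This is a minimal illustration in the spirit of the VOC evaluation procedure, not the code in utils/metrics.py; the function names, the (x1, y1, x2, y2) box format, and the 0.5 IoU threshold are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def match_detections(detections, ground_truths, iou_thresh=0.5):
    """Label detections as "tp", "fp" or "ignored".

    detections:    list of (score, box)
    ground_truths: list of (box, difficult_flag)
    Returns (labels in descending score order, number of positives).
    """
    used = [False] * len(ground_truths)
    labels = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Find the ground-truth box with the highest overlap.
        best_iou, best_idx = 0.0, -1
        for idx, (gt_box, _) in enumerate(ground_truths):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou >= iou_thresh:
            if ground_truths[best_idx][1]:
                labels.append("ignored")   # matched a difficult object: no penalty
            elif not used[best_idx]:
                used[best_idx] = True
                labels.append("tp")        # first match of a regular object
            else:
                labels.append("fp")        # duplicate detection
        else:
            labels.append("fp")            # no sufficiently overlapping object
    # Difficult objects do not count toward recall.
    num_positives = sum(1 for _, difficult in ground_truths if not difficult)
    return labels, num_positives


gts = [((0, 0, 10, 10), False), ((20, 20, 30, 30), True)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (21, 21, 30, 30)), (0.7, (50, 50, 60, 60))]
labels, num_positives = match_detections(dets, gts)
print(labels, num_positives)
```

Counting a match with a difficult object as a false positive, or counting difficult objects in the recall denominator, both lower the AP, which is consistent with the mAP increase reported above.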

.
yolo-v1-pytorch             # project folder
├── ...                     # Other files
└── README.md
data
└── VOC2012test             # create dataset folder
    └── VOCdevkit
        └── VOC2012
            ├── Annotations
            ├── ImageSets
            └── JPEGImages
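Before running voc2012test.py, the layout above can be checked with a small helper like the one below. This is a hypothetical convenience function, not part of the repo, and the location of the data folder relative to your working directory is an assumption you pass in as data_root.

```python
from pathlib import Path


def missing_voc2012_test_dirs(data_root):
    """Return the expected VOC2012 test sub-folders that do not exist.

    data_root is the folder that should contain VOC2012test
    (the "data" folder in the layout above).
    """
    base = Path(data_root) / "VOC2012test" / "VOCdevkit" / "VOC2012"
    expected = [base / name for name in ("Annotations", "ImageSets", "JPEGImages")]
    return [str(path) for path in expected if not path.is_dir()]


# Example: report anything missing under ./data (adjust the path as needed).
for path in missing_voc2012_test_dirs("data"):
    print("missing:", path)
```

An empty result means all three folders are in place.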

VOC2012 test dataset download link:

Thanks

YOLO v2

My YOLO v2 implementation repo: