YOLOv3 for Object Detection
- TensorFlow implementation of YOLOv3 for object detection.
- Both inference and training pipelines are implemented.
- For inference using the pre-trained model, the model stored in a .weights file is first downloaded from the official YOLO website (Section 'Performance on the COCO Dataset', YOLOv3-416 link), then converted to a .npy file, and finally loaded by the TensorFlow model for prediction.
- For training, the pre-trained DarkNet-53 is used as the feature extractor and the YOLO prediction layers at three scales are trained from scratch. Data augmentation such as random flipping, cropping, resizing, affine transformation and color change (hue, saturation, brightness) is applied. Anchor clustering and multi-scale training (training images are rescaled every 10 epochs) are implemented as well.
TODO
- Convert pre-trained .weights model to .npy file (detail).
- Pre-trained DarkNet-53 for image classification (detail).
- Object detection using pre-trained YOLOv3 trained on COCO dataset (detail).
- YOLOv3 training pipeline
- Train on VOC dataset (detail).
- Performance evaluation.
- Train on custom dataset.
Requirements
- Python 3.0
- TensorFlow 1.12.0+
- Numpy
- Scipy
- imageio
- Matplotlib
Use pre-trained model for object detection (80 classes)
Download pre-trained model
- Download the pre-trained model yolov3.npy from here. This model is converted from the .weights file from here (Section 'Performance on the COCO Dataset', YOLOv3-416 link).
- More details for converting models can be found here.
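As a rough illustration of the first step of such a conversion, the sketch below reads the raw contents of a Darknet .weights file with NumPy. It relies on the Darknet file layout (three 32-bit integers for the version, a "seen" counter, then a flat run of float32 values). The function name read_darknet_weights is hypothetical, and the per-layer layout of the repository's yolov3.npy is not described here, so this sketch deliberately stops at the flat array and is not the repository's converter.

```python
import numpy as np


def read_darknet_weights(path):
    """Read a Darknet .weights file into (header, flat float32 array).

    Hypothetical helper for illustration only; it does not split the
    values into per-layer tensors.
    """
    with open(path, "rb") as f:
        # Version header: major, minor, revision as little-endian int32.
        major, minor, revision = (int(v) for v in np.fromfile(f, dtype="<i4", count=3))
        # Newer files (version >= 0.2) store 'seen' as 64-bit, older ones as 32-bit.
        seen_dtype = "<i8" if (major * 10 + minor) >= 2 else "<i4"
        seen = int(np.fromfile(f, dtype=seen_dtype, count=1)[0])
        # Everything after the header is one flat run of float32 values.
        weights = np.fromfile(f, dtype="<f4")
    header = {"major": major, "minor": minor, "revision": revision, "seen": seen}
    return header, weights
```

A converter would then walk the network definition layer by layer, slice this flat array into biases, batch-norm statistics and kernels, and store the result with np.save.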
Setup configuration
- Modify the config file configs/config_path.cfg with the following content:

      [path]
      coco_pretrained_npy = DIRECTORY/TO/MODEL/yolov3.npy
      save_path = DIRECTORY/TO/SAVE/RESULT/
      test_image_path = DIRECTORY/OF/TEST/IMAGE/
      test_image_name = .jpg

  - Put the converted pretrained model yolov3.npy in coco_pretrained_npy.
  - Put testing images in test_image_path.
  - Part of the testing image names is specified by test_image_name.
  - Result images will be saved in save_path.
- Use obj_score_thresh and nms_iou_thresh in the config file configs/coco80.cfg to set up the parameters of non-maximum suppression, which removes multiple bounding boxes for one detected object.
  - obj_score_thresh is the threshold for deciding if a bounding box detects an object class based on the score. Default is 0.8.
  - nms_iou_thresh is the threshold for deciding if two bounding boxes overlap too much based on the IoU. Default is 0.45.
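To make the role of the two thresholds concrete, here is a minimal NumPy sketch of greedy non-maximum suppression for a single class. It assumes boxes are given as [x1, y1, x2, y2] corners and that each box already has one score; the function names are hypothetical, and the repository's own implementation (for example how it handles the 80 classes) may differ.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-12)


def non_max_suppression(boxes, scores, obj_score_thresh=0.8, nms_iou_thresh=0.45):
    """Return indices of kept boxes, highest score first (single-class sketch)."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    # Step 1: discard boxes whose score is below obj_score_thresh.
    candidates = np.where(scores >= obj_score_thresh)[0]
    order = candidates[np.argsort(-scores[candidates])]
    keep = []
    # Step 2: repeatedly keep the best box and drop boxes overlapping it too much.
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= nms_iou_thresh]
    return keep
```

Raising obj_score_thresh removes more low-confidence boxes before suppression starts, while lowering nms_iou_thresh makes the suppression step more aggressive about merging nearby detections.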
Prediction
- Put testing images in test_image_path in pretrain_coco_path.cfg, go to experiment\, and run:

      python yolov3.py --detect

- Testing images are rescaled to 416 * 416 before being fed into the network.
- Result images are saved in save_path set in configs/pretrain_coco_path.cfg.
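For readers who want to check their path configuration before running detection, the following sketch reads the [path] section with Python's standard configparser and lists the test images it would select. The helper names load_paths and find_test_images are hypothetical, and treating test_image_name as a substring of the file name is an assumption based on the description "part of the testing image names"; the repository's own loader may behave differently.

```python
import configparser
import os


def load_paths(cfg_path):
    """Return the [path] section of a config file as a plain dict."""
    # interpolation=None keeps '%' characters in directory names untouched.
    cfg = configparser.ConfigParser(interpolation=None)
    if not cfg.read(cfg_path):
        raise FileNotFoundError(cfg_path)
    return dict(cfg["path"])


def find_test_images(image_dir, name_part):
    """List files in image_dir whose name contains name_part (assumed rule)."""
    names = sorted(n for n in os.listdir(image_dir) if name_part in n)
    return [os.path.join(image_dir, n) for n in names]
```

With the config shown in the setup section, find_test_images(paths["test_image_path"], paths["test_image_name"]) would return every file containing ".jpg" in its name.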
Sample results
Train on VOC2012 dataset (20 classes)
Prepare dataset and pre-trained feature extractor
- Download VOC2012 training/validation data from here (2GB tar file).
- Download the pre-trained Darknet-53 yolov3_feat.npy from here. This model is converted from the .weights file from here (Section 'Pre-Trained Models', Darknet53 448x448 link).
- More details for converting models can be found here.
Setup configuration
- Modify the config file configs/config_path.cfg with the following content:

      [path]
      yolo_feat_pretraind_npy = DIRECTORY/TO/MODEL/yolov3_feat.npy
      train_data_path = DIRECTORY/OF/TRAINING/SET/
      save_path = DIRECTORY/TO/SAVE/RESULT/

  - Put the converted pretrained model yolov3_feat.npy in yolo_feat_pretraind_npy.
  - train_data_path is the parent directory of JPEGImages and Annotations for the training/validation set.
  - Tensorboard summary and trained model will be saved in save_path.
- Use the config file configs/voc.cfg to set up the hyper-parameters for training on VOC2012. Default values are the current setting.
  - anchor is the list of 9 anchors (width and height) obtained from anchor clustering, in ascending order.
  - obj_weight and nobj_weight are the weights of the object loss and the non-object loss.
  - multiscale is the set of scales used for training.
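As background for the anchor setting, the sketch below shows one common way to cluster anchors: k-means over ground-truth box sizes using IoU as the similarity. It is a minimal illustration under stated assumptions, not the repository's clustering code. The function names are hypothetical, boxes are assumed to be (width, height) pairs in pixels, and "ascending order" is interpreted here as ascending box area.

```python
import numpy as np


def wh_iou(boxes_wh, anchors_wh):
    """IoU of (N, 2) boxes and (K, 2) anchors aligned at one corner; shape (N, K)."""
    inter = (np.minimum(boxes_wh[:, None, 0], anchors_wh[None, :, 0])
             * np.minimum(boxes_wh[:, None, 1], anchors_wh[None, :, 1]))
    area_b = boxes_wh[:, 0] * boxes_wh[:, 1]
    area_a = anchors_wh[:, 0] * anchors_wh[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)


def cluster_anchors(boxes_wh, k=9, n_iter=100, seed=0):
    """k-means on box sizes with IoU similarity; returns (k, 2) anchors."""
    boxes_wh = np.asarray(boxes_wh, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Initialise the anchors with k distinct ground-truth boxes.
    anchors = boxes_wh[rng.choice(len(boxes_wh), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every box to the anchor it overlaps most.
        assign = np.argmax(wh_iou(boxes_wh, anchors), axis=1)
        new_anchors = anchors.copy()
        for j in range(k):
            members = boxes_wh[assign == j]
            if len(members) > 0:  # leave empty clusters where they are
                new_anchors[j] = members.mean(axis=0)
        if np.allclose(new_anchors, anchors):
            break
        anchors = new_anchors
    # Sort by area so the smallest anchors come first (assumed ordering).
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, since a fixed pixel error matters more for a small box than for a large one.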
Training
- Go to experiment\ and run:

      python yolov3.py --train

- The entire dataset is randomly divided into 14556 training images (85%) and 2568 validation images (15%).
- Data augmentation (flipping, cropping, resizing, affine transformation and color change) is applied to the training set. The training images are rescaled every 10 epochs, with the scale randomly picked from multiscale in configs/voc.cfg.
- Validation images are all rescaled to 416 * 416 without augmentation.
- The learning rate schedule needs to be further tuned, but the current setting is: 0.1 (epochs 1-50), 0.01 (epochs 51-100) and 0.001 (epochs 101-150).
- Tensorboard summaries, which include losses and sample predictions for both the training set (every 100 steps) and the validation set (every epoch), are saved in save_path in configs/config_path.cfg. Note that non-maximum suppression is not used in sample predictions and only the top 20 predicted bounding boxes based on class score are shown. These summaries show how the model is doing during training.
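The bookkeeping described in the list above (the 85/15 split, the stepwise learning rate, and picking a new training scale every 10 epochs) can be sketched in plain Python as follows. All function names are hypothetical, and computing the validation size by truncating 15% of the dataset is an assumption that happens to reproduce the stated counts; the actual training script may organise this differently.

```python
import random


def split_dataset(samples, val_fraction=0.15, seed=0):
    """Shuffle and split samples into (train, validation) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)  # truncation is an assumption
    return shuffled[n_val:], shuffled[:n_val]


def learning_rate_for_epoch(epoch):
    """Stepwise schedule for 1-based epochs, as listed in the text."""
    if epoch <= 50:
        return 0.1
    if epoch <= 100:
        return 0.01
    return 0.001


def scales_per_epoch(n_epochs, multiscale, period=10, seed=0):
    """Pick a random scale from multiscale every `period` epochs."""
    rng = random.Random(seed)
    scales = []
    current = None
    for epoch in range(n_epochs):
        if epoch % period == 0:
            current = rng.choice(multiscale)
        scales.append(current)
    return scales
```

For example, scales_per_epoch(150, [320, 416, 608]) yields 15 blocks of 10 identical scales; the three values here are placeholders rather than the contents of configs/voc.cfg.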
Sample results
- Prediction after 150 epochs. Performance evaluation will be added soon.
Reference code
- https://github.com/pjreddie/darknet
- https://github.com/experiencor/keras-yolo3
- https://github.com/qqwweee/keras-yolo3
Author
Qian Ge