Mobilenetv2-Yolov3

TensorFlow implementation of mobilenetv2-yolov3 and efficientnet-yolov3, inspired by keras-yolo3.

Update

Backend:

MobilenetV2
Efficientnet
Darknet53

Callback:

mAP
TensorBoard external callback

Loss:

MSE
GIOU
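For readers unfamiliar with the GIOU option, here is a minimal sketch of generalized IoU for two axis-aligned boxes in plain Python. It assumes boxes in (xmin, ymin, xmax, ymax) form, matching the record format used elsewhere in this README. The function names `giou` and `giou_loss` are illustrative and are not taken from this repository, whose loss is implemented on TensorFlow tensors.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Areas of the two boxes (clamped so degenerate boxes have zero area).
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)

    # Intersection rectangle.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    if union <= 0.0 or enclose <= 0.0:
        return 0.0

    iou = inter / union
    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss form: 0 for identical boxes, approaching 2 for distant boxes."""
    return 1.0 - giou(box_a, box_b)
```

Unlike plain IoU, the value stays informative (negative) when the boxes do not overlap, which is the usual motivation for using it as a regression loss.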

Train:

Multi scale image size
Cosine learning rate
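As an illustration of the cosine learning rate option, the sketch below computes a standard cosine decay from a base rate down to a minimum rate. The function name `cosine_lr` and its parameters are assumptions made for this example; the schedule actually used by this project (warm-up, restarts, and so on) may differ.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine decay from base_lr (step 0) to min_lr (step == total_steps)."""
    # Clamp so steps outside the schedule hold the boundary value.
    step = min(max(step, 0), total_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cosine
```

For example, with `base_lr=1e-3` and `total_steps=100`, the rate is `1e-3` at step 0, half of that at step 50, and `min_lr` at step 100.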

Tensorflow:

Serving:

TensorFlow Serving warm-up request
TensorFlow Serving Java client
TensorFlow Serving Python client
TensorFlow Serving service control client
TensorFlow Serving server build and plugin development

Usage

Install:

pip install -r requirements.txt

Get help info:

python main.py --help

Train:

1. Name the dataset file like [name]_[number].[extension]
Example:

voc_train_3998.txt
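A quick way to check that a dataset file follows this naming convention is a small regular expression. This is a sketch only: the helper name `parse_dataset_filename` is hypothetical and is not part of this repository.

```python
import re
from pathlib import Path

# [name]_[number].[extension]; the name itself may contain underscores.
_FILENAME_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<number>\d+)\.(?P<extension>[^.]+)$"
)


def parse_dataset_filename(path):
    """Return (name, number, extension), or None if the name does not match."""
    match = _FILENAME_PATTERN.match(Path(path).name)
    if match is None:
        return None
    return match.group("name"), int(match.group("number")), match.group("extension")
```

With the example above, `parse_dataset_filename("voc_train_3998.txt")` yields the name `voc_train`, the number `3998`, and the extension `txt`.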

2. If you are using a txt dataset, format each record like [image_path] [,[xmin ymin xmax ymax class]]
(for convenience, you can modify voc_text.py to convert your data to this format). Otherwise, modify voc_annotation.py, then run

python voc_annotation.py

to convert your data to tfrecords.
Example:

/image/path 179 66 272 290 14 172 38 317 349 14 276 2 426 252 14 1 32 498 365 13
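To make the record layout concrete, the sketch below splits one line into the image path and a list of boxes. It assumes whitespace-separated fields exactly as in the example record, so image paths containing spaces would not parse. The function name `parse_record` is hypothetical and is not the parser used by this repository.

```python
def parse_record(line):
    """Split a txt record into (image_path, [(xmin, ymin, xmax, ymax, class), ...])."""
    parts = line.split()
    if not parts:
        raise ValueError("empty record")

    image_path, values = parts[0], parts[1:]
    if len(values) % 5 != 0:
        raise ValueError("each box needs 5 values: xmin ymin xmax ymax class")

    boxes = []
    for i in range(0, len(values), 5):
        xmin, ymin, xmax, ymax, class_id = (int(v) for v in values[i:i + 5])
        boxes.append((xmin, ymin, xmax, ymax, class_id))
    return image_path, boxes
```

Applied to the example record, this returns `/image/path` and four boxes, the first being `(179, 66, 272, 290, 14)`.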

3. Run:

python main.py --mode=TRAIN --train_dataset_glob=<your dataset glob>

Predict:

python main.py --mode=IMAGE --model=<your_model_path>

Export serving model:

python main.py --mode=SERVING --model=<your_model_path>

Use custom config file:

python main.py --config=mobilenetv2.yaml

Set up tensorflow.js model (Live Demo: https://fsx950223.github.io/mobilenetv2-yolov3/tfjs/)

Start a web server in the project folder
Open a browser and go to [your_url:your_port]/tfjs
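Any static file server works for this step. As one option, the sketch below uses Python's standard `http.server` module; it is roughly equivalent to running `python -m http.server 8000` from the project folder. The helper name `make_server`, the port 8000, and binding to 127.0.0.1 are assumptions made for the example.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Build a static file server rooted at `directory` (Python 3.7+)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# To serve the project folder, run from the repository root:
#   server = make_server(".", 8000)
#   server.serve_forever()   # then open http://127.0.0.1:8000/tfjs
```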

Resources

Download Pascal VOC tfrecords from here.
Download the pre-trained mobilenetv2-yolov3 model (VOC2007) here.
Download the pre-trained efficientnet-yolov3 model (VOC2007) here.
Download the pre-trained efficientnet-yolov3 model (VOC2007+2012) here.

Performance

Network: Mobilenetv2+Yolov3
Input size: 416*416
Train Dataset: VOC2007
Test Dataset: VOC2007
mAP:

aeroplane ap:  0.585270123970341
bicycle ap:  0.7311717479746895
bird ap:  0.6228634475289679
boat ap:  0.44729361226611786
bottle ap:  0.3524265151288485
bus ap:  0.7260233058709467
car ap:  0.7572503412774444
cat ap:  0.8443930169586521
chair ap:  0.3530240979604032
cow ap:  0.5680746465428056
diningtable ap:  0.6046673143934721
dog ap:  0.8096497542858805
horse ap:  0.785358647511358
motorbike ap:  0.7299038925396009
person ap:  0.6926967393665762
pottedplant ap:  0.2960290730045794
sheep ap:  0.5569735405574012
sofa ap:  0.6053534702293342
train ap:  0.7304618425853895
tvmonitor ap:  0.5983913977616169
mAP:  0.6198638263857212

GPU inference time (GTX1080Ti): 19ms
CPU inference time (i7-8550U): 112ms
Model size: 37M
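The reported mAP appears to be the unweighted mean of the 20 per-class AP values. The short check below recomputes it from the Mobilenetv2+Yolov3 numbers listed above; the variable names are illustrative only.

```python
# Per-class AP values copied from the Mobilenetv2+Yolov3 results above.
per_class_ap = {
    "aeroplane": 0.585270123970341,
    "bicycle": 0.7311717479746895,
    "bird": 0.6228634475289679,
    "boat": 0.44729361226611786,
    "bottle": 0.3524265151288485,
    "bus": 0.7260233058709467,
    "car": 0.7572503412774444,
    "cat": 0.8443930169586521,
    "chair": 0.3530240979604032,
    "cow": 0.5680746465428056,
    "diningtable": 0.6046673143934721,
    "dog": 0.8096497542858805,
    "horse": 0.785358647511358,
    "motorbike": 0.7299038925396009,
    "person": 0.6926967393665762,
    "pottedplant": 0.2960290730045794,
    "sheep": 0.5569735405574012,
    "sofa": 0.6053534702293342,
    "train": 0.7304618425853895,
    "tvmonitor": 0.5983913977616169,
}

# mAP as the plain average over classes.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(f"mAP: {mean_ap:.4f}")
```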

Network: Efficientnet+Yolov3
Input size: 380*380
Train Dataset: VOC2007
Test Dataset: VOC2007
mAP:

aeroplane ap:  0.6492260838166934
bicycle ap:  0.8010712280076165
bird ap:  0.7013865117634108
boat ap:  0.5557173155813903
bottle ap:  0.4353563564340365
bus ap:  0.753804699972881
car ap:  0.7878183961387922
cat ap:  0.8632726491920759
chair ap:  0.4090719340574334
cow ap:  0.6657089830054761
diningtable ap:  0.6513494390619038
dog ap:  0.8466486584164448
horse ap:  0.8328765157511936
motorbike ap:  0.7607912651726462
person ap:  0.7089970516297166
pottedplant ap:  0.32875322571854027
sheep ap:  0.6372370950276296
sofa ap:  0.675301446703759
train ap:  0.7734685594308568
tvmonitor ap:  0.6505409659737674
mAP:  0.6744199190428132

GPU inference time (GTX1080Ti): 23ms
CPU inference time (i7-8550U): 168ms
Model size: 77M

Network: Efficientnet+Yolov3
Input size: 380*380
Train Dataset: VOC2007+VOC2012
Test Dataset: VOC2007
mAP:

aeroplane ap:  0.8186380791530123
bicycle ap:  0.778370501901752
bird ap:  0.8040658409051149
boat ap:  0.6606796907615438
bottle ap:  0.5338128542328597
bus ap:  0.8516086793836817
car ap:  0.8247881435224634
cat ap:  0.9271784386863242
chair ap:  0.5344565229671414
cow ap:  0.7724057669182698
diningtable ap:  0.701598520527006
dog ap:  0.9052246177009002
horse ap:  0.8477206181813397
motorbike ap:  0.8275932123398402
person ap:  0.7605203479510053
pottedplant ap:  0.45979410517062475
sheep ap:  0.8301611044152797
sofa ap:  0.7393617389123919
train ap:  0.8817430073959469
tvmonitor ap:  0.6981047903116634
mAP:  0.757891329066908

GPU inference time (GTX1080Ti): 23ms
CPU inference time (i7-8550U): 168ms
Model size: 77M

Reference

paper:
