This repo attempts to reproduce DeepLabv3 in TensorFlow for semantic image segmentation on the PASCAL VOC dataset. The implementation is largely based on DrSleep's DeepLab v2 implemantation and tensorflow models Resnet implementation.
Please install latest version of TensorFlow (r1.6) and use Python 3.
- Download and extract
PASCAL VOC training/validation data
(2GB tar file), specifying the location with the
--data_dir
. - Download and extract
augmented segmentation data
(Thanks to DrSleep), specifying the location with
--data_dir
and--label_data_dir
(namely,$data_dir/$label_data_dir
). - For inference the trained model with
76.42%
mIoU on the Pascal VOC 2012 validation dataset is available here. Download and extract to--model_dir
. - For training, you need to download and extract
pre-trained Resnet v2 101 model
from slim
specifying the location with
--pre_trained_model
.
For training model, you first need to convert original data to the TensorFlow TFRecord format. This enables to accelerate training seep.
python create_pascal_tf_record.py --data_dir DATA_DIR \
--image_data_dir IMAGE_DATA_DIR \
--label_data_dir LABEL_DATA_DIR
Once you created TFrecord for PASCAL VOC training and validation deta, you can start training model as follow:
python train.py --model_dir MODEL_DIR --pre_trained_model PRE_TRAINED_MODEL
Here, --pre_trained_model
contains the pre-trained Resnet model, whereas
--model_dir
contains the trained DeepLabv3 checkpoints.
If --model_dir
contains the valid checkpoints, the model is trained from the
specified checkpoint in --model_dir
.
You can see other options with the following command:
python train.py --help
The training process can be visualized with Tensor Board as follow:
tensorboard --logdir MODEL_DIR
To evaluate how model perform, one can use the following command:
python evaluate.py --help
The current best model build by this implementation achieves 76.42%
mIoU on the Pascal VOC 2012
validation dataset.
Method | OS | mIOU | |
---|---|---|---|
paper | MG(1,2,4)+ASPP(6,12,18)+Image Pooling | 16 | 77.21% |
repo | MG(1,2,4)+ASPP(6,12,18)+Image Pooling | 16 | 76.42% |
Here, the above model was trained about 9.5 hours (with Tesla V100 and r1.6) with following parameters:
python train.py --train_epochs 46 --batch_size 16 --weight_decay 1e-4 --model_dir models/ba=16,wd=1e-4,max_iter=30k --max_iter 30000
You may achieve better performance with the cost of computation with my DeepLabV3+ Implementation.
To apply semantic segmentation to your images, one can use the following commands:
python inference.py --data_dir DATA_DIR --infer_data_list INFER_DATA_LIST --model_dir MODEL_DIR
The trained model is available here. One can find the detailed explanation of mask such as meaning of color in DrSleep's repo.
Pull requests are welcome.
- Freeze batch normalization during training
- Multi-GPU support
- Channels first support (Apparently large performance boost on GPU)
- Model pretrained on MS-COCO
- Unit test
This repo borrows code heavily from