Training a TensorFlow-based DeepLabV3-MobileNetV2 model on the ModaNet dataset.
- python >= 3.6
- tensorflow >= 1.15.0
- Install the TensorFlow models repository following the official tutorial.
- Fetch the source code and head to the project directory.
git clone https://github.com/McDo/Modanet-DeeplabV3-MobilenetV2-Tensorflow.git
cd ./Modanet-DeeplabV3-MobilenetV2-Tensorflow
ModaNet is a granularly annotated street fashion dataset based on images in the PaperDoll image set. It has 55,176 images with pixel-level segments, polygons and bounding boxes covering 13 categories.
This project uses a refined version of ModaNet by Pier Carlo Cadoppi that fixes the bounding-box overlapping issue; you can download the dataset from cad0p/maskrcnn-modanet.
Here is the recommended directory structure for training and validation:
+ /PATH/TO/MODANET_DATASET
  + images
    + train
    + val
  + annotations
    - instances_train.json
    - instances_val.json
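Before converting, you can sanity-check the annotation files with pycocotools. A minimal sketch, assuming the layout above (the annotation path is a placeholder):

```python
# Quick sanity check of the COCO-style ModaNet annotations.
from pycocotools.coco import COCO

coco = COCO('/PATH/TO/MODANET_DATASET/annotations/instances_train.json')

print('images:     ', len(coco.getImgIds()))
print('annotations:', len(coco.getAnnIds()))
print('categories: ', [c['name'] for c in coco.loadCats(coco.getCatIds())])
```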
- Conversion
cd deeplab/datasets/coco2voc
python main.py --dataset /PATH/TO/MODANET_DATASET
The generated VOC-format data will reside in /PATH/TO/MODANET_DATASET/voc.
- Remove the ground-truth colormap
python remove_gt_colormap.py --dataset /PATH/TO/MODANET_DATASET
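Under the hood, this step just converts the palette-indexed label PNGs into single-channel PNGs whose pixel values are the raw class ids. A minimal sketch of the idea (filenames are placeholders):

```python
# Read a palette-indexed label PNG and save the raw class indices
# as a grayscale PNG (filenames are placeholders).
import numpy as np
from PIL import Image

label = Image.open('label_with_colormap.png')  # mode 'P' (palette-indexed)
raw = np.array(label)                          # palette indices == class ids
Image.fromarray(raw.astype(np.uint8)).save('label_raw.png')
```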
- Display samples from the train/val dataset
python sample.py --dataset /PATH/TO/MODANET_DATASET --mode train
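If you want to eyeball an image/label pair yourself without the helper script, a minimal matplotlib overlay works too (a sketch; the filenames are placeholders):

```python
# Overlay a raw label map on its image for a quick visual check
# (filenames are placeholders).
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

image = np.array(Image.open('/PATH/TO/MODANET_DATASET/voc/JPEGImages/IMAGE_ID.jpg'))
label = np.array(Image.open('/PATH/TO/MODANET_DATASET/voc/SegmentationClassRaw/IMAGE_ID.png'))

plt.imshow(image)
plt.imshow(label, alpha=0.5, cmap='tab20')  # 14 classes fit in tab20
plt.axis('off')
plt.show()
```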
- Beforehand, you need to copy (or move) all of your images inside /PATH/TO/MODANET_DATASET/images/train and /PATH/TO/MODANET_DATASET/images/val to /PATH/TO/MODANET_DATASET/voc/JPEGImages.
- Build the TFRecord
cd deeplab/datasets
python build_modanet_data.py \
--image_folder="/PATH/TO/MODANET_DATASET/voc/JPEGImages" \
--semantic_segmentation_folder="/PATH/TO/MODANET_DATASET/voc/SegmentationClassRaw" \
--list_folder="/PATH/TO/MODANET_DATASET/voc/ImageSets/Segmentation" \
--image_format="jpg" \
--output_dir="/PATH/TO/TFRECORD_DIR"
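For reference, each image/label pair ends up serialized as a tf.Example in the output TFRecords. A rough sketch of what gets written (TF 1.x; the feature keys approximate DeepLab's build_data.py conventions, and the filenames are placeholders):

```python
# Rough sketch of serializing one image/label pair into a TFRecord.
import tensorflow as tf

def _bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

with open('image.jpg', 'rb') as f:       # placeholder filenames
    image_data = f.read()
with open('label_raw.png', 'rb') as f:
    seg_data = f.read()

example = tf.train.Example(features=tf.train.Features(feature={
    'image/encoded': _bytes(image_data),
    'image/format': _bytes(b'jpeg'),
    'image/segmentation/class/encoded': _bytes(seg_data),
    'image/segmentation/class/format': _bytes(b'png'),
}))

with tf.python_io.TFRecordWriter('train-00000-of-00001.tfrecord') as writer:
    writer.write(example.SerializeToString())
```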
There are some minor changes to deeplab/datasets/data_generator.py for dataset registration.
# line 111:
_MODANET_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 52377,     # num of samples in images/train
        'val': 2799,        # num of samples in images/val
        'trainval': 55176,  # num of samples in train + val
    },
    num_classes=14,  # 13 classes + background
    ignore_label=255,
)
# line 120:
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'modanet_seg': _MODANET_SEG_INFORMATION,
}
Fill in the numbers for train, val and trainval to match your own dataset.
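A quick way to get those numbers is to count the entries in the VOC split lists (a sketch; the paths follow the layout generated by the conversion step):

```python
# Count entries in the VOC split lists to fill in splits_to_sizes.
import os

list_dir = '/PATH/TO/MODANET_DATASET/voc/ImageSets/Segmentation'
sizes = {}
for split in ('train', 'val'):
    with open(os.path.join(list_dir, split + '.txt')) as f:
        sizes[split] = sum(1 for line in f if line.strip())
sizes['trainval'] = sizes['train'] + sizes['val']
print(sizes)  # e.g. {'train': 52377, 'val': 2799, 'trainval': 55176}
```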
cd deeplab
- Training
python train.py \
--logtostderr \
--training_number_of_steps=30000 \
--train_split="train" \
--model_variant="mobilenet_v2" \
--train_crop_size="513,513" \
--train_batch_size=8 \
--dataset="modanet_seg" \
--fine_tune_batch_norm=True \
--tf_initial_checkpoint=./deeplabv3_mnv2_pascal_trainval_2018_01_29/model.ckpt \
--train_logdir=./train_logdir \
--dataset_dir=/PATH/TO/TFRECORD_DIR \
--initialize_last_layer=False \
--last_layers_contain_logits_only=False
- Validation
python eval.py \
--logtostderr \
--eval_split="val" \
--model_variant="mobilenet_v2" \
--eval_crop_size="601,401" \
--dataset="modanet_seg" \
--output_stride=8 \
--checkpoint_dir=./deeplabv3_mnv2_pascal_trainval_2018_01_29/trained \
--eval_logdir=./eval_logdir \
--dataset_dir=/PATH/TO/TFRECORD_DIR
NOTE
i. For the mobilenetv2_dm05 model, change the depth_multiplier flag to 0.5 in common.py; otherwise change it back to 1.0.
ii. If a custom dataset is used for training but you want to reuse the pre-trained feature encoder, try adding the flags --initialize_last_layer=False and --last_layers_contain_logits_only=False.
iii. When fine_tune_batch_norm=True, use a batch size larger than 12 (more than 16 is better). Otherwise, use a smaller batch size and set fine_tune_batch_norm=False.
iv. When running python train.py in Colab, use !python instead of %%bash python; otherwise the notebook won't print anything out.
v. We always set crop_size = output_stride * k + 1, where k is an integer. When working on PASCAL images, the largest dimension is 512. Thus, we set crop_size = 513 = 16 * 32 + 1 > 512. Similarly, we set eval_crop_size = 1025x2049 for Cityscapes images.
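The rule in note v is easy to compute. A tiny helper for finding the smallest valid crop size (an illustration, not part of the codebase):

```python
# Smallest valid crop size >= target, per crop_size = output_stride * k + 1.
def valid_crop_size(target, output_stride=16):
    k = -(-(target - 1) // output_stride)  # ceil((target - 1) / output_stride)
    return output_stride * k + 1

print(valid_crop_size(512))   # 513, used for PASCAL-sized images
print(valid_crop_size(2048))  # 2049, as in the Cityscapes eval_crop_size
```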
You may visualize the validation results by running:
python vis.py \
--logtostderr \
--eval_split="val" \
--model_variant="mobilenet_v2" \
--vis_crop_size="601,401" \
--dataset="modanet_seg" \
--output_stride=8 \
--checkpoint_dir="./deeplabv3_mnv2_pascal_trainval_2018_01_29/trained" \
--vis_logdir=./vis_logdir \
--dataset_dir=/PATH/TO/TFRECORD_DIR \
--max_number_of_evaluations=1
Since DeepLab with the MobileNetV2 backbone uses neither ASPP nor the decoder for post-processing (check out the model zoo for details), the mIOU is relatively low compared to the full model. Here are some samples from the visualization results.
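For reference, mIOU is just the per-class intersection-over-union averaged over the classes present. A minimal numpy sketch (illustrative only; pred and gt are HxW arrays of class ids):

```python
# Minimal per-class IoU / mIOU computation from two label maps.
import numpy as np

def mean_iou(pred, gt, num_classes=14, ignore_label=255):
    mask = gt != ignore_label            # drop ignored pixels
    ious = []
    for c in range(num_classes):
        p, g = (pred == c) & mask, (gt == c) & mask
        union = np.logical_or(p, g).sum()
        if union > 0:                    # skip classes absent from both
            ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))
```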
- data source: cad0p/maskrcnn-modanet
- COCO-to-VOC conversion: alicranck/coco2voc
- DeepLabV3: tensorflow deeplab