Training a TensorFlow-based DeepLabV3-MobileNetV2 model on the ModaNet dataset.
- python >= 3.6
- tensorflow >= 1.15.0
- Install the TensorFlow models repository following the official tutorial.
- Fetch the source code and head to the project directory.
git clone https://github.com/McDo/Modanet-DeeplabV3-MobilenetV2-Tensorflow.git
cd ./Modanet-DeeplabV3-MobilenetV2-Tensorflow
ModaNet is a granularly annotated street fashion dataset based on images in the PaperDoll image set. It has 55,176 images with pixel-level segments, polygons and bounding boxes covering 13 categories.
This project uses a refined version of ModaNet by Pier Carlo Cadoppi that fixes the bounding-box overlapping issue; you can download the dataset from cad0p/maskrcnn-modanet.
Here is the recommended directory structure for training and validation:
+ /PATH/TO/MODANET_DATASET
  + images
    + train
    + val
  + annotations
    - instances_train.json
    - instances_val.json
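Before converting, you can sanity-check the annotation files with pycocotools. A minimal sketch, assuming the layout above (the annotation path is a placeholder):

```python
# Quick sanity check of the COCO-style ModaNet annotations.
from pycocotools.coco import COCO

coco = COCO('/PATH/TO/MODANET_DATASET/annotations/instances_train.json')

print('images:     ', len(coco.getImgIds()))
print('annotations:', len(coco.getAnnIds()))
print('categories: ', [c['name'] for c in coco.loadCats(coco.getCatIds())])
```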
- Conversion
cd deeplab/datasets/coco2voc
python main.py --dataset /PATH/TO/MODANET_DATASET
The generated VOC-format data will reside in /PATH/TO/MODANET_DATASET/voc.
- Remove the ground-truth colormap
python remove_gt_colormap.py --dataset /PATH/TO/MODANET_DATASET
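Under the hood, this step just converts the palette-indexed label PNGs into single-channel PNGs whose pixel values are the raw class ids. A minimal sketch of the idea (filenames are placeholders):

```python
# Read a palette-indexed label PNG and save the raw class indices
# as a grayscale PNG (filenames are placeholders).
import numpy as np
from PIL import Image

label = Image.open('label_with_colormap.png')  # mode 'P' (palette-indexed)
raw = np.array(label)                          # palette indices == class ids
Image.fromarray(raw.astype(np.uint8)).save('label_raw.png')
```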
- Display samples from the train/val dataset
python sample.py --dataset /PATH/TO/MODANET_DATASET --mode train
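If you want to eyeball an image/label pair yourself without the helper script, a minimal matplotlib overlay works too (a sketch; the filenames are placeholders):

```python
# Overlay a raw label map on its image for a quick visual check
# (filenames are placeholders).
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

image = np.array(Image.open('/PATH/TO/MODANET_DATASET/voc/JPEGImages/IMAGE_ID.jpg'))
label = np.array(Image.open('/PATH/TO/MODANET_DATASET/voc/SegmentationClassRaw/IMAGE_ID.png'))

plt.imshow(image)
plt.imshow(label, alpha=0.5, cmap='tab20')  # 14 classes fit in tab20
plt.axis('off')
plt.show()
```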
- Beforehand, you need to copy (or move) all of your images inside /PATH/TO/MODANET_DATASET/images/train and /PATH/TO/MODANET_DATASET/images/val to /PATH/TO/MODANET_DATASET/voc/JPEGImages.
- Build the TFRecord
cd deeplab/datasets
python build_modanet_data.py \
--image_folder="/PATH/TO/MODANET_DATASET/voc/JPEGImages" \
--semantic_segmentation_folder="/PATH/TO/MODANET_DATASET/voc/SegmentationClassRaw" \
--list_folder="/PATH/TO/MODANET_DATASET/voc/ImageSets/Segmentation" \
--image_format="jpg" \
--output_dir="/PATH/TO/TFRECORD_DIR"
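For reference, each image/label pair ends up serialized as a tf.Example in the output TFRecords. A rough sketch of what gets written (TF 1.x; the feature keys approximate DeepLab's build_data.py conventions, and the filenames are placeholders):

```python
# Rough sketch of serializing one image/label pair into a TFRecord.
import tensorflow as tf

def _bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

with open('image.jpg', 'rb') as f:       # placeholder filenames
    image_data = f.read()
with open('label_raw.png', 'rb') as f:
    seg_data = f.read()

example = tf.train.Example(features=tf.train.Features(feature={
    'image/encoded': _bytes(image_data),
    'image/format': _bytes(b'jpeg'),
    'image/segmentation/class/encoded': _bytes(seg_data),
    'image/segmentation/class/format': _bytes(b'png'),
}))

with tf.python_io.TFRecordWriter('train-00000-of-00001.tfrecord') as writer:
    writer.write(example.SerializeToString())
```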
There are some minor changes to deeplab/datasets/data_generator.py for dataset registration.
# line 111:
_MODANET_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 52377,     # num of samples in images/train
        'val': 2799,        # num of samples in images/val
        'trainval': 55176,  # num of samples in train + val
    },
    num_classes=14,  # 13 classes + background
    ignore_label=255,
)
# line 120:
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'modanet_seg': _MODANET_SEG_INFORMATION,
}
Fill in the numbers for train, val and trainval to match your own dataset.
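A quick way to get those numbers is to count the entries in the VOC split lists (a sketch; the paths follow the layout generated by the conversion step):

```python
# Count entries in the VOC split lists to fill in splits_to_sizes.
import os

list_dir = '/PATH/TO/MODANET_DATASET/voc/ImageSets/Segmentation'
sizes = {}
for split in ('train', 'val'):
    with open(os.path.join(list_dir, split + '.txt')) as f:
        sizes[split] = sum(1 for line in f if line.strip())
sizes['trainval'] = sizes['train'] + sizes['val']
print(sizes)  # e.g. {'train': 52377, 'val': 2799, 'trainval': 55176}
```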
cd deeplab
- Training
python train.py \
--logtostderr \
--training_number_of_steps=30000 \
--train_split="train" \
--model_variant="mobilenet_v2" \
--train_crop_size="513,513" \
--train_batch_size=8 \
--dataset="modanet_seg" \
--fine_tune_batch_norm=True \
--tf_initial_checkpoint=./deeplabv3_mnv2_pascal_trainval_2018_01_29/model.ckpt \
--train_logdir=./train_logdir \
--dataset_dir=/PATH/TO/TFRECORD_DIR \
--initialize_last_layer=False \
--last_layers_contain_logits_only=False
- Validation
python eval.py \
--logtostderr \
--eval_split="val" \
--model_variant="mobilenet_v2" \
--eval_crop_size="601,401" \
--dataset="modanet_seg" \
--output_stride=8 \
--checkpoint_dir=./deeplabv3_mnv2_pascal_trainval_2018_01_29/trained \
--eval_logdir=./eval_logdir \
--dataset_dir=/PATH/TO/TFRECORD_DIR
NOTE
i. For the mobilenetv2_dm05 model, change the depth_multiplier flag to 0.5 in common.py; otherwise change it back to 1.0.
ii. If a custom dataset is used for training but you want to reuse the pre-trained feature encoder, try adding the flags --initialize_last_layer=False and --last_layers_contain_logits_only=False.
iii. When fine_tune_batch_norm=True, use a batch size larger than 12 (more than 16 is better). Otherwise, use a smaller batch size and set fine_tune_batch_norm=False.
iv. When running python train.py in Colab, use !python instead of %%bash python; otherwise the notebook won't print anything out.
v. We always set crop_size = output_stride * k + 1, where k is an integer. When working on PASCAL images, the largest dimension is 512. Thus, we set crop_size = 513 = 16 * 32 + 1 > 512. Similarly, we set eval_crop_size = 1025x2049 for Cityscapes images.
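The rule in note v is easy to compute. A tiny helper for finding the smallest valid crop size (an illustration, not part of the codebase):

```python
# Smallest valid crop size >= target, per crop_size = output_stride * k + 1.
def valid_crop_size(target, output_stride=16):
    k = -(-(target - 1) // output_stride)  # ceil((target - 1) / output_stride)
    return output_stride * k + 1

print(valid_crop_size(512))   # 513, used for PASCAL-sized images
print(valid_crop_size(2048))  # 2049, as in the Cityscapes eval_crop_size
```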
You may visualize the validation results by running:
python vis.py \
--logtostderr \
--eval_split="val" \
--model_variant="mobilenet_v2" \
--vis_crop_size="601,401" \
--dataset="modanet_seg" \
--output_stride=8 \
--checkpoint_dir="./deeplabv3_mnv2_pascal_trainval_2018_01_29/trained" \
--vis_logdir=./vis_logdir \
--dataset_dir=/PATH/TO/TFRECORD_DIR \
--max_number_of_evaluations=1
Since DeepLab with the MobileNetV2 backbone uses neither ASPP nor the decoder for post-processing (check out the model zoo for details), the mIOU is relatively low compared to the full model. Here are some samples from the visualization results.
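For reference, mIOU is just the per-class intersection-over-union averaged over the classes present. A minimal numpy sketch (illustrative only; pred and gt are HxW arrays of class ids):

```python
# Minimal per-class IoU / mIOU computation from two label maps.
import numpy as np

def mean_iou(pred, gt, num_classes=14, ignore_label=255):
    mask = gt != ignore_label            # drop ignored pixels
    ious = []
    for c in range(num_classes):
        p, g = (pred == c) & mask, (gt == c) & mask
        union = np.logical_or(p, g).sum()
        if union > 0:                    # skip classes absent from both
            ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))
```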
- data source: cad0p/maskrcnn-modanet
- COCO-to-VOC conversion: alicranck/coco2voc
- DeepLabV3: tensorflow deeplab