/deeplabv3plus_on_Mapillary_Vistas

Semantic Segmentation on the Mapillary Vistas Dataset


Semantic Segmentation on the Mapillary Vistas Dataset using the DeepLabv3+ [4] model by Google TensorFlow

This is the repository for a Stanford CS231N course project (Spring 2018).

Contact: Sheng Li (parachutel_), available via lisheng@stanford.edu.

The Mapillary Vistas Dataset is available for academic use here (by request).

To build the dataset, put the images in /datasets/mvd/mvd_raw/JPEGImages/, the ground-truth labels in /datasets/mvd/mvd_raw/SegmentationClass/, and the dataset split filename lists (text files) in /datasets/mvd/mvd_raw/ImageSets/Segmentation/. The script /datasets/mvd/mvd_raw/ImageSets/Segmentation/build_image_sets.py can help you build the split list files. After building your dataset, update _MVD_INFORMATION in /datasets/segmentation_dataset.py to match it.
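As a rough illustration of what the split list files look like, here is a minimal sketch. It is not the repository's build_image_sets.py: the function name, the random train/val split, and the one-basename-per-line format (borrowed from the PASCAL VOC ImageSets convention) are all assumptions made for the example.

```python
import os
import random


def write_split_lists(image_dir, list_dir, val_fraction=0.1, seed=0):
    """Write train.txt and val.txt with one image basename per line.

    Hypothetical helper: the split ratio and the file format are
    assumptions, not taken from build_image_sets.py.
    """
    names = sorted(
        os.path.splitext(f)[0]
        for f in os.listdir(image_dir)
        if f.lower().endswith((".jpg", ".jpeg"))
    )
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(names)
    num_val = int(len(names) * val_fraction)
    splits = {"val": names[:num_val], "train": names[num_val:]}

    os.makedirs(list_dir, exist_ok=True)
    for split, split_names in splits.items():
        with open(os.path.join(list_dir, split + ".txt"), "w") as f:
            f.writelines(name + "\n" for name in sorted(split_names))

    # Number of images per split, e.g. {"train": 9, "val": 1}.
    return {split: len(split_names) for split, split_names in splits.items()}
```

The returned per-split counts are the kind of numbers that the split sizes recorded in _MVD_INFORMATION would need to agree with, assuming that descriptor stores one size per split as the stock DeepLab dataset descriptors do.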

To preprocess the dataset and generate TFRecord files for faster reading, run /datasets/convert_mvd.sh.

The initial model checkpoints are available in the TensorFlow DeepLab Model Zoo. Please put the ones you wish to use in /datasets/mvd/init_models/.

To train the model, evaluate it, and visualize its predictions, run local_test_mvd.sh, which contains the following commands (you may comment out the parts you do not wish to run):

Train:

python "${WORK_DIR}"/train.py \
  --logtostderr \
  --num_clones=4 \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size=513 \
  --train_crop_size=513 \
  --train_batch_size=16 \
  --base_learning_rate=0.0025 \
  --learning_rate_decay_step=500 \
  --weight_decay=0.000015 \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --log_steps=1 \
  --save_summaries_secs=60 \
  --fine_tune_batch_norm=true \
  --tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_cityscapes_train/model.ckpt" \
  --initialize_last_layer=false \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${MVD_DATASET}"

The default value of --dataset is modified directly inside train.py. The batch size and train_crop_size depend on your device's available memory.

Evaluate the model:

python "${WORK_DIR}"/eval.py \
  --logtostderr \
  --eval_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --eval_crop_size=<MAX_HEIGHT_PLUS> \
  --eval_crop_size=<MAX_WIDTH_PLUS> \
  --checkpoint_dir="${TRAIN_LOGDIR}" \
  --eval_logdir="${EVAL_LOGDIR}" \
  --dataset_dir="${MVD_DATASET}" \
  --max_number_of_evaluations=1

Visualize the predictions:

python "${WORK_DIR}"/vis.py \
  --logtostderr \
  --vis_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --vis_crop_size=<MAX_HEIGHT_PLUS> \
  --vis_crop_size=<MAX_WIDTH_PLUS> \
  --checkpoint_dir="${TRAIN_LOGDIR}" \
  --vis_logdir="${VIS_LOGDIR}" \
  --dataset_dir="${MVD_DATASET}" \
  --max_number_of_iterations=1

Note: <MAX_HEIGHT_PLUS> and <MAX_WIDTH_PLUS> depend on the maximum resolution of your dataset. Each value must have the form output_stride * k + 1 for some integer k and be large enough to cover the corresponding largest image dimension. The default value, 513, is set for PASCAL images, whose largest image dimension is 512: picking k = 32 gives eval_crop_size = 16 * 32 + 1 = 513 > 512. The same rule applies to <MAX_WIDTH_PLUS>.
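The rule above can be turned into a small helper. This is only a sketch: the function name min_crop_size is hypothetical, and it assumes that a crop size equal to the largest image dimension is already sufficient (use the next multiple if you want a strictly larger value).

```python
def min_crop_size(max_dim, output_stride=16):
    """Smallest value of the form output_stride * k + 1 that is >= max_dim."""
    if max_dim < 1 or output_stride < 1:
        raise ValueError("max_dim and output_stride must be positive")
    # Integer ceiling of (max_dim - 1) / output_stride.
    k = -(-(max_dim - 1) // output_stride)
    return output_stride * k + 1


# The PASCAL example from the note: largest dimension 512, output stride 16.
print(min_crop_size(512))  # 16 * 32 + 1 = 513
```

For a dataset whose largest image is 1024 x 2048 pixels, the same helper gives 1025 for the height and 2049 for the width.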

============================================================================

Original Documentation by Google TensorFlow DeepLab Developers:

DeepLab: Deep Labelling for Semantic Image Segmentation

DeepLab is a state-of-the-art deep learning model for semantic image segmentation, where the goal is to assign semantic labels (e.g., person, dog, cat and so on) to every pixel in the input image. The current implementation includes the following features:

  1. DeepLabv1 [1]: We use atrous convolution to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks.

  2. DeepLabv2 [2]: We use atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales with filters at multiple sampling rates and effective fields-of-views.

  3. DeepLabv3 [3]: We augment the ASPP module with image-level features [5, 6] to capture longer-range information. We also include batch normalization [7] parameters to facilitate training. In particular, we apply atrous convolution to extract output features at different output strides during training and evaluation, which efficiently enables training BN at output stride = 16 and attains high performance at output stride = 8 during evaluation.

  4. DeepLabv3+ [4]: We extend DeepLabv3 to include a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. Furthermore, in this encoder-decoder structure one can arbitrarily control the resolution of the extracted encoder features by atrous convolution to trade off precision and runtime.
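To make the idea of atrous convolution concrete, here is a minimal 1-D sketch in plain Python. It is an illustration only, not DeepLab's implementation: the function name is hypothetical, it computes a cross-correlation (as deep learning frameworks usually do), and it uses no padding. The sampling rate spaces the filter taps apart, which widens the field of view without adding weights.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1-D atrous (dilated) cross-correlation without padding.

    Filter taps are placed `rate` samples apart, so a kernel with
    len(kernel) weights covers (len(kernel) - 1) * rate + 1 samples.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    span = (len(kernel) - 1) * rate + 1  # effective field of view
    out_len = len(signal) - span + 1
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    return [
        sum(w * signal[i + j * rate] for j, w in enumerate(kernel))
        for i in range(out_len)
    ]


ramp = list(range(10))
# Same three weights, different rates: the field of view grows from 3 to 5.
print(atrous_conv1d(ramp, [1, 0, -1], rate=1))
print(atrous_conv1d(ramp, [1, 0, -1], rate=2))
```

ASPP applies several such filters in parallel at different rates (for example the rates 6, 12 and 18 passed via --atrous_rates) and combines their outputs.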

If you find the code useful for your research, please consider citing our latest works:

  • DeepLabv3+:
@article{deeplabv3plus2018,
  title={Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation},
  author={Liang-Chieh Chen and Yukun Zhu and George Papandreou and Florian Schroff and Hartwig Adam},
  journal={arXiv:1802.02611},
  year={2018}
}
  • MobileNetv2:
@inproceedings{mobilenetv22018,
  title={Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation},
  author={Mark Sandler and Andrew Howard and Menglong Zhu and Andrey Zhmoginov and Liang-Chieh Chen},
  booktitle={CVPR},
  year={2018}
}

In the current implementation, we support adopting the following network backbones:

  1. MobileNetv2 [8]: A fast network structure designed for mobile devices.

  2. Xception [9, 10]: A powerful network structure intended for server-side deployment.

This directory contains our TensorFlow [11] implementation. We provide code that allows users to train the model, evaluate results in terms of mIOU (mean intersection-over-union), and visualize segmentation results. We use the PASCAL VOC 2012 [12] and Cityscapes [13] semantic segmentation benchmarks as examples in the code.
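For readers unfamiliar with the metric, the following sketch computes mIOU over flat lists of per-pixel class ids. It is not the evaluation code used by eval.py: the function name is hypothetical, and two conventions are assumed, namely that pixels carrying the ignore label are skipped and that classes absent from both the labels and the predictions are left out of the mean.

```python
def mean_iou(labels, predictions, num_classes, ignore_label=None):
    """Mean intersection-over-union over flat sequences of class ids."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for y, p in zip(labels, predictions):
        if y == ignore_label:
            continue  # ignored pixels count for no class
        if y == p:
            intersection[y] += 1
            union[y] += 1
        else:
            union[y] += 1  # false negative for the true class
            union[p] += 1  # false positive for the predicted class
    # Average only over classes that occur in the labels or predictions.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no valid pixels to evaluate")
    return sum(ious) / len(ious)


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```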


  • Please check the FAQ if you have questions before reporting issues.

Getting Help

To get help with issues you may encounter while using the DeepLab TensorFlow implementation, create a new question on StackOverflow with the tags "tensorflow" and "deeplab".

Please report bugs (i.e., broken code, not usage questions) to the tensorflow/models GitHub issue tracker, prefixing the issue name with "deeplab".

References

  1. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
    Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille (+ equal contribution).
    [link]. In ICLR, 2015.

  2. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
    Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille (+ equal contribution).
    [link]. TPAMI 2017.

  3. Rethinking Atrous Convolution for Semantic Image Segmentation
    Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam.
    [link]. arXiv: 1706.05587, 2017.

  4. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
    Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam.
    [link]. arXiv: 1802.02611, 2018.

  5. ParseNet: Looking Wider to See Better
    Wei Liu, Andrew Rabinovich, Alexander C Berg
    [link]. arXiv:1506.04579, 2015.

  6. Pyramid Scene Parsing Network
    Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
    [link]. In CVPR, 2017.

  7. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
    Sergey Ioffe, Christian Szegedy
    [link]. In ICML, 2015.

  8. Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
    Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
    [link]. arXiv:1801.04381, 2018.

  9. Xception: Deep Learning with Depthwise Separable Convolutions
    François Chollet
    [link]. In CVPR, 2017.

  10. Deformable Convolutional Networks -- COCO Detection and Segmentation Challenge 2017 Entry
    Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, Jifeng Dai
    [link]. ICCV COCO Challenge Workshop, 2017.

  11. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
    M. Abadi, A. Agarwal, et al.
    [link]. arXiv:1603.04467, 2016.

  12. The Pascal Visual Object Classes Challenge – A Retrospective
    Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman.
    [link]. IJCV, 2014.

  13. The Cityscapes Dataset for Semantic Urban Scene Understanding
    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele.
    [link]. In CVPR, 2016.