DenseASPP for Semantic Segmentation in Street Scenes

Introduction

Semantic image segmentation is a basic street scene understanding task in autonomous driving, where each pixel in a high-resolution image is assigned one of a set of semantic labels. Unlike in many other scenarios, objects in autonomous driving scenes exhibit very large scale variation. This poses great challenges for high-level feature representation, because multi-scale information must be correctly encoded.

To remedy this problem, atrous convolution [2, 3] was introduced to generate features with larger receptive fields without sacrificing spatial resolution. Building on atrous convolution, Atrous Spatial Pyramid Pooling (ASPP) [3] concatenates multiple atrous-convolved features, computed with different dilation rates, into a final feature representation. Although ASPP is able to generate multi-scale features, we argue that its feature resolution along the scale axis is not dense enough for the autonomous driving scenario. To this end, we propose Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a dense way. The resulting multi-scale features not only cover a larger scale range but also cover that range densely, without significantly increasing the model size. We evaluate DenseASPP on the street scene benchmark Cityscapes [4] and achieve state-of-the-art performance.
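The difference in scale coverage between ASPP and DenseASPP can be made concrete with receptive-field arithmetic. The sketch below is an illustration, not code from this repository. It assumes 3x3 kernels, stride 1, and the dilation rates (3, 6, 12, 18, 24), which are chosen here for illustration. It uses two standard rules: a single atrous convolution with kernel size k and rate d spans (k - 1) * d + 1 pixels, and stacking two stride-1 layers gives R = R1 + R2 - 1. The sketch simplifies DenseASPP by treating every non-empty subset of layers, taken in cascade order, as one path through the dense connections.

```python
from itertools import combinations

KERNEL = 3
RATES = (3, 6, 12, 18, 24)  # assumed dilation rates, for illustration only


def atrous_rf(kernel, rate):
    # Receptive field of one atrous convolution: (k - 1) * d + 1.
    return (kernel - 1) * rate + 1


def stacked_rf(kernel, rates):
    # Stacking stride-1 layers: R = R1 + R2 - 1, applied layer by layer.
    rf = 1
    for d in rates:
        rf += atrous_rf(kernel, d) - 1
    return rf


def aspp_scales(kernel, rates):
    # Parallel ASPP branches: each branch contributes exactly one receptive field.
    return sorted({atrous_rf(kernel, d) for d in rates})


def dense_aspp_scales(kernel, rates):
    # Dense connections let features pass through any non-empty subset of the
    # cascaded layers, so each subset contributes one (stacked) receptive field.
    scales = set()
    for n in range(1, len(rates) + 1):
        for subset in combinations(rates, n):
            scales.add(stacked_rf(kernel, subset))
    return sorted(scales)


if __name__ == "__main__":
    print(aspp_scales(KERNEL, RATES))           # [7, 13, 25, 37, 49]
    dense = dense_aspp_scales(KERNEL, RATES)
    print(len(dense), dense[0], dense[-1])      # 21 7 127
```

Under these assumptions, the parallel branches yield 5 receptive-field sizes up to 49 pixels. The densely connected cascade yields 21 distinct sizes from 7 to 127 pixels. This illustrates the "larger and denser scale range" argument above.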

Usage

1. Clone the repository:

git clone https://github.com/DeepMotionAIResearch/DenseASPP.git

2. Download a pretrained model:

Place the downloaded checkpoint in the weights folder. The following checkpoints are provided:

DenseNet161-based model: GoogleDrive

MobileNetV2-based model: Coming soon.

Performance of these checkpoints:

Checkpoint name	Multi-scale inference	Cityscapes mIoU (val)	Cityscapes mIoU (test)	File size
DenseASPP161	False	79.9%	-	142.7 MB
DenseASPP161	True	80.6%	79.5%	142.7 MB
MobileNetDenseASPP	False	74.5%	-	10.2 MB
MobileNetDenseASPP	True	75.0%	-	10.2 MB

Note that the performance of these checkpoints can be further improved by fine-tuning. These models were trained with PyTorch 0.3.1.

3. Inference

From the repository root, run:

 python demo.py --model_name DenseASPP161 --model_path <your checkpoint path> --img_dir <your image directory>
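The checkpoint table reports results with and without multi-scale inference. The snippet below is a generic sketch of that technique and may differ from what demo.py does. It rescales the input, runs the model at each scale, resizes the class scores back to the original resolution, averages them, and takes the per-pixel argmax. Several choices here are assumptions made to keep the example dependency-light: the scales (0.75, 1.0, 1.25), the nearest-neighbour resizing (real pipelines usually use bilinear interpolation), and the model interface (a callable mapping a (C, H, W) array to (num_classes, H', W') scores).

```python
import numpy as np


def resize_nearest(x, out_h, out_w):
    # Nearest-neighbour resize of a (C, H, W) array to (C, out_h, out_w).
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows[:, None], cols[None, :]]


def multi_scale_predict(model, image, scales=(0.75, 1.0, 1.25)):
    # model: hypothetical callable, (C, H, W) -> (num_classes, H', W') scores.
    _, h, w = image.shape
    acc = None
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scores = model(resize_nearest(image, sh, sw))
        scores = resize_nearest(scores, h, w)  # back to input resolution
        acc = scores if acc is None else acc + scores
    # Average the scores over scales, then pick the best class per pixel.
    return (acc / len(scales)).argmax(axis=0)
```

With scales=(1.0,) this reduces to ordinary single-scale inference. Adding scales trades extra forward passes for the accuracy gains shown in the table's "True" rows.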

4. Evaluate the results

Change to the ./utils directory, then run:

 python transfer.py
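The official Cityscapes evaluation expects predictions encoded with the full Cityscapes labelIds, while segmentation models are usually trained on the 19 trainIds. We assume transfer.py performs a conversion of this kind but have not verified its code. The sketch below shows such a mapping with a lookup table. The trainId-to-labelId values follow the cityscapesScripts label definitions. The function name and the choice to send every other value to labelId 0 ("unlabeled") are illustrative assumptions.

```python
import numpy as np

# labelIds of the 19 evaluated Cityscapes classes, indexed by trainId 0..18:
# road, sidewalk, building, wall, fence, pole, traffic light, traffic sign,
# vegetation, terrain, sky, person, rider, car, truck, bus, train,
# motorcycle, bicycle.
TRAIN_ID_TO_LABEL_ID = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22,
                        23, 24, 25, 26, 27, 28, 31, 32, 33]


def train_ids_to_label_ids(pred, unlabeled=0):
    # Hypothetical helper: pred is an integer array of trainIds in 0..255.
    lut = np.full(256, unlabeled, dtype=np.uint8)
    lut[:len(TRAIN_ID_TO_LABEL_ID)] = TRAIN_ID_TO_LABEL_ID
    # Any value that is not a valid trainId (e.g. 255) maps to `unlabeled`.
    return lut[pred.astype(np.uint8)]
```

The converted array can then be written out as a single-channel PNG, for example with PIL's Image.fromarray(...).save(...), for the evaluation scripts to read.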

Then evaluate the results with the official Cityscapes evaluation code, which is available in the cityscapesScripts repository.
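The mIoU figures in the checkpoint table come from the official evaluation code, which should be treated as authoritative. For reference, the sketch below only illustrates the standard definition. It accumulates a confusion matrix (rows: ground truth, columns: prediction) over all images, computes IoU = TP / (TP + FP + FN) for each class, and averages over classes. It assumes that ignored pixels carry the value 255 and that all predictions lie in 0..num_classes-1. Classes that never appear in either the ground truth or the prediction are skipped, which is a simplifying choice.

```python
import numpy as np

IGNORE_INDEX = 255  # assumption: ignored pixels are labelled 255


def confusion_matrix(gt, pred, num_classes, ignore_index=IGNORE_INDEX):
    # Count (ground truth, prediction) pairs, skipping ignored pixels.
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(np.int64) + pred[mask].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(cm):
    tp = np.diag(cm).astype(np.float64)
    # Column sums are TP + FP and row sums are TP + FN.
    denom = cm.sum(axis=0) + cm.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / denom  # NaN for classes with an empty union
    return float(np.nanmean(iou))
```

To score a whole split, sum the per-image confusion matrices first and call mean_iou once on the total. Averaging per-image mIoU values gives a different number.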

Issues (corrections)

The original code targets an old version of PyTorch (0.3.1). The aim of this fork is to port the code to current PyTorch versions and to retrain the model to obtain new weights.
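One common obstacle when loading checkpoints from PyTorch 0.3.x in newer versions is state-dict key names. Newer PyTorch no longer allows "." inside module names, and torchvision's own DenseNet loader therefore renames legacy keys such as "denselayer1.norm.1.weight" to "denselayer1.norm1.weight". Checkpoints saved from a model wrapped in nn.DataParallel also carry a "module." prefix. We have not checked whether the DenseASPP161 checkpoint uses either naming. The sketch below is a hypothetical helper that rewrites both patterns in a plain dict. The regular expression mirrors torchvision's approach but is an assumption for this repository.

```python
import re
from collections import OrderedDict

# Hypothetical pattern for legacy DenseNet keys, e.g.
# "features.denseblock1.denselayer1.norm.1.weight" -> "...denselayer1.norm1.weight"
_LEGACY_DENSE_KEY = re.compile(
    r"^(.*denselayer\d+\.(?:norm|relu|conv))\."
    r"((?:[12])\.(?:weight|bias|running_mean|running_var))$"
)


def upgrade_state_dict(state_dict):
    new_state = OrderedDict()
    for key, value in state_dict.items():
        # Drop the prefix added when a model is wrapped in nn.DataParallel.
        if key.startswith("module."):
            key = key[len("module."):]
        # Merge "norm.1" style names into "norm1" style names.
        match = _LEGACY_DENSE_KEY.match(key)
        if match:
            key = match.group(1) + match.group(2)
        new_state[key] = value
    return new_state
```

Assuming the checkpoint is a plain state dict, it could be loaded with model.load_state_dict(upgrade_state_dict(torch.load(path, map_location="cpu"))). Passing strict=True (the default) makes any remaining key mismatch fail loudly instead of silently leaving layers uninitialized.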

References

[1] DenseASPP for Semantic Segmentation in Street Scenes
Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, Kuiyuan Yang.
In CVPR, 2018.
[2] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille (+ equal contribution).
In ICLR, 2015.
[3] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille (+ equal contribution).
In TPAMI, 2017.
[4] The Cityscapes Dataset for Semantic Urban Scene Understanding
Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele.
In CVPR, 2016.
