Lightweight Multi-Level Multi-Scale Feature Fusion Network for Semantic Segmentation.
Demo video: link
-
2021.05.13 Support 4 encoders: MobileNetV3 large/small with width multipliers 0.75 and 1.0.
-
2021.05.13 Support mlmsnetv2, which uses depthwise separable convolutions for the ASPP and SE blocks. With the MobileNetV3-Large backbone (width multiplier 1.0) and a 713x713 input: FLOPs 14.2209G, Params 3.5739M.
-
2021.05.30 Support the CamVid dataset. The link to download CamVid prepared for our segmentation method is provided.
-
Download the Cityscapes images from the Cityscapes website. We need gtFine_trainvaltest.zip (241MB) and leftImg8bit_trainvaltest.zip (11GB). The leftImg8bit_demoVideo.zip (6.6GB) is optional; it is only used to generate the demo segmentation results.
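One way to unpack the archives (the target folder path/to/cityscapes is a placeholder; the two required zips extract into the gtFine and leftImg8bit subfolders shown in the structure below):

mkdir -p path/to/cityscapes
unzip gtFine_trainvaltest.zip -d path/to/cityscapes
unzip leftImg8bit_trainvaltest.zip -d path/to/cityscapes
# optional: leftImg8bit_demoVideo.zip, only needed for the demo segmentation results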
-
Download the Cityscapes scripts (cityscapesScripts). Use createTrainIdLabelImgs.py to generate trainId label images from the Cityscapes JSON annotations.
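A minimal sketch of this step using the pip-installable cityscapesscripts package (the dataset path is a placeholder); the script writes *_labelTrainIds.png files next to the JSON annotations:

pip install cityscapesscripts
export CITYSCAPES_DATASET=path/to/cityscapes
python -m cityscapesscripts.preparation.createTrainIdLabelImgs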
-
Link the Cityscapes dataset to the project dataset folder:
ln -s path/to/cityscapes path/to/MLMSNet/dataset/
The dataset folder structure is listed as follows:
cityscapes -> path/to/cityscapes
├── cityscapes_demo.txt
├── cityscapes_test.txt
├── demoVideo
│   ├── stuttgart_00
│   ├── stuttgart_01
│   └── stuttgart_02
├── fine_train.txt
├── fine_val.txt
├── gtFine
│   ├── test
│   ├── train
│   └── val
└── leftImg8bit
    ├── test
    ├── train
    └── val
-
cityscapes_demo.txt, cityscapes_test.txt, fine_train.txt, and fine_val.txt are in MLMSNet/misc.
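The folder structure above shows these list files inside the linked cityscapes folder; if your config expects them there, one way to place them (paths assume the symlink from the previous step) is:

cd path/to/MLMSNet
cp misc/cityscapes_demo.txt misc/cityscapes_test.txt misc/fine_train.txt misc/fine_val.txt dataset/cityscapes/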
cd path/to/MLMSNet
export PYTHONPATH=./
python tool/train.py --config config/cityscapes_ohem_large.yaml 2>&1 | tee ohem_large_train.log
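For long runs it can be convenient to detach the job and follow the log; this is a generic shell pattern rather than anything specific to this repo (CUDA_VISIBLE_DEVICES is the standard way to pick a GPU from the environment; GPU settings may also live in the YAML config):

cd path/to/MLMSNet
export PYTHONPATH=./
CUDA_VISIBLE_DEVICES=0 nohup python tool/train.py --config config/cityscapes_ohem_large.yaml > ohem_large_train.log 2>&1 &
tail -f ohem_large_train.log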
This is the training result using the default configuration parameters, corresponding to config/cityscapes_ohem_large.yaml:
INFO:main-logger:Val result: mIoU/mAcc/allAcc 0.6695/0.7546/0.9541.
cd path/to/MLMSNet
export PYTHONPATH=./
python tool/test.py --config config/cityscapes_ohem_large.yaml 2>&1 | tee ohem_large_test.log
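To pull the final metrics out of the saved log afterwards (the metric string matches the result lines shown in this README):

grep "mIoU/mAcc/allAcc" ohem_large_test.log | tail -n 1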
This is the evaluation result using the default configuration parameters, corresponding to config/cityscapes_ohem_large.yaml:
Eval result: mIoU/mAcc/allAcc 0.7268/0.7991/0.9540.
You can download pretrained models from Google Drive. The first table lists the Cityscapes models and the second the CamVid models.
Model | val mIoU/mAcc/allAcc | config | link |
---|---|---|---|
MLMS-L | 0.7268/0.7991/0.9540 | cityscapes_ohem_large.yaml | MLMS_L |
MLMS-S | 0.7274/0.8033/0.9537 | cityscapes_ohem_small.yaml | MLMS_S |
MLMSv2-L | 0.7164/0.7982/0.9526 | mlmsv2_large.yaml | MLMSv2_L |
Model | val mIoU/mAcc/allAcc | config | link |
---|---|---|---|
camvid-mlms-l | 0.6814/0.7574/0.9196 | camvid_ohem_large.yaml | camvid_mlms_l |
camvid-mlms-s | 0.6790/0.7612/0.9188 | camvid_ohem_small.yaml | camvid_mlms_s |
-
The codebase is based on semseg: Semantic Segmentation in PyTorch:
@misc{semseg2019,
  author={Zhao, Hengshuang},
  title={semseg},
  howpublished={\url{https://github.com/hszhao/semseg}},
  year={2019}
}
@inproceedings{zhao2017pspnet,
  title={Pyramid Scene Parsing Network},
  author={Zhao, Hengshuang and Shi, Jianping and Qi, Xiaojuan and Wang, Xiaogang and Jia, Jiaya},
  booktitle={CVPR},
  year={2017}
}
@inproceedings{zhao2018psanet,
  title={{PSANet}: Point-wise Spatial Attention Network for Scene Parsing},
  author={Zhao, Hengshuang and Zhang, Yi and Liu, Shu and Shi, Jianping and Loy, Chen Change and Lin, Dahua and Jia, Jiaya},
  booktitle={ECCV},
  year={2018}
}
-
The MobileNetV3 code and pretrained models are from mobilenetv3.pytorch: 74.3% MobileNetV3-Large and 67.2% MobileNetV3-Small models on ImageNet.
-
This project gave me a better understanding of the loss functions used in the segmentation field: SegLoss.