vedaseg is an open source semantic segmentation toolbox based on PyTorch.
- Modular Design

  We decompose the semantic segmentation framework into different components. The flexible and extensible design makes it easy to implement a customized semantic segmentation project by combining different modules, like building with Lego.

- Support of several popular frameworks

  The toolbox supports several popular semantic segmentation frameworks out of the box, e.g. DeepLabv3+, DeepLabv3, U-Net, PSPNet and FPN.

- High efficiency

  Multi-GPU data parallelism & distributed training.

- Multi-class/multi-label segmentation

  We implement both multi-class and multi-label segmentation (where a pixel can belong to multiple classes); a shape sketch follows this list.

- Acceleration and deployment

  Models can be accelerated and deployed with TensorRT.
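The difference between the two settings is just the layout of the ground-truth targets. The following minimal sketch is illustrative only and assumes the common convention of an (H, W) class-index map for single-label segmentation and a (C, H, W) stack of binary masks for multi-label segmentation; it does not depict vedaseg's internal data format.

```python
import torch

# Illustrative shapes only (not vedaseg's internal API).
H, W, num_classes = 4, 4, 3

# Multi-class (single-label): one class index per pixel.
single_label_target = torch.randint(0, num_classes, (H, W))             # (H, W)

# Multi-label: one binary mask per class, so a pixel may be active in
# several channels at once.
multi_label_target = torch.randint(0, 2, (num_classes, H, W)).float()   # (C, H, W)

print(single_label_target.shape, multi_label_target.shape)
```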
This project is released under the Apache 2.0 license.
Note: All models are trained only on the PASCAL VOC 2012 trainaug dataset and evaluated on the PASCAL VOC 2012 val dataset.
Architecture | Backbone | OS | MS & Flip | mIoU |
---|---|---|---|---|
DeepLabv3plus | ResNet-101 | 16 | True | 79.46% |
DeepLabv3plus | ResNet-101 | 16 | False | 77.90% |
DeepLabv3 | ResNet-101 | 16 | True | 79.22% |
DeepLabv3 | ResNet-101 | 16 | False | 77.08% |
FPN | ResNet-101 | 4 | True | 77.05% |
FPN | ResNet-101 | 4 | False | 75.64% |
PSPNet | ResNet-101 | 8 | True | 78.39% |
PSPNet | ResNet-101 | 8 | False | 77.30% |
PSPNet | ResNet_v1c-101 | 8 | True | 79.88% |
PSPNet | ResNet_v1c-101 | 8 | False | 78.85% |
U-Net | ResNet-101 | 1 | True | 74.58% |
U-Net | ResNet-101 | 1 | False | 72.59% |
- OS: Output stride used during evaluation.
- MS: Multi-scale inputs during evaluation.
- Flip: Adding horizontally flipped inputs during evaluation.
- ResNet_v1c: Modified stem from the original ResNet, as shown in Figure 2(b) in this paper.
The models above are available on Google Drive.
- Linux
- Python 3.6+
- PyTorch 1.4.0 or higher
- CUDA 9.0 or higher
We have tested the following versions of OS and software:
- OS: Ubuntu 16.04.6 LTS
- CUDA: 10.2
- PyTorch 1.4.0
- Python 3.6.9
- Create a conda virtual environment and activate it.
conda create -n vedaseg python=3.6.9 -y
conda activate vedaseg
- Install PyTorch and torchvision following the official instructions, e.g.,
conda install pytorch torchvision -c pytorch
- Clone the vedaseg repository.
git clone https://github.com/Media-Smart/vedaseg.git
cd vedaseg
vedaseg_root=${PWD}
- Install dependencies.
pip install -r requirements.txt
Download Pascal VOC 2012 and the Pascal VOC 2012 augmented dataset (you can get details at Semantic Boundaries Dataset and Benchmark), resulting in 10,582 training images (trainaug) and 1,449 validation images.
cd ${vedaseg_root}
mkdir ${vedaseg_root}/data
cd ${vedaseg_root}/data
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz
tar xf VOCtrainval_11-May-2012.tar
tar xf benchmark.tgz
python ../tools/encode_voc12_aug.py
python ../tools/encode_voc12.py
mkdir VOCdevkit/VOC2012/EncodeSegmentationClass
#cp benchmark_RELEASE/dataset/encode_cls/* VOCdevkit/VOC2012/EncodeSegmentationClass
(cd benchmark_RELEASE/dataset/encode_cls; cp * ${vedaseg_root}/data/VOCdevkit/VOC2012/EncodeSegmentationClass)
#cp VOCdevkit/VOC2012/EncodeSegmentationClassPart/* VOCdevkit/VOC2012/EncodeSegmentationClass
(cd VOCdevkit/VOC2012/EncodeSegmentationClassPart; cp * ${vedaseg_root}/data/VOCdevkit/VOC2012/EncodeSegmentationClass)
comm -23 <(cat benchmark_RELEASE/dataset/{train,val}.txt VOCdevkit/VOC2012/ImageSets/Segmentation/train.txt | sort -u) <(cat VOCdevkit/VOC2012/ImageSets/Segmentation/val.txt | sort -u) > VOCdevkit/VOC2012/ImageSets/Segmentation/trainaug.txt
To avoid these tedious operations, you can save the above Linux commands as a shell script and execute it.
Download the COCO-2017 dataset.
cd ${vedaseg_root}
mkdir ${vedaseg_root}/data
cd ${vedaseg_root}/data
mkdir COCO2017 && cd COCO2017
wget -c http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip && rm train2017.zip
wget -c http://images.cocodataset.org/zips/val2017.zip
unzip val2017.zip && rm val2017.zip
wget -c http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip annotations_trainval2017.zip && rm annotations_trainval2017.zip
The folder structure should be similar to the following:
data
├── COCO2017
│   ├── annotations
│   │   ├── instances_train2017.json
│   │   ├── instances_val2017.json
│   ├── train2017
│   ├── val2017
├── VOCdevkit
│   ├── VOC2012
│   │   ├── JPEGImages
│   │   ├── SegmentationClass
│   │   ├── ImageSets
│   │   │   ├── Segmentation
│   │   │   │   ├── trainaug.txt
│   │   │   │   ├── val.txt
- Config
Modify the configuration files in configs/ according to your needs (e.g. configs/voc_unet.py).

The major configuration differences between single-label and multi-label training lie in nclasses, multi_label, metrics and criterion. You can take configs/coco_multilabel_unet.py as a reference; a hedged sketch of these fields appears at the end of this section. Currently, multi-label training is only supported in the COCO data format.
- Distributed training
# train pspnet using GPUs with gpu_id 0, 1, 2, 3
./tools/dist_train.sh configs/voc_pspnet.py "0, 1, 2, 3"
- Non-distributed training
python tools/train.py configs/voc_unet.py
Snapshots and logs will be generated at ${vedaseg_root}/workdir/name_of_config_file by default (you can specify the workdir in the config files).
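As a quick orientation, here is a hypothetical excerpt showing the kind of fields that change between single-label and multi-label training. The key names come from the list above, but the values and comments are placeholders rather than the authoritative settings; consult configs/coco_multilabel_unet.py for the real configuration.

```python
# Hypothetical config excerpt -- values are placeholders; see
# configs/coco_multilabel_unet.py for the authoritative settings.
nclasses = 80        # number of classes in your dataset (e.g. 80 for COCO)
multi_label = True   # False for ordinary single-label segmentation

# In the multi-label case, metrics and criterion must also be switched to
# per-class variants, e.g. a per-class IoU metric and a binary
# (per-channel) loss such as BCE-with-logits instead of cross-entropy.
```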
- Config
Modify the configuration as you wish (e.g. configs/voc_unet.py).
- Distributed testing
# test pspnet using GPUs with gpu_id 0, 1, 2, 3
./tools/dist_test.sh configs/voc_pspnet.py path/to/checkpoint.pth "0, 1, 2, 3"
- Non-distributed testing
python tools/test.py configs/voc_unet.py path/to/checkpoint.pth
- Config
Modify the configuration as you wish (e.g. configs/voc_unet.py).
- Run
# visualize the results in a new window
python tools/inference.py configs/voc_unet.py checkpoint_path image_file_path --show
# save the visualization results in a folder named with the image prefix, by default under './result/'
python tools/inference.py configs/voc_unet.py checkpoint_path image_file_path --out folder_name
- Convert to ONNX
Firstly, install volksdep following the official instructions.
Then, run the following command to convert the PyTorch model to ONNX. The input shape format is CxHxW. If you need an ONNX model with dynamic input shape, please add --dynamic_shape at the end.
python tools/torch2onnx.py configs/voc_unet.py weight_path out_path --dummy_input_shape 3,513,513 --opset_version 11
Here are some known issues:
- Currently the PSPNet model is not supported because of the unsupported operation AdaptiveAvgPool2d.
- The default ONNX opset version is 9, and the PyTorch Upsample operation is only supported with a specified size, nearest mode and align_corners being None. If bilinear mode and align_corners are wanted, please add --opset_version 11 when using torch2onnx.py.
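After conversion, it can be helpful to sanity-check the exported model. The following sketch is not part of vedaseg; it assumes you have onnxruntime installed and that the out_path above points to the exported file (here a hypothetical model.onnx), and it simply runs a random input through the network to confirm the graph loads and produces outputs.

```python
import numpy as np
import onnxruntime as ort

# Hypothetical path: use the out_path you passed to tools/torch2onnx.py.
session = ort.InferenceSession("model.onnx")

# Build a random input matching the dummy input shape used for export
# (CxHxW = 3x513x513, assuming a leading batch dimension of 1).
input_meta = session.get_inputs()[0]
dummy = np.random.rand(1, 3, 513, 513).astype(np.float32)

outputs = session.run(None, {input_meta.name: dummy})
print([o.shape for o in outputs])
```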
- Inference SDK
Firstly, install flexinfer and see the example for details.
This repository is currently maintained by Yuxin Zou (@YuxinZou), Tianhe Wang (@DarthThomas), Hongxiang Cai (@hxcai) and Yichao Xiong (@mileistone).