/RMI

This is the code for the NeurIPS 2019 paper Region Mutual Information Loss for Semantic Segmentation.

Primary LanguagePythonMIT LicenseMIT

Region Mutual Information Loss for Semantic Segmentation

Table of Contents

Introduction

This is the code for the NeurIPS 2019 paper Region Mutual Information Loss for Semantic Segmentation.

This paper proposes a region mutual information (RMI) loss to model the dependencies among pixels. RMI uses one pixel and its neighbor pixels to represent this pixel. Then for each pixel in an image, we get a multi-dimensional point that encodes the relationship between pixels, and the image is cast into a multi-dimensional distribution of these high-dimensional points. The prediction and ground truth thus can achieve high order consistency through maximizing the mutual information (MI) between their multi-dimensional distributions.

img_intro

Features and TODO

  • Support different segmentation models, i.e., DeepLabv3, DeepLabv3+, PSPNet
  • Multi-GPU training
  • Multi-GPU Synchronized BatchNorm
  • Support different backbones, e.g., Mobilenet, Xception
  • Model pretrained on MS-COCO
  • Distributed training

We are open to pull requests.

Installation

Install dependencies

Please install PyTorch-1.1.0 and Python3.6.5. We highly recommend you to use our established PyTorch docker image - zhaosssss/torch_lab.

docker pull zhaosssss/torch_lab:1.1.0

If you have not installed docker, see https://docs.docker.com/.

After you install docker and pull our image, you can cd to script directory and run

./docker.sh

to create a running docker container.

If you do not want to use docker, try

pip install -r requirements.txt

However, this is not suggested.

Prepare data

Generally, directories are organized as follow:

|
|--dataset (save the dataset) 
|--models  (save the output checkpoints)
|--github  (save the code)
|--|
|--|--RMI  (the RMI code repository)
|--|--|--crf
|--|--|--dataloaders
|--|--|--losses
...

As for the CamVid dataset, you can download at SegNet-Tutorial. This is a processed version of original CamVid dataset.

Training

See script/train.sh for detailed information. Before start training, you should specify some variables in the script/train.sh.

  • pre_dir, where you save your output checkpoints. If you organize the dir as we suggest, it should be pre_dir=models.

  • data_dir, where you save your dataset. Besides, you should put the lists of the images in the dataset in a certain directory, check dataloaders/datasets/pascal.py to find how we organize the input pipeline.

You can find more information about the arguments of the code in parser_params.py.

python parser_params.py --help

usage: parser_params.py [-h] [--resume RESUME] [--checkname CHECKNAME]
                        [--save_ckpt_steps SAVE_CKPT_STEPS]
                        [--max_ckpt_nums MAX_CKPT_NUMS]
                        [--model_dir MODEL_DIR] [--output_dir OUTPUT_DIR]
                        [--seg_model {deeplabv3,deeplabv3+,pspnet}]
                        [--backbone {resnet50,resnet101,resnet152,resnet50_beta,resnet101_beta,resnet152_beta}]
                        [--out_stride OUT_STRIDE] [--batch_size N]
                        [--accumulation_steps N] [--test_batch_size N]
                        [--dataset {pascal,coco,cityscapes,camvid}]
                        [--train_split {train,trainaug,trainval,val,test}]
                        [--data_dir DATA_DIR] [--use_sbd] [--workers N]
                        ...
                        [--rmi_pool_size RMI_POOL_SIZE]
                        [--rmi_pool_stride RMI_POOL_STRIDE]
                        [--rmi_radius RMI_RADIUS]
                        [--crf_iter_steps CRF_ITER_STEPS]
                        [--local_rank LOCAL_RANK] [--world_size WORLD_SIZE]
                        [--dist_backend DIST_BACKEND]
                        [--multiprocessing_distributed]

After you set all the arguments properly, you can simply cd to RMI/script and run

./train.sh

to start training.

  • Monitoring the training process through tensorboard
tensorboard --logdir=your_logdir --port=your_port

img_ten

  • GPU memory usage

Training a DeepLabv3 model with output_stride=16, crop_size=513, and batch_size=16 needs 4 GTX 1080 GPUs (8GB) or 2 GTX TITAN X GPUs (12 GB) or 1 TITAN RTX GPUs (24 GB).

Evaluation and Inference

See script/eval.sh and script/inference.sh for detailed information.

You should also specify some variables in the scripts.

  • data_dir, where you save your dataset.

  • resume, where your checkpoints locate.

  • output_dir, where the output data will be saved.

Then run

./eval.sh

or

./inference.sh

Experiments

img_res01 img_res02

img_res03

Some selected qualitative results on PASCAL VOC 2012 val set. Segmentation results of DeepLabv3+&RMI have richer details than DeepLabv3+&CE, e.g., small bumps of the airplane wing, branches of plants, limbs of cows and sheep, and so on.

Citations

If our paper and code are beneficial to your work, please cite:

@inproceedings{2019_zhao_rmi,
  author    = {Shuai Zhao and
               Yang Wang and
               Zheng Yang and
               Deng Cai},
  title     = {Region Mutual Information Loss for Semantic Segmentation},
  booktitle = {NeurIPS},
  year      = {2019},
}

If other related work in our code or paper also helps you, please cite the corresponding papers.

Acknowledgements

img_cad