SSENet-pytorch

Update: The code is deprecated. Please see our latest work SEAM (https://github.com/YudeWang/SEAM)

Introduction

As is well known, conventional class activation maps (CAMs) tend to be incomplete or over-activated because of weak supervision. We find that semantic segmentation is equivariant to spatial transformations, a property that provides additional self-supervision for weakly supervised learning. This work mainly explores the advantages of scale-equivariant constraints for CAM generation, formulated as a self-supervised scale equivariant network (SSENet). Experiments on the PASCAL VOC 2012 dataset show that our method outperforms the reimplemented AffinityNet baseline at both the CAM step and the random walk step (see Results).
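The scale-equivariance idea can be illustrated with a toy example. The sketch below is not the code of this repository: `cam_fn` is a hypothetical stand-in for a network that maps an image to an activation map, the images are plain nested lists, and an integer average-pooling factor replaces the fractional downsampling rate used by SSENet. It only shows the shape of the constraint: the map of a downsampled image should match the downsampled map of the original image.

```python
# Toy illustration of a scale-equivariance consistency term (not the authors' code).
# Assumption: cam_fn stands in for a network that maps an image to an activation map.

def downsample(grid, factor):
    """Average-pool a 2D list by an integer factor."""
    h, w = len(grid) // factor, len(grid[0]) // factor
    return [
        [
            sum(grid[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(w)
        ]
        for i in range(h)
    ]


def equivariance_loss(cam_fn, image, factor=2):
    """Mean absolute gap between cam_fn(downsample(x)) and downsample(cam_fn(x))."""
    cam_of_small = cam_fn(downsample(image, factor))
    small_of_cam = downsample(cam_fn(image), factor)
    diffs = [abs(a - b)
             for row_a, row_b in zip(cam_of_small, small_of_cam)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def linear_cam(image):
    # Pixel-wise linear map: commutes with average pooling, so the loss is zero.
    return [[2.0 * v for v in row] for row in image]


def squared_cam(image):
    # Pixel-wise nonlinearity: does not commute with average pooling in general.
    return [[v * v for v in row] for row in image]


image = [[0.0, 1.0, 2.0, 3.0],
         [1.0, 2.0, 3.0, 4.0],
         [2.0, 3.0, 4.0, 5.0],
         [3.0, 4.0, 5.0, 6.0]]

loss_linear = equivariance_loss(linear_cam, image)    # equivariant map
loss_squared = equivariance_loss(squared_cam, image)  # non-equivariant map
```

In training, a term of this kind would be added to the classification loss so that the network is pushed toward scale-consistent activations; the exact form used by SSENet is defined in train_cls_ser.py.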

Thanks to jiwoon-ahn: the code in this repository borrows heavily from his AffinityNet project, and we follow the same pipeline to verify the effectiveness of SSENet.

Dependency

This repo was tested on Ubuntu 16.04 with Python 3.6, PyTorch 0.4, torchvision 0.2.1, CUDA 9.0, and 4 GPUs (NVIDIA TITAN XP, 12 GB).
Please install tensorboardX for training visualization.
The dataset used is PASCAL VOC 2012; please download the VOC development kit. We suggest creating a symbolic link to the downloaded dataset:

ln -s $your_dataset_path/VOCdevkit/VOC2012 $your_voc12_root
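Before training, it can help to confirm that the link points at a complete VOC2012 folder. The helper below is a small sketch and not part of this repository; `missing_voc12_entries` is a hypothetical name, and the list of expected sub-directories is an assumption based on the standard VOC2012 layout and the paths used by the commands in this README.

```python
import os

# Sub-directories of the standard VOC2012 layout (assumed list, adjust as needed).
# The last two appear in the evaluation commands of this README.
EXPECTED = ("JPEGImages", "ImageSets/Segmentation", "SegmentationClass")


def missing_voc12_entries(voc12_root, expected=EXPECTED):
    """Return the expected sub-paths that do not exist under voc12_root."""
    return [p for p in expected
            if not os.path.isdir(os.path.join(voc12_root, p))]
```

An empty list means every expected folder was found; any returned names indicate a wrong link target or an incomplete download.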

(Optional) The image-level labels are already provided in voc12/cls_label.npy. If you want to regenerate them (which is unnecessary), please download the annotations of the VOC 2012 SegmentationClassAug training set (10582 images), which can be downloaded here, and place them all at $your_voc12_root/SegmentationClassAug/xxxxxx.png. Then run:

cd voc12
python make_cls_labels.py --voc12_root $your_voc12_root
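For readers unfamiliar with how image-level labels relate to segmentation masks, the idea can be sketched as follows. This is an illustration under assumptions, not the contents of make_cls_labels.py: the mask is a nested list of class indices rather than a PNG, and `image_level_label` is a hypothetical helper. It relies only on the VOC convention that 0 is background, 1 to 20 are the foreground classes, and 255 marks ignored pixels.

```python
NUM_CLASSES = 20   # PASCAL VOC foreground classes
BACKGROUND = 0     # background index in VOC masks
IGNORE = 255       # "void" label in VOC masks


def image_level_label(mask):
    """Turn a 2D mask of class indices into a 20-dim multi-hot list."""
    label = [0] * NUM_CLASSES
    for row in mask:
        for c in row:
            if c != BACKGROUND and c != IGNORE:
                label[c - 1] = 1  # class k is stored at position k - 1
    return label


# Toy mask containing classes 12 and 15 plus background and ignored pixels.
mask = [[0, 0, 15, 15],
        [0, 255, 15, 15],
        [12, 12, 255, 0]]
label = image_level_label(mask)
```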

(Optional) If you want to train the network yourself, ImageNet-pretrained models are available for VGG16 (vgg16_20M.caffemodel) and ResNet38 (ilsvrc-cls_rna-a1_cls1000_ep-0001.params). Note that SSENet has only been tested with ResNet38.

Usage

CAM generation step

SSENet training

python train_cls_ser.py --voc12_root $your_voc12_root --weights $your_weights_file --session_name $your_session_name

SSENet inference. Note that the CRF results will be saved in $your_crf_dir+_4.0 and $your_crf_dir+_24.0, where the parameters can be modified in infer_cls_ser.py. These two folders are used in the following AffinityNet training step.

python infer_cls_ser.py --weights $your_SSENet_checkpoint --infer_list [voc12/val.txt | voc12/train.txt | voc12/train_aug.txt] --out_cam $your_cam_dir --out_crf $your_crf_dir --out_cam_pred $your_pred_dir
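The folder naming described above can be sketched as follows, which makes it easier to see which paths to pass to the later AffinityNet training command. This is an assumption about the naming scheme inferred from the text, not a copy of infer_cls_ser.py; `crf_output_dirs` and `params` are hypothetical names.

```python
def crf_output_dirs(out_crf, params=(4.0, 24.0)):
    """Derive one folder name per CRF parameter by appending '_<value>'."""
    return ["%s_%.1f" % (out_crf, p) for p in params]


# The first folder feeds --la_crf_dir and the second feeds --ha_crf_dir.
la_crf_dir, ha_crf_dir = crf_output_dirs("your_crf_dir")
```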

CAM step evaluation. We provide a Python mIoU evaluation script, evaluation.py; alternatively, you can use the official development kit.

python evaluation.py --list $your_voc12_root/ImageSets/Segmentation/[val.txt | train.txt] --predict_dir $your_pred_dir --gt_dir $your_voc12_root/SegmentationClass
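As a reference for what the reported metric measures, mean IoU can be sketched in a few lines. This is a simplified illustration, not the contents of evaluation.py: predictions and ground truths are nested lists of class indices, `mean_iou` is a hypothetical helper, and classes that appear in neither prediction nor ground truth are skipped, which is an assumption of this sketch.

```python
def mean_iou(preds, gts, num_classes=21, ignore=255):
    """Mean IoU over classes present in the prediction or the ground truth."""
    inter = [0] * num_classes
    pred_area = [0] * num_classes
    gt_area = [0] * num_classes
    for pred, gt in zip(preds, gts):
        for p_row, g_row in zip(pred, gt):
            for p, g in zip(p_row, g_row):
                if g == ignore:       # ignored pixels do not count at all
                    continue
                pred_area[p] += 1
                gt_area[g] += 1
                if p == g:
                    inter[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_area[c] + gt_area[c] - inter[c]
        if union > 0:
            ious.append(inter[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# One 2x2 image: class 0 and class 1 each have intersection 1 and union 2.
pred = [[0, 0], [1, 1]]
gt = [[0, 1], [1, 255]]
miou = mean_iou([pred], [gt])
```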

Random walk step

The random walk step is the same as in the AffinityNet project.

Train AffinityNet.

python train_aff.py --weights $your_weights_file --voc12_root $your_voc12_root --la_crf_dir $your_crf_dir_4.0 --ha_crf_dir $your_crf_dir_24.0 --session_name $your_session_name

Random walk propagation

python infer_aff.py --weights $your_weights_file --infer_list [voc12/val.txt | voc12/train.txt] --cam_dir $your_cam_dir --voc12_root $your_voc12_root --out_rw $your_rw_dir
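The propagation idea can be illustrated on a tiny example. The sketch below is a simplified reading of the random walk described in [1], not the code of infer_aff.py: affinities are raised to a power, rows are normalized into a transition matrix, and the scores are multiplied by that matrix a few times. The function name, the values of `beta` and `steps`, and the three-pixel input are all assumptions chosen for readability.

```python
def random_walk(affinity, scores, beta=2, steps=3):
    """Propagate per-pixel scores with a row-normalized affinity matrix."""
    n = len(affinity)
    # Sharpen affinities, then normalize each row so that it sums to 1.
    # Assumes every row has a non-zero sum (e.g. self-affinity of 1).
    powered = [[a ** beta for a in row] for row in affinity]
    trans = [[v / sum(row) for v in row] for row in powered]
    for _ in range(steps):
        scores = [sum(trans[i][j] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores


# Pixels 0 and 1 are strongly related; pixel 2 is isolated.
affinity = [[1.0, 1.0, 0.0],
            [1.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
propagated = random_walk(affinity, [1.0, 0.0, 0.0])
```

In this toy case the activation of pixel 0 spreads to pixel 1 and never reaches pixel 2, which is the behavior that lets sparse CAM responses grow within an object while respecting its boundary.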

Random walk step evaluation

python evaluation.py --list $your_voc12_root/ImageSets/Segmentation/[val.txt | train.txt] --predict_dir $your_rw_dir --gt_dir $your_voc12_root/SegmentationClass

Results

The generated pseudo labels are evaluated on the PASCAL VOC 2012 train set.

Model	CAM step (mIoU)	CAM+rw step (mIoU)	Notes
ResNet38	48.0	58.1	AffinityNet CVPR submission [1]
ResNet38	47.3	58.8	reimplemented baseline
SSENet-ResNet38	49.8	62.1	branch downsampling rate = 0.3 (weights)

Citation

We encourage you to cite our latest work, SEAM, which was accepted to CVPR 2020.

@InProceedings{Wang_2020_CVPR_SEAM,
    author = {Yude Wang and Jie Zhang and Meina Kan and Shiguang Shan and Xilin Chen},
    title = {Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation},
    booktitle = {Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2020}
}

Reference

[1] J. Ahn and S. Kwak. Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
