
RETAB: Weak-shot Semantic Segmentation by Transferring Semantic Affinity and Boundary

Official PyTorch Implementation for RETAB (Region Expansion by Transferring semantic Affinity and Boundary).

Weak-shot Semantic Segmentation by Transferring Semantic Affinity and Boundary [arXiv]

Siyuan Zhou, Li Niu*, Jianlou Si, Chen Qian, Liqing Zhang
Accepted by BMVC 2022.

Introduction

In this paper, we show that existing fully-annotated base categories can help segment objects of novel categories with only image-level labels, even if base categories and novel categories have no overlap. We refer to this task as weak-shot semantic segmentation, which can also be treated as WSSS with auxiliary fully-annotated categories. Based on the observation that semantic affinity and boundary are class-agnostic, we propose a method called RETAB under the WSSS framework to transfer semantic affinity and boundary from base to novel categories. As a result, we find that pixel-level annotations of base categories can facilitate affinity learning and propagation, leading to higher-quality CAMs of novel categories.

RETAB

This repository takes the initial response (CAM) in PSA as an example to illustrate the usage of our RETAB model. RETAB can be applied to any type of initial response; since the usage of other initial responses is similar to that of CAM, we omit them here.

Model Zoo

| Fold | Backbone | Train all-/base-/novel-mIoU of CAM | Train all-/base-/novel-mIoU of CAM+RETAB | Weights of RETAB |
| ---- | -------- | ---------------------------------- | ---------------------------------------- | ---------------- |
| 0 | ResNet-38 | 48.0/51.4/37.4 | 71.2/74.0/62.5 | psa_ourbest_fold0_affnet.pth |
| 1 | ResNet-38 | 48.0/47.8/48.8 | 71.3/71.2/71.6 | psa_ourbest_fold1_affnet.pth |
| 2 | ResNet-38 | 48.0/47.2/50.7 | 70.9/70.2/73.3 | psa_ourbest_fold2_affnet.pth |
| 3 | ResNet-38 | 48.0/47.6/49.4 | 70.1/72.4/62.8 | psa_ourbest_fold3_affnet.pth |

We plan to include more models in the future.
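For reference, the all-/base-/novel-mIoU columns above are simply per-class IoU averaged over different category subsets. Below is a minimal, repo-independent sketch of that split; the dictionary keys and the novel_classes argument are illustrative, not part of the released code.

def split_miou(iou_per_class, novel_classes):
    # iou_per_class: dict mapping each of the 21 VOC categories (incl. background) to its IoU
    # novel_classes: names of the 5 novel categories of the chosen fold
    base = {c: v for c, v in iou_per_class.items() if c not in novel_classes}
    novel = {c: v for c, v in iou_per_class.items() if c in novel_classes}
    all_miou = sum(iou_per_class.values()) / len(iou_per_class)
    return all_miou, sum(base.values()) / len(base), sum(novel.values()) / len(novel)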

Usage

We provide instructions on how to install dependencies via conda. First, clone the repository locally:

git clone https://github.com/bcmi/RETAB-Weak-Shot-Semantic-Segmentation.git

Then, create a virtual environment with PyTorch 1.8.1 (requires CUDA >= 11.1):

conda env create -f environment.yaml
conda activate retab

Data preparation

Download PASCAL VOC 2012 development kit and extra annotations from SBD. We expect the directory structure of the dataset (denoted by ${VOC12HOME}) to be:

<VOC12HOME>
  Annotations/
  ImageSets/
  JPEGImages/
  SegmentationClass/
  SegmentationClassAug/
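
A quick, optional sanity check (plain Python, not part of the repo) to confirm that the expected sub-directories exist under ${VOC12HOME}:

import os

voc_root = os.environ.get('VOC12HOME', 'VOC2012')  # adjust to your dataset path
expected = ['Annotations', 'ImageSets', 'JPEGImages', 'SegmentationClass', 'SegmentationClassAug']
missing = [d for d in expected if not os.path.isdir(os.path.join(voc_root, d))]
print('Missing directories:', missing if missing else 'none')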

Then, perform some preprocessing:

cp VOC2012_supp/* ${VOC12HOME}/ImageSets/SegmentationAug/
cd psa && ln -s ${VOC12HOME} VOC2012 && cd ..
cd RETAB && ln -s ${VOC12HOME} VOC2012 && cd ..

Following the category split rule in PASCAL-5i, which is commonly used in few-shot segmentation, we evenly divide the 20 foreground categories into four folds (Fold 0, 1, 2, 3). The categories in each fold are regarded as the 5 novel categories, and the remaining categories (including background) are regarded as the 16 base categories. We further divide the 10582 training samples into base samples and novel samples for each fold. The lists of base samples and novel samples can be found at RETAB/voc12/trainaug_fold*_base.txt and RETAB/voc12/trainaug_fold*_novel.txt, respectively.
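
For concreteness, the PASCAL-5i rule assigns consecutive blocks of 5 foreground categories (in the standard VOC class order) to each fold. A short illustrative sketch of the resulting novel/base category split, assuming that ordering:

VOC_FG = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
          'bus', 'car', 'cat', 'chair', 'cow',
          'diningtable', 'dog', 'horse', 'motorbike', 'person',
          'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

def fold_categories(fold):
    # fold: 0, 1, 2, or 3
    novel = VOC_FG[5 * fold : 5 * (fold + 1)]      # 5 novel categories
    base = [c for c in VOC_FG if c not in novel]   # 15 base foreground categories
    return novel, base                             # background always counts as base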

Use PSA to generate & evaluate initial CAM

Download the AffinityNet weights (ResNet-38) and put the file under psa/best/ to form psa/best/res38_cls.pth. Then, run:

cd psa && sh run_psa.sh && cd ..

You could find more details in psa/run_psa.sh.
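
The evaluation above reports mIoU of the generated pseudo masks against the ground truth. As a standalone reference (not the repo's evaluation script), per-class IoU can be computed from a confusion matrix like this:

import numpy as np

def per_class_iou(preds, gts, num_classes=21, ignore_index=255):
    # preds, gts: iterables of HxW integer label maps (prediction / ground-truth pairs)
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(preds, gts):
        valid = g != ignore_index
        conf += np.bincount(num_classes * g[valid] + p[valid],
                            minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    return inter / np.maximum(union, 1)   # IoU per class; average to get mIoU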

Train & infer RETAB to generate pseudo labels, and evaluate them

Download the ImageNet-pretrained ResNet-38 weights (MXNet format) and put the file under RETAB/pretrained_model/ to form RETAB/pretrained_model/ilsvrc-cls_rna-a1_cls1000_ep-0001.params. Then, make some preparations:

cd RETAB
mkdir psa_initcam && cd psa_initcam && ln -s ../../psa/result/psa_trainaug_cam psa_trainaug_cam && cd ..
mkdir psa_afflabel && cd psa_afflabel && ln -s ../../psa/result/psa_trainaug_crf_4.0 psa_trainaug_crf_4.0 && ln -s ../../psa/result/psa_trainaug_crf_32.0 psa_trainaug_crf_32.0 && cd ..

and execute:

sh run_retab.sh

You could find more details in RETAB/run_retab.sh. The default setting is Fold 0. If you want to try other folds, please replace the fourth line of RETAB/run_retab.sh with FOLD=1, FOLD=2, or FOLD=3.

Perform mixed-supervised segmentation

Use the ground-truth labels of base samples and the generated pseudo labels of novel samples to train a segmentation network in a mixed-supervised manner. In our implementation, we adopt ResNet-38 as our final segmentation network.
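
As a rough illustration of how the mixed supervision could be assembled, the sketch below copies ground-truth masks for base samples and RETAB pseudo labels for novel samples into one label directory. The pseudo-label directory argument and the list-file format (one image ID per line) are assumptions for illustration, not the repo's actual layout:

import os, shutil

def build_mixed_labels(voc_root, pseudo_dir, base_list, novel_list, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    gt_dir = os.path.join(voc_root, 'SegmentationClassAug')           # ground truth for base samples
    for name in open(base_list).read().split():
        shutil.copy(os.path.join(gt_dir, name + '.png'), out_dir)
    for name in open(novel_list).read().split():                      # pseudo labels for novel samples
        shutil.copy(os.path.join(pseudo_dir, name + '.png'), out_dir)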

Acknowledgements

Some of the evaluation code in this repo is borrowed and modified from PSA and SEAM. We thank them for their great work.