[CVPR 2023] Sparsely Annotated Semantic Segmentation with Adaptive Gaussian Mixtures

AGMM-SASS

Code for the CVPR 2023 paper "Sparsely Annotated Semantic Segmentation with Adaptive Gaussian Mixtures".

Authors: Linshan Wu, Zhun Zhong, Leyuan Fang, Xingxin He, Qiang Liu, Jiayi Ma, and Hao Chen

Abstract

Sparsely annotated semantic segmentation (SASS) aims to learn a segmentation model from images with sparse labels (i.e., points or scribbles). Existing methods mainly focus on introducing low-level affinity or generating pseudo labels to strengthen supervision, while largely ignoring the inherent relation between labeled and unlabeled pixels. In this paper, we observe that pixels that are close to each other in the feature space are more likely to share the same class. Inspired by this, we propose a novel SASS framework equipped with an Adaptive Gaussian Mixture Model (AGMM). AGMM provides reliable supervision for unlabeled pixels based on the distributions of labeled and unlabeled pixels. Specifically, we first build Gaussian mixtures from labeled pixels and their relatively similar unlabeled pixels, with the labeled pixels acting as centroids, to model the feature distribution of each class. Then, we leverage the reliable information from labeled pixels and the adaptively generated GMM predictions to supervise the training of unlabeled pixels, achieving online, dynamic, and robust self-supervision. In addition, by capturing category-wise Gaussian mixtures, AGMM encourages the model to learn discriminative class decision boundaries in an end-to-end contrastive learning manner. Experiments on the PASCAL VOC 2012 and Cityscapes datasets demonstrate that AGMM establishes new state-of-the-art SASS performance.
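
To make the core idea in the abstract concrete, here is a minimal, standard-library-only sketch of Gaussian soft labeling with class centroids taken from labeled pixels. It is an illustration under stated assumptions, not the code in this repo: the function name `gaussian_soft_labels`, the isotropic per-class variance, and the way that variance is estimated from nearby unlabeled pixels are all hypothetical choices, and the paper's losses and network are not reproduced.

```python
import math


def gaussian_soft_labels(labeled, unlabeled, eps=1e-6):
    """Assign soft class labels to unlabeled pixel features.

    labeled:   dict mapping class id -> list of feature vectors (lists of floats)
    unlabeled: list of feature vectors
    Returns one dict (class id -> probability) per unlabeled feature.
    """
    def mean(vectors):
        n = len(vectors)
        return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # One Gaussian per class, centred on the mean of that class's labeled features.
    centroids = {c: mean(feats) for c, feats in labeled.items()}

    # Squared distance from every unlabeled feature to every class centroid.
    dists = [{c: sq_dist(u, mu) for c, mu in centroids.items()} for u in unlabeled]

    # Assumption: each class variance is the mean squared distance of the
    # unlabeled features whose nearest centroid is that class (1.0 if none).
    variances = {}
    for c in centroids:
        near = [d[c] for d in dists if min(d, key=d.get) == c]
        variances[c] = (sum(near) / len(near) if near else 1.0) + eps

    soft = []
    for d in dists:
        # Unnormalised Gaussian similarity to each class, then normalise.
        scores = {c: math.exp(-d[c] / (2.0 * variances[c])) for c in centroids}
        total = sum(scores.values()) or 1.0
        soft.append({c: s / total for c, s in scores.items()})
    return soft
```

Features that lie close to a class centroid receive a probability near 1 for that class, which is the intuition behind using labeled pixels to supervise their similar unlabeled neighbors.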

Getting Started

Prepare Dataset

Download weak labels

├── [Your Pascal Path]
    ├── JPEGImages
    ├── point
    ├── scribble
    └── SegmentationClass
    
├── [Your Cityscapes Path]
    ├── leftImg8bit
    ├── 20clicks
    ├── 50clicks
    ├── 100clicks
    └── gtFine

Pretrained Backbone:

ResNet-50 | ResNet-101

├── ./pretrained
    ├── resnet50.pth
    └── resnet101.pth
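
Before training, it can help to confirm that the layouts above are in place. The following is a small optional helper, not part of this repo; the function name `missing_entries` and the constant names are made up for illustration, while the directory and file names are copied from the trees above.

```python
from pathlib import Path

# Entry names copied from the directory trees in this README.
PASCAL_DIRS = ("JPEGImages", "point", "scribble", "SegmentationClass")
CITYSCAPES_DIRS = ("leftImg8bit", "20clicks", "50clicks", "100clicks", "gtFine")
BACKBONE_FILES = ("resnet50.pth", "resnet101.pth")


def missing_entries(root, names):
    """Return the entries from `names` that do not exist under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).exists()]
```

For example, `missing_entries("./pretrained", BACKBONE_FILES)` returns an empty list once both backbone checkpoints are downloaded, and calling it with your own Pascal or Cityscapes root and the matching tuple lists any sub-directory that is still missing.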

Usage

train

sh scripts/train_voc.sh <num_gpu> <port>
sh scripts/train_city.sh <num_gpu> <port>
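
The README does not say what `<port>` is used for. Assuming it is the TCP port the training processes use to communicate on one machine, the sketch below (not part of this repo; `find_free_port` is a hypothetical helper) asks the operating system for a port that is currently unused.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port on `host` that is unused right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick any free port
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

The printed number can then be passed as `<port>` to either training script. The port is released when the function returns, so another process could in principle claim it before training starts.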

eval

python eval.py

Citation ✏️ 📄

If you find this repo useful for your research, please consider citing the paper as follows:

@article{wu2024modeling,
  title={Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation},
  author={Wu, Linshan and Zhong, Zhun and Ma, Jiayi and Wei, Yunchao and Chen, Hao and Fang, Leyuan and Li, Shutao},
  journal={arXiv preprint arXiv:2403.13225},
  year={2024}
}
@inproceedings{AGMM,
  title={Sparsely Annotated Semantic Segmentation with Adaptive Gaussian Mixtures},
  author={Wu, Linshan and Zhong, Zhun and Fang, Leyuan and He, Xingxin and Liu, Qiang and Ma, Jiayi and Chen, Hao},
  booktitle={IEEE Conf. Comput. Vis. Pattern Recog.},
  month={June},
  year={2023},
  pages={15454-15464}
}
@article{Wu_DBFNet,
  title={Deep Bilateral Filtering Network for Point-Supervised Semantic Segmentation in Remote Sensing Images},
  author={Wu, Linshan and Fang, Leyuan and Yue, Jun and Zhang, Bob and Ghamisi, Pedram and He, Min},
  journal={IEEE Transactions on Image Processing},
  year={2022},
  volume={31},
  pages={7419-7434},
  doi={10.1109/TIP.2022.3222904}
}