
Official PyTorch implementation of SSA-Seg (NeurIPS 2024)


SSA-Seg

Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation

Xiaowen Ma<sup>1,2</sup>, Zhenliang Ni<sup>1</sup>, Xinghao Chen<sup>1</sup>

<sup>1</sup> Huawei Noah’s Ark Lab, <sup>2</sup> Zhejiang University

[Paper Link](https://openreview.net/forum?id=RZZo23pQFL)

🔥 News

  • 2024/09/30: We fixed some bugs in the code. This repository currently supports the following baselines and their +SSA-Seg counterparts:

| Backbone | Head |
| --- | --- |
| HRNet (CVPR'19) | OCRNet (ECCV'20) |
| MiT (NeurIPS'21) | SegFormer (NeurIPS'21) |
| Swin (ICCV'21) | UperNet (ECCV'18) |
| AFFormer (AAAI'23) | AFFormer (AAAI'23) |
| SeaFormer (ICLR'23) | SeaFormer (ICLR'23) |
| MSCAN (NeurIPS'22) | SegNeXt (NeurIPS'22) |
| EfficientFormerV2 (ICCV'23) | CGRSeg (ECCV'24) |
  • 2024/09/26: SSA-Seg was accepted at NeurIPS 2024!

📷 Introduction

SSA-Seg is an efficient and powerful pixel-level classifier that significantly improves the segmentation performance of various baselines at a negligible increase in computational cost. It consists of three key components: semantic prototype adaptation (SEPA), spatial prototype adaptation (SPPA), and online multi-domain distillation.
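To make the prototype-adaptation idea concrete, here is a minimal NumPy sketch of adapting fixed class prototypes (classifier weights) toward the class centers of a single test image. All names, the soft-assignment weighting, and the `alpha` interpolation are illustrative assumptions for intuition only, not the paper's exact SEPA formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adapt_prototypes(feats, protos, alpha=0.5):
    """Hypothetical sketch of semantic prototype adaptation.

    feats:  (N, C) pixel features of one image (N = H*W)
    protos: (K, C) fixed class prototypes (classifier weights)

    Each prototype is shifted toward the feature mass softly assigned
    to its class in this image; `alpha` controls the interpolation
    (both are assumptions, not the paper's formulation).
    """
    logits = feats @ protos.T                                 # (N, K) similarities
    assign = softmax(logits, axis=1)                          # soft class assignment
    w = assign / (assign.sum(axis=0, keepdims=True) + 1e-6)   # normalize per class
    centers = w.T @ feats                                     # (K, C) image-level centers
    return (1 - alpha) * protos + alpha * centers

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8)).astype(np.float32)   # 16 pixels, 8 channels
protos = rng.standard_normal((4, 8)).astype(np.float32)   # 4 classes
adapted = adapt_prototypes(feats, protos)
print(adapted.shape)  # (4, 8): one adapted prototype per class
```

The adapted prototypes replace the fixed ones when computing per-pixel logits, which is why the extra cost is small: the adaptation is one pooling-style pass per image, not per pixel.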

🏆 Performance

1️⃣ ADE20K

Iters: 160000 · Input size: 512×512 · Batch size: 16

  • General models

| +SSA-Seg | Backbone | Latency (ms) | Params (M) | FLOPs (G) | mIoU (ss) |
| --- | --- | --- | --- | --- | --- |
| OCRNet | HRNet-W48 | 69.3 | 8.7 | 165.0 | 47.67 |
| UperNet | Swin-T | 54.3 | 61.1 | 236.3 | 47.56 |
| SegFormer | MiT-B5 | 70.1 | 82.3 | 52.6 | 50.74 |
| UperNet | Swin-L | 107.3 | 234.9 | 405.2 | 52.69 |
| ViT-Adapter | ViT-Adapter-L | 284.9 | 364.9 | 616.3 | 55.39 |
  • Lightweight models

| +SSA-Seg | Backbone | Latency (ms) | Params (M) | FLOPs (G) | mIoU (ss) |
| --- | --- | --- | --- | --- | --- |
| AFFormer-B | AFFormer-B | 26.0 | 3.3 | 4.4 | 42.74 |
| SeaFormer-B | SeaFormer-B | 27.3 | 8.8 | 1.8 | 42.46 |
| SegNeXt-T | MSCAN-T | 23.3 | 4.6 | 6.3 | 43.90 |
| SeaFormer-L | SeaFormer-L | 29.9 | 14.2 | 6.4 | 45.36 |
| CGRSeg-B | EfficientFormerV2-S2 | 36.0 | 19.3 | 7.6 | 47.10 |
| CGRSeg-L | EfficientFormerV2-L | 42.6 | 35.8 | 14.8 | 49.00 |

2️⃣ COCO-Stuff-10K

Iters: 80000 · Input size: 512×512 · Batch size: 16

  • General models

| +SSA-Seg | Backbone | Latency (ms) | Params (M) | FLOPs (G) | mIoU (ss) |
| --- | --- | --- | --- | --- | --- |
| OCRNet | HRNet-W48 | 69.3 | 8.7 | 165.0 | 37.94 |
| UperNet | Swin-T | 54.3 | 61.1 | 236.3 | 42.30 |
| SegFormer | MiT-B5 | 70.1 | 82.3 | 52.6 | 45.55 |
| UperNet | Swin-L | 107.3 | 234.9 | 405.2 | 48.94 |
| ViT-Adapter | ViT-Adapter-L | 284.9 | 364.9 | 616.3 | 51.2 |
  • Lightweight models

| +SSA-Seg | Backbone | Latency (ms) | Params (M) | FLOPs (G) | mIoU (ss) |
| --- | --- | --- | --- | --- | --- |
| AFFormer-B | AFFormer-B | 26.0 | 3.3 | 4.4 | 36.40 |
| SeaFormer-B | SeaFormer-B | 27.3 | 8.8 | 1.8 | 35.92 |
| SegNeXt-T | MSCAN-T | 23.3 | 4.6 | 6.3 | 38.91 |
| SeaFormer-L | SeaFormer-L | 29.9 | 14.2 | 6.4 | 38.48 |

3️⃣ PASCAL-Context

Iters: 80000 · Input size: 480×480 · Batch size: 16

  • General models

| +SSA-Seg | Backbone | Latency (ms) | Params (M) | FLOPs (G) | mIoU (ss) |
| --- | --- | --- | --- | --- | --- |
| OCRNet | HRNet-W48 | 69.3 | 8.7 | 143.3 | 50.21 |
| UperNet | Swin-T | 54.3 | 61.1 | 207.7 | 55.11 |
| SegFormer | MiT-B5 | 70.1 | 82.3 | 45.8 | 59.14 |
| UperNet | Swin-L | 107.3 | 234.9 | 363.2 | 61.83 |
| ViT-Adapter | ViT-Adapter-L | 284.9 | 364.9 | 616.3 | 66.05 |
  • Lightweight models

| +SSA-Seg | Backbone | Latency (ms) | Params (M) | FLOPs (G) | mIoU (ss) |
| --- | --- | --- | --- | --- | --- |
| AFFormer-B | AFFormer-B | 26.0 | 3.3 | 4.4 | 49.72 |
| SeaFormer-B | SeaFormer-B | 27.3 | 8.8 | 1.8 | 47.00 |
| SegNeXt-T | MSCAN-T | 23.3 | 4.6 | 6.3 | 52.58 |
| SeaFormer-L | SeaFormer-L | 29.9 | 14.2 | 6.4 | 49.66 |

📚 Usage

  • Environment

    conda create --name ssa python=3.8 -y
    conda activate ssa
    # PyTorch 1.8 LTS wheels are served from a separate index (find-links URL below)
    pip install torch==1.8.2+cu102 torchvision==0.9.2+cu102 torchaudio==0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
    pip install timm==0.6.13
    pip install mmcv-full==1.7.0
    pip install opencv-python==4.1.2.30
    pip install "mmsegmentation==0.30.0"

    SSA-Seg is built on mmsegmentation 0.30.0; refer to its documentation for data preparation.

  • Train

    # Single-GPU training
    python train.py configs/swin/upernet_swin_tiny_ade20k_ssa.py
    
    # Multi-GPU (4 GPUs) training
    bash dist_train.sh configs/swin/upernet_swin_tiny_ade20k_ssa.py 4
  • Test

    # Single-GPU testing
    python test.py configs/swin/upernet_swin_tiny_ade20k_ssa.py ${CHECKPOINT_FILE} --eval mIoU
    
    # Multi-GPU (4 GPUs) testing
    bash dist_test.sh configs/swin/upernet_swin_tiny_ade20k_ssa.py ${CHECKPOINT_FILE} 4 --eval mIoU
  • Benchmark

    python benchmark.py configs/swin/upernet_swin_tiny_ade20k_ssa.py ${CHECKPOINT_FILE} --repeat-times 5
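The latency numbers in the tables above come from repeated timed runs like the `--repeat-times 5` invocation suggests. A minimal sketch of such a warm-up/repeat/average timing protocol, with a NumPy matmul standing in for a forward pass; function names and parameters here are illustrative, not benchmark.py's actual interface:

```python
import time
import numpy as np

def benchmark(fn, warmup=5, repeat_times=5, iters=50):
    """Time `fn` and return mean latency per call in milliseconds.

    warmup:       untimed calls to stabilize caches/allocators
    repeat_times: independent timed runs that get averaged
    iters:        calls per timed run
    (All names/defaults are illustrative assumptions.)
    """
    for _ in range(warmup):
        fn()
    runs = []
    for _ in range(repeat_times):
        t0 = time.perf_counter()
        for _ in range(iters):
            fn()
        # Per-call latency of this run, in ms.
        runs.append((time.perf_counter() - t0) / iters * 1e3)
    return sum(runs) / len(runs)

x = np.random.rand(256, 256).astype(np.float32)
ms = benchmark(lambda: x @ x)  # stand-in workload for a model forward pass
print(f"mean latency: {ms:.3f} ms")
```

Averaging over several independent runs (rather than one long run) makes the reported latency less sensitive to transient system noise, which is presumably why the script exposes `--repeat-times`.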

🌟 Citation

If you find our work useful, please consider giving this repository a 🌟 and citing our work below.

@inproceedings{ssaseg,
  title={{SSA}-Seg: Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation},
  author={Xiaowen Ma and Zhen-Liang Ni and Xinghao Chen},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=RZZo23pQFL}
}

💡 Acknowledgment

Thanks to the following open-source repositories: SeaFormer, CAC, AFFormer, SegNeXt, mmsegmentation, CGRSeg, and ViT-Adapter.