Mask2Former-LT

[TMM2024] Official code of "Frequency-based Matcher for Long-tailed Semantic Segmentation".

Long-tailed Semantic Segmentation

By Shan Li, Lu Yang, Pu Cao, Liulei Li, Huadong Ma

paper

Features

  • Mask2Former and DeepLabV3Plus baselines for long-tailed semantic segmentation (LTSS).
  • Supports the major LTSS datasets: ADE20K-Full, COCO-Stuff-LT, and MHP-v2-LT.
  • Supports the Repeat Factor Sampling (RFS), Copy-Paste, and Seesaw Loss (Mask2Former only) solutions; a sketch of RFS follows this list.
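
RFS, introduced for the LVIS benchmark, oversamples images that contain rare classes. Below is a minimal sketch of the standard repeat-factor computation, assuming image-level class annotations; `repeat_factors` and `t` are illustrative names, and this repo's implementation may differ:

```python
import math
from collections import defaultdict

def repeat_factors(image_classes, t=1e-3):
    """Per-image repeat factors r(i) = max_{c in i} max(1, sqrt(t / f(c))),
    where f(c) is the fraction of training images containing class c.

    image_classes: list of sets, the class IDs present in each image.
    t: frequency threshold; classes rarer than t get oversampled.
    """
    num_images = len(image_classes)

    # Count how many images contain each class.
    image_count = defaultdict(int)
    for classes in image_classes:
        for c in classes:
            image_count[c] += 1

    # r(c): class-level repeat factor; frequent classes stay at 1.0.
    class_rf = {c: max(1.0, math.sqrt(t * num_images / n))
                for c, n in image_count.items()}

    # r(i): an image is repeated as often as its rarest class demands.
    return [max((class_rf[c] for c in classes), default=1.0)
            for classes in image_classes]
```

Fractional factors are typically realized by stochastic rounding, so an image with r(i) = 2.3 appears twice per epoch plus a third time with probability 0.3; this is the scheme used by detectron2's RepeatFactorTrainingSampler.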

Installation

See installation instructions.

Getting Started

See Preparing Datasets for LTSS.

See Getting Started with LTSS.

LTSS Datasets

Statistics of LTSS Datasets

| Datasets | #Images | #Train/Val/Test | #Classes | Gini@Image | Gini@Pixel | Download |
| --- | --- | --- | --- | --- | --- | --- |
| ADE20K-Full | 27,574 | 25,574/2,000/- | 847 | 0.865 | 0.934 | - |
| COCO-Stuff-LT | 87,614 | 40,679/5,000/40,670 | 171 | 0.669 | 0.773 | - |
| MHP-v2-LT | 16,931 | 6,931/5,000/5,000 | 59 | 0.701 | 0.909 | - |
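
Gini@Image and Gini@Pixel quantify how imbalanced each dataset's class distribution is, computed over per-class image counts and per-class pixel counts respectively. A minimal sketch of a standard Gini-coefficient computation over such counts (illustrative only; the paper's exact measurement protocol may differ):

```python
import numpy as np

def gini(counts):
    """Gini coefficient of a per-class count vector.

    counts: per-class counts, e.g. images per class (Gini@Image) or
    pixels per class (Gini@Pixel).
    """
    x = np.sort(np.asarray(counts, dtype=np.float64))
    n = x.size
    i = np.arange(1, n + 1)
    # Closed form for sorted data: G = sum((2i - n - 1) * x_i) / (n * sum(x))
    return float(np.sum((2 * i - n - 1) * x) / (n * np.sum(x)))
```

A perfectly balanced distribution gives 0, and values approach 1 as the tail grows longer; ADE20K-Full's pixel-level Gini of 0.934 indicates extreme imbalance.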

Model Zoo and Baselines

1. Baseline models (Mask2Former-R50):

| Datasets | mIoU | mIoU@r (image) | mIoU@c (image) | mIoU@f (image) | mIoU@r (pixel) | mIoU@c (pixel) | mIoU@f (pixel) | ckpts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ADE20K | 47.2 | - | - | - | - | - | - | - |
| ADE20K-Full | 18.8 | 4.8 | 13.4 | 25.1 | 3.5 | 6.2 | 28.1 | - |
| COCO-Stuff | 46.5 | - | - | - | - | - | - | - |
| COCO-Stuff-LT | 32.6 | 13.9 | 24.5 | 41.4 | 13.5 | 20.7 | 42.7 | - |
| MHP-v2 | 44.6 | - | - | - | - | - | - | - |
| MHP-v2-LT | 32.3 | 8.8 | 10.4 | 46.8 | 13.8 | 10.6 | 45.4 | - |
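
The @r/@c/@f columns follow the usual long-tailed convention of splitting classes by frequency (presumably rare/common/frequent, with the splits defined by image-level and pixel-level class frequency respectively). A minimal sketch of how such grouped scores can be aggregated from per-class IoUs; `grouped_miou` and its split arguments are assumed names, not this repo's evaluation API:

```python
import numpy as np

def grouped_miou(per_class_iou, rare, common, frequent):
    """Average per-class IoUs overall and within frequency-based splits.

    per_class_iou: one IoU per class, NaN for classes absent from the
    evaluation set; rare/common/frequent: lists of class indices.
    """
    iou = np.asarray(per_class_iou, dtype=np.float64)

    def split_mean(indices):
        # nanmean skips classes that never appear in the eval set.
        return float(np.nanmean(iou[list(indices)]))

    return {
        "mIoU": float(np.nanmean(iou)),
        "mIoU@r": split_mean(rare),
        "mIoU@c": split_mean(common),
        "mIoU@f": split_mean(frequent),
    }
```

The gap between mIoU@f and mIoU@r in the table (e.g. 25.1 vs. 4.8 on ADE20K-Full) is exactly the long-tail effect these splits are meant to expose.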

License

The majority of LTSS is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

However, portions of the project are available under separate license terms: Swin-Transformer-Semantic-Segmentation is licensed under the MIT license, and Deformable-DETR is licensed under the Apache-2.0 license.

Citing LTSS

If you use LTSS in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@article{li2024frequency,
  title={Frequency-based Matcher for Long-tailed Semantic Segmentation},
  author={Li, Shan and Yang, Lu and Cao, Pu and Li, Liulei and Ma, Huadong},
  journal={IEEE Transactions on Multimedia},
  year={2024},
  publisher={IEEE}
}

If you find the code useful, please also consider the following BibTeX entry.

@inproceedings{cheng2022mask2former,
  title={Masked-attention Mask Transformer for Universal Image Segmentation},
  author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
  booktitle={CVPR},
  year={2022}
}
@inproceedings{cheng2021maskformer,
  title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
  author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
  booktitle={NeurIPS},
  year={2021}
}

Acknowledgement

Code is largely based on detectron2, MaskFormer, and Mask2Former.