Mask2Former-LT

[TMM2024] Official code of "Frequency-based Matcher for Long-tailed Semantic Segmentation".

Long-tailed Semantic Segmentation

By Shan Li, Lu Yang, Pu Cao, Liulei Li, Huadong Ma

paper

Features

  • Mask2Former and DeepLabV3Plus baselines for long-tailed semantic segmentation (LTSS).
  • Supports the major LTSS datasets: ADE20K-Full, COCO-Stuff-LT, and MHP-v2-LT.
  • Supports the Repeat Factor Sampling (RFS), Copy-Paste, and Seesaw Loss (Mask2Former only) solutions; a sketch of RFS follows this list.
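
RFS, introduced for the LVIS benchmark, oversamples images that contain rare classes. Below is a minimal sketch of the standard repeat-factor computation, assuming image-level class annotations; `repeat_factors` and `t` are illustrative names, and this repo's implementation may differ:

```python
import math
from collections import defaultdict

def repeat_factors(image_classes, t=1e-3):
    """Per-image repeat factors r(i) = max_{c in i} max(1, sqrt(t / f(c))),
    where f(c) is the fraction of training images containing class c.

    image_classes: list of sets, the class IDs present in each image.
    t: frequency threshold; classes rarer than t get oversampled.
    """
    num_images = len(image_classes)

    # Count how many images contain each class.
    image_count = defaultdict(int)
    for classes in image_classes:
        for c in classes:
            image_count[c] += 1

    # r(c): class-level repeat factor; frequent classes stay at 1.0.
    class_rf = {c: max(1.0, math.sqrt(t * num_images / n))
                for c, n in image_count.items()}

    # r(i): an image is repeated as often as its rarest class demands.
    return [max((class_rf[c] for c in classes), default=1.0)
            for classes in image_classes]
```

Fractional factors are typically realized by stochastic rounding, so an image with r(i) = 2.3 appears twice per epoch plus a third time with probability 0.3; this is the scheme used by detectron2's RepeatFactorTrainingSampler.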

Installation

See installation instructions.

Getting Started

See Preparing Datasets for LTSS.

See Getting Started with LTSS.

LTSS Datasets

Statistics of LTSS Datasets

| Datasets | #Images | #Train/Val/Test | #Classes | Gini@Image | Gini@Pixel | Download |
| --- | --- | --- | --- | --- | --- | --- |
| ADE20K-Full | 27,574 | 25,574/2,000/- | 847 | 0.865 | 0.934 | - |
| COCO-Stuff-LT | 87,614 | 40,679/5,000/40,670 | 171 | 0.669 | 0.773 | - |
| MHP-v2-LT | 16,931 | 6,931/5,000/5,000 | 59 | 0.701 | 0.909 | - |
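
Gini@Image and Gini@Pixel quantify how imbalanced each dataset's class distribution is, computed over per-class image counts and per-class pixel counts respectively. A minimal sketch of a standard Gini-coefficient computation over such counts (illustrative only; the paper's exact measurement protocol may differ):

```python
import numpy as np

def gini(counts):
    """Gini coefficient of a per-class count vector.

    counts: per-class counts, e.g. images per class (Gini@Image) or
    pixels per class (Gini@Pixel).
    """
    x = np.sort(np.asarray(counts, dtype=np.float64))
    n = x.size
    i = np.arange(1, n + 1)
    # Closed form for sorted data: G = sum((2i - n - 1) * x_i) / (n * sum(x))
    return float(np.sum((2 * i - n - 1) * x) / (n * np.sum(x)))
```

A perfectly balanced distribution gives 0, and values approach 1 as the tail grows longer; ADE20K-Full's pixel-level Gini of 0.934 indicates extreme imbalance.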

Model Zoo and Baselines

1. Baseline models (Mask2Former-R50):

| Datasets | mIoU | mIoU@r (image) | mIoU@c (image) | mIoU@f (image) | mIoU@r (pixel) | mIoU@c (pixel) | mIoU@f (pixel) | ckpts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ADE20K | 47.2 | - | - | - | - | - | - | - |
| ADE20K-Full | 18.8 | 4.8 | 13.4 | 25.1 | 3.5 | 6.2 | 28.1 | - |
| COCO-Stuff | 46.5 | - | - | - | - | - | - | - |
| COCO-Stuff-LT | 32.6 | 13.9 | 24.5 | 41.4 | 13.5 | 20.7 | 42.7 | - |
| MHP-v2 | 44.6 | - | - | - | - | - | - | - |
| MHP-v2-LT | 32.3 | 8.8 | 10.4 | 46.8 | 13.8 | 10.6 | 45.4 | - |
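
The @r/@c/@f columns follow the usual long-tailed convention of splitting classes by frequency (presumably rare/common/frequent, with the splits defined by image-level and pixel-level class frequency respectively). A minimal sketch of how such grouped scores can be aggregated from per-class IoUs; `grouped_miou` and its split arguments are assumed names, not this repo's evaluation API:

```python
import numpy as np

def grouped_miou(per_class_iou, rare, common, frequent):
    """Average per-class IoUs overall and within frequency-based splits.

    per_class_iou: one IoU per class, NaN for classes absent from the
    evaluation set; rare/common/frequent: lists of class indices.
    """
    iou = np.asarray(per_class_iou, dtype=np.float64)

    def split_mean(indices):
        # nanmean skips classes that never appear in the eval set.
        return float(np.nanmean(iou[list(indices)]))

    return {
        "mIoU": float(np.nanmean(iou)),
        "mIoU@r": split_mean(rare),
        "mIoU@c": split_mean(common),
        "mIoU@f": split_mean(frequent),
    }
```

The gap between mIoU@f and mIoU@r in the table (e.g. 25.1 vs. 4.8 on ADE20K-Full) is exactly the long-tail effect these splits are meant to expose.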

License

The majority of LTSS is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

However, portions of the project are available under separate license terms: Swin-Transformer-Semantic-Segmentation is licensed under the MIT license, and Deformable-DETR is licensed under the Apache-2.0 license.

Citing LTSS

If you use LTSS in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@article{li2024frequency,
  title={Frequency-based Matcher for Long-tailed Semantic Segmentation},
  author={Li, Shan and Yang, Lu and Cao, Pu and Li, Liulei and Ma, Huadong},
  journal={IEEE Transactions on Multimedia},
  year={2024},
  publisher={IEEE}
}

If you find the code useful, please also consider the following BibTeX entry.

@inproceedings{cheng2022mask2former,
  title={Masked-attention Mask Transformer for Universal Image Segmentation},
  author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
  booktitle={CVPR},
  year={2022}
}
@inproceedings{cheng2021maskformer,
  title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
  author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
  booktitle={NeurIPS},
  year={2021}
}

Acknowledgement

Code is largely based on detectron2, MaskFormer, and Mask2Former.