By Shan Li, Lu Yang, Pu Cao, Liulei Li, Huadong Ma
- Support Mask2Former and DeepLabV3Plus baselines for long-tailed semantic segmentation (LTSS).
- Support the major LTSS datasets: ADE20K-Full, COCO-Stuff-LT, and MHP-v2-LT.
- Support the Repeat Factor Sampling (RFS), Copy-Paste, and Seesaw Loss (Mask2Former only) long-tailed solutions; an RFS sketch follows this list.
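RFS follows the repeat-factor scheme from the LVIS paper: images containing rare classes are duplicated during training in proportion to how rare their classes are. A minimal sketch, assuming per-image class lists are available; the threshold value `repeat_thresh` and the helper name are hypothetical, not this repo's API:

```python
# Minimal sketch of LVIS-style Repeat Factor Sampling (not the repo's API).
import math
from collections import Counter

def compute_repeat_factors(image_class_lists, repeat_thresh=1e-3):
    """Per-image repeat factors: r(c) = max(1, sqrt(t / f(c))) and
    r(I) = max_{c in I} r(c), where f(c) is the fraction of training
    images that contain class c and t is `repeat_thresh`."""
    num_images = len(image_class_lists)
    # Number of images containing each class (count each class once per image).
    image_count = Counter(c for classes in image_class_lists for c in set(classes))
    class_rf = {
        c: max(1.0, math.sqrt(repeat_thresh * num_images / n))
        for c, n in image_count.items()
    }
    # Each image is repeated according to its rarest class.
    return [
        max((class_rf[c] for c in set(classes)), default=1.0)
        for classes in image_class_lists
    ]

# Example: class 1 appears in only 1 of 3 images, so the middle image
# gets a repeat factor above 1 when repeat_thresh exceeds its frequency.
factors = compute_repeat_factors([[0], [0, 1], [0]], repeat_thresh=0.5)
```

In a detectron2-based codebase such as this one, `RepeatFactorTrainingSampler` (in `detectron2.data.samplers`) implements the same idea and stochastically rounds the fractional repeat factors each epoch.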
See installation instructions.
See Preparing Datasets for LTSS.
See Getting Started with LTSS.
Datasets | #Images | #Train/Val/Test | #Classes | Gini@Image | Gini@Pixel | Download |
---|---|---|---|---|---|---|
ADE20K-Full | 27,574 | 25,574/2,000/- | 847 | 0.865 | 0.934 | - |
COCO-Stuff-LT | 87,614 | 40,679/5,000/40,670 | 171 | 0.669 | 0.773 | - |
MHP-v2-LT | 16,931 | 6,931/5,000/5,000 | 59 | 0.701 | 0.909 | - |
- ADE20K-Full is an extended version of ADE20K, proposed in MaskFormer.
- COCO-Stuff-LT is sampled from COCO-Stuff-118K, and MHP-v2-LT is sampled from MHPv2-15K.
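Gini@Image and Gini@Pixel quantify how long-tailed each dataset is; a plausible reading is the Gini coefficient of the per-class image counts and per-class pixel counts, respectively (an assumption about the exact definition; `gini` below is an illustrative helper, not part of this repo). A value of 0 means a perfectly balanced class distribution, and values approaching 1 mean extreme imbalance:

```python
import numpy as np

def gini(class_frequencies):
    """Gini coefficient of a frequency vector, via the sorted-vector identity
    G = (2 * sum_i i * x_(i)) / (n * sum_i x_i) - (n + 1) / n."""
    x = np.sort(np.asarray(class_frequencies, dtype=np.float64))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n

# Pass per-class image counts for Gini@Image, or per-class pixel
# counts for Gini@Pixel:
print(gini([1, 1, 1, 1]))    # 0.0   -> balanced
print(gini([1, 1, 1, 997]))  # ~0.75 -> long-tailed
```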
Datasets | mIoU | mIoU@r (Image) | mIoU@c (Image) | mIoU@f (Image) | mIoU@r (Pixel) | mIoU@c (Pixel) | mIoU@f (Pixel) | ckpts |
---|---|---|---|---|---|---|---|---|
ADE20K | 47.2 | - | - | - | - | - | - | - |
ADE20K-Full | 18.8 | 4.8 | 13.4 | 25.1 | 3.5 | 6.2 | 28.1 | - |
COCO-Stuff | 46.5 | - | - | - | - | - | - | - |
COCO-Stuff-LT | 32.6 | 13.9 | 24.5 | 41.4 | 13.5 | 20.7 | 42.7 | - |
MHP-v2 | 44.6 | - | - | - | - | - | - | - |
MHP-v2-LT | 32.3 | 8.8 | 10.4 | 46.8 | 13.8 | 10.6 | 45.4 | - |
- mIoU@r/@c/@f report mIoU over the rare, common, and frequent classes; "Image" and "Pixel" indicate whether classes are grouped by image-level or pixel-level frequency.
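The group-wise metrics simply average the per-class IoUs within each frequency group. A minimal sketch, assuming classes are split by quantiles of their training-set frequency; the quantile thresholds `rare_q`/`freq_q` and the function name are hypothetical, and the actual group boundaries are defined in the paper:

```python
import numpy as np

def grouped_miou(per_class_iou, class_freq, rare_q=1/3, freq_q=2/3):
    """Average per-class IoU within rare/common/frequent groups formed by
    (hypothetical) frequency quantiles."""
    iou = np.asarray(per_class_iou, dtype=np.float64)
    freq = np.asarray(class_freq, dtype=np.float64)
    lo, hi = np.quantile(freq, [rare_q, freq_q])
    return {
        "mIoU@r": iou[freq <= lo].mean(),
        "mIoU@c": iou[(freq > lo) & (freq <= hi)].mean(),
        "mIoU@f": iou[freq > hi].mean(),
    }

# Grouping by per-class image counts yields the "Image" columns;
# grouping by per-class pixel counts yields the "Pixel" columns.
```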
The majority of LTSS is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
However, portions of the project are available under separate license terms: Swin-Transformer-Semantic-Segmentation is licensed under the MIT License, and Deformable-DETR is licensed under the Apache-2.0 License.
If you use LTSS in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.
@article{li2024frequency,
title={Frequency-based Matcher for Long-tailed Semantic Segmentation},
author={Li, Shan and Yang, Lu and Cao, Pu and Li, Liulei and Ma, Huadong},
journal={IEEE Transactions on Multimedia},
year={2024},
publisher={IEEE}
}
If you find the code useful, please also consider the following BibTeX entry.
@inproceedings{cheng2022mask2former,
title={Masked-attention Mask Transformer for Universal Image Segmentation},
author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
booktitle={CVPR},
year={2022}
}
@inproceedings{cheng2021maskformer,
title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
booktitle={NeurIPS},
year={2021}
}
Code is largely based on detectron2, MaskFormer, and Mask2Former.