mmseg-extension is a comprehensive extension of the MMSegmentation library (version 1.x), designed to provide a more versatile and up-to-date framework for semantic segmentation. This repository consolidates recent advances in semantic segmentation by integrating and unifying various models and codebases within the MMSegmentation ecosystem. Users benefit from a consistent, streamlined training and testing process, which significantly reduces the learning curve and improves productivity.

The main branch works with PyTorch 2.0 or higher (we recommend PyTorch 2.3). PyTorch 1.x may still work, but it has not been tested.
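As a quick environment check, the snippet below verifies that the installed PyTorch meets this requirement (a trivial sketch; adjust as needed):

```python
import torch

# mmseg-extension targets PyTorch >= 2.0 (2.3 recommended); 1.x is untested.
major = int(torch.__version__.split('.')[0])
if major < 2:
    print(f'Warning: found PyTorch {torch.__version__}; version 2.0+ is recommended.')
```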
- **MMSegmentation Extension**: This repository extends the capabilities of MMSegmentation 1.x, leveraging its robust framework for semantic segmentation tasks.
- **Model Migration**: Models from MMSegmentation 0.x are migrated to be compatible with MMSegmentation 1.x.
- **Integration of External Code**: Code and models not originally developed with MMSegmentation can be adapted to use MMSegmentation's data loading, training, and validation mechanisms (see the sketch after this list).
- **Model Weights Compatibility**: Models trained in their original repositories can be used directly for training and inference in mmseg-extension, without retraining.
- **Tracking the Latest Models**: The repository stays updated with the latest research and models in semantic segmentation.
- **Minimal Changes**: Config file names remain the same as in the original repositories, so developers familiar with those repositories can get started without much hassle.
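To illustrate the external-code integration described above, here is a minimal sketch of wrapping a third-party network behind MMSegmentation's model registry so that the unified data loading, training, and validation pipeline can drive it. `ExternalBackbone` and its internals are hypothetical placeholders, not mmseg-extension's actual code:

```python
import torch.nn as nn
from mmengine.model import BaseModule
from mmseg.registry import MODELS


@MODELS.register_module()
class ExternalBackbone(BaseModule):
    """Hypothetical wrapper exposing a third-party network as an mmseg backbone."""

    def __init__(self, embed_dim=96, out_indices=(0, 1, 2, 3), init_cfg=None):
        super().__init__(init_cfg=init_cfg)
        self.out_indices = out_indices
        # Stand-in for the original repository's network; replace this with the
        # actual model construction from the external codebase.
        dims = [embed_dim * 2 ** i for i in range(4)]
        self.stages = nn.ModuleList(
            nn.Conv2d(3 if i == 0 else dims[i - 1], dims[i], 3, stride=2, padding=1)
            for i in range(4)
        )

    def forward(self, x):
        # MMSegmentation decode heads expect a tuple of multi-scale feature maps.
        outs = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i in self.out_indices:
                outs.append(x)
        return tuple(outs)
```

Once registered, the wrapper is referenced from a config like any built-in backbone, e.g. `backbone=dict(type='ExternalBackbone', embed_dim=96)`.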
Addressing Key Issues
- **Staying Current with the Latest Models**: mmseg-extension addresses the delay in MMSegmentation's adoption of new models by continuously integrating the newest research.
- **Standardizing Disparate Codebases**: By providing a unified framework, mmseg-extension solves the problem of inconsistent data loading, training, and validation scripts across different research papers.
- **Utilizing Pre-trained Weights**: mmseg-extension ensures compatibility with pre-trained weights from various repositories, enabling seamless model integration without retraining (a loading sketch appears below, after the ViT-Adapter table).
- **Installation**: Please refer to get_started.md for installation instructions.
- If you are not familiar with mmseg v1.x, please read the MMSegmentation documentation first.

The following models are currently integrated:
Name | Year | Publication | Paper | Code |
---|---|---|---|---|
ViT-Adapter | 2023 | ICLR | Arxiv | Code |
ViT-CoMer | 2024 | CVPR | Arxiv | Code |
TransNeXt | 2024 | CVPR | Arxiv | Code |
UniRepLKNet | 2024 | CVPR | Arxiv | Code |
BiFormer | 2023 | CVPR | Arxiv | Code |
ConvNeXt V2 | 2023 | CVPR | Arxiv | Code |
InternImage | 2023 | CVPR | Arxiv | Code |
FlashInternImage | 2024 | CVPR | Arxiv | Code |
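Because config file names mirror the original repositories (see "Minimal Changes" above), training any of the integrated models follows the standard MMSegmentation 1.x workflow. A minimal sketch using MMEngine's `Runner`, with a hypothetical config path:

```python
from mmengine.config import Config
from mmengine.runner import Runner

# Hypothetical config path; actual names mirror each model's original repository.
cfg = Config.fromfile('configs/vit_adapter/upernet_deit_adapter_tiny_512_160k_ade20k.py')
cfg.work_dir = './work_dirs/upernet_deit_adapter_tiny'

runner = Runner.from_cfg(cfg)
runner.train()  # tools/train.py wraps essentially this flow
```

The repository also integrates standalone loss functions: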
Name | Year | Publication | Paper | Code |
---|---|---|---|---|
Balanced Softmax Loss | 2020 | NeurIPS | Arxiv | Code |
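A loss like this is typically plugged into a decode head via the config. A minimal sketch, assuming the loss is registered under the name `BalancedSoftmaxLoss` (check the source for the actual registered name):

```python
# Hypothetical config fragment: use Balanced Softmax Loss in the decode head.
model = dict(
    decode_head=dict(
        loss_decode=dict(type='BalancedSoftmaxLoss', loss_weight=1.0)))
```

Integrated evaluation metrics: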
| Metric | Year | Publication | Paper | Code | Single GPU | Multi GPU |
| --- | --- | --- | --- | --- | --- | --- |
|  | 2023 | NeurIPS | Paper | Code | ✓ | ✓ |
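Metrics plug into the same unified evaluation loop through the evaluator config; for comparison, mmseg 1.x's built-in `IoUMetric` is configured as follows:

```python
# Config fragment: evaluators used by the validation and test loops.
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
test_evaluator = val_evaluator
```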
Identifier Description
Identifier | Description |
---|---|
✔ | Supported |
✖ | Not supported, but may be supported in future versions |
- | Not tested |
You can find detailed information about ViT-Adapter in README.md.
ViT-Adapter Pretraining Sources
ViT-Adapter ADE20K val
Method | Backbone | Pretrain | Lr schd | Crop Size | mIoU (SS/MS) | #Param | Config | Ckpt | Log | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|---|---|---|
UperNet | ViT-Adapter-T | DeiT-T | 160k | 512 | 42.6 / 43.6 | 36M | config | ckpt | log | ✔ | -/- | config |
UperNet | ViT-Adapter-S | DeiT-S | 160k | 512 | 46.2 / 47.1 | 58M | config | ckpt | - | ✔ | 46.09/46.48 | config |
UperNet | ViT-Adapter-B | DeiT-B | 160k | 512 | 48.8 / 49.7 | 134M | config | ckpt | log | ✔ | 48.00/49.21 | config |
UperNet | ViT-Adapter-T | AugReg-T | 160k | 512 | 43.9 / 44.8 | 36M | config | ckpt | log | ✔ | -/- | config |
UperNet | ViT-Adapter-B | AugReg-B | 160k | 512 | 51.9 / 52.5 | 134M | config | ckpt | log | ✔ | -/- | config |
UperNet | ViT-Adapter-L | AugReg-L | 160k | 512 | 53.4 / 54.4 | 364M | config | ckpt | log | ✔ | -/- | config |
UperNet | ViT-Adapter-L | Uni-Perceiver-L | 160k | 512 | 55.0 / 55.4 | 364M | config | ckpt | log | ✖ | ✖ | ✖ |
UperNet | ViT-Adapter-L | BEiT-L | 160k | 640 | 58.0 / 58.4 | 451M | config | ckpt | log | ✔ | 58.08/58.16 | config |
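The "Support?" column indicates whether checkpoints released by the original repository load directly in mmseg-extension. A minimal loading sketch with hypothetical paths; `revise_keys` remaps checkpoint key prefixes, which often differ between codebases:

```python
from mmengine.config import Config
from mmengine.runner import load_checkpoint
from mmseg.registry import MODELS

# Hypothetical config and checkpoint paths.
cfg = Config.fromfile('configs/vit_adapter/upernet_deit_adapter_base_512_160k_ade20k.py')
model = MODELS.build(cfg.model)

# Reuse weights exported by the original repository without retraining.
load_checkpoint(
    model,
    'checkpoints/upernet_deit_adapter_base_512_160k_ade20k.pth',
    map_location='cpu',
    revise_keys=[(r'^module\.', ''), (r'^model\.', '')],  # strip common prefixes
)
```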
ViT-CoMer ADE20K val
Method | Backbone | Pretrain | Lr schd | Crop Size | mIoU(SS/MS) | #Param | Config | Ckpt | Log | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|---|---|---|
UperNet | ViT-CoMer-T | DeiT-T | 160k | 512 | 43.5/- | 38.7M | config | ckpt | log | ✔ | 43.66/- | config |
UperNet | ViT-CoMer-S | DeiT-S | 160k | 512 | 46.5/- | 61.4M | config | ckpt | log | ✔ | 46.09/46.23 | config |
UperNet | ViT-CoMer-B | DeiT-B | 160k | 512 | 48.8/- | 144.7M | - | - | - | ✔ | -/- | config |
InternImage ADE20K Semantic Segmentation
Backbone | Method | Resolution | mIoU (ss/ms) | #Param | FLOPs | Ckpt | Cfg | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|---|
InternImage-T | UperNet | 512x512 | 47.9 / 48.1 | 59M | 944G | ckpt | cfg | ✔ | 47.60/- | config |
InternImage-S | UperNet | 512x512 | 50.1 / 50.9 | 80M | 1017G | ckpt | cfg | ✔ | 49.77/- | config |
InternImage-B | UperNet | 512x512 | 50.8 / 51.3 | 128M | 1185G | ckpt | cfg | ✔ | 50.46/51.05 | config |
InternImage-L | UperNet | 640x640 | 53.9 / 54.1 | 256M | 2526G | ckpt | cfg | ✔ | 53.39/- | config |
InternImage-XL | UperNet | 640x640 | 55.0 / 55.3 | 368M | 3142G | ckpt | cfg | ✔ | 54.4/- | config |
InternImage-H | UperNet | 896x896 | 59.9 / 60.3 | 1.12B | 3566G | ckpt | cfg | ✔ | 59.49/- | config |
FlashInternImage ADE20K Semantic Segmentation
Backbone | Method | Resolution | mIoU (ss/ms) | Config | Ckpt | Log | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|
FlashInternImage-T | UperNet | 512x512 | 49.3 / 50.3 | config | ckpt | log | ✔ | -/- | - |
FlashInternImage-S | UperNet | 512x512 | 50.6 / 51.6 | config | ckpt | log | ✔ | -/- | - |
FlashInternImage-B | UperNet | 512x512 | 52.0 / 52.6 | config | ckpt | log | ✔ | 51.22/- | config |
FlashInternImage-L | UperNet | 640x640 | 55.6 / 56.0 | config | ckpt | log | ✔ | -/- | - |
TransNeXt ADE20K Semantic Segmentation using the UPerNet method
Backbone | Pretrained Model | Crop Size | Lr Schd | mIoU | mIoU (ms+flip) | #Params | Download | Config | Log | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|---|---|---|
TransNeXt-Tiny | ImageNet-1K | 512x512 | 160K | 51.1 | 51.5/51.7 | 59M | model | config | log | ✔ | 53.02/- | config |
TransNeXt-Small | ImageNet-1K | 512x512 | 160K | 52.2 | 52.5/52.8 | 80M | model | config | log | ✔ | 52.15/- | config |
TransNeXt-Base | ImageNet-1K | 512x512 | 160K | 53.0 | 53.5/53.7 | 121M | model | config | log | ✔ | 51.11/- | config |
- In multi-scale evaluation, TransNeXt reports results under two distinct scenarios, interpolation and extrapolation of relative position bias, which is why the mIoU (ms+flip) column lists two values.
TransNeXt ADE20K Semantic Segmentation using the Mask2Former method
Backbone | Pretrained Model | Crop Size | Lr Schd | mIoU | #Params | Download | Config | Log | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|---|---|
TransNeXt-Tiny | ImageNet-1K | 512x512 | 160K | 53.4 | 47.5M | model | config | log | ✔ | 53.43/- | config |
TransNeXt-Small | ImageNet-1K | 512x512 | 160K | 54.1 | 69.0M | model | config | log | ✔ | 54.06/- | config |
TransNeXt-Base | ImageNet-1K | 512x512 | 160K | 54.7 | 109M | model | config | log | ✔ | 54.68/- | config |
UniRepLKNet ADE20K Semantic Segmentation
name | resolution | mIoU (ss/ms) | #params | FLOPs | Weights | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|
UniRepLKNet-T | 512x512 | 48.6/49.1 | 61M | 946G | ckpt | ✔ | 47.94/- | config |
UniRepLKNet-S | 512x512 | 50.5/51.0 | 86M | 1036G | ckpt | ✔ | -/- | config |
UniRepLKNet-S_22K | 512x512 | 51.9/52.7 | 86M | 1036G | ckpt | ✔ | -/- | config |
UniRepLKNet-S_22K | 640x640 | 52.3/52.7 | 86M | 1618G | ckpt | ✔ | -/- | config |
UniRepLKNet-B_22K | 640x640 | 53.5/53.9 | 130M | 1850G | ckpt | ✔ | 52.89/- | config |
UniRepLKNet-L_22K | 640x640 | 54.5/55.0 | 254M | 2507G | ckpt | ✔ | -/- | config |
UniRepLKNet-XL_22K | 640x640 | 55.2/55.6 | 425M | 3420G | ckpt | ✖ | -/- | - |
NOTE: Checkpoints have been released on Hugging Face; you can download them from https://huggingface.co/DingXiaoH/UniRepLKNet/tree/main.
BiFormer Semantic Segmentation
NOTE: The official repository does not release semantic segmentation weights. This repository can load the backbone weights pre-trained on ImageNet-1K; you can find the weights at the URL.
ConvNeXt-V2 Semantic Segmentation
NOTE: The official repository does not release semantic segmentation weights. This repository can load the backbone weights pre-trained on ImageNet-1K or ImageNet-22K; you can find the weights at the URL.
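For backbones such as BiFormer and ConvNeXt-V2, where only ImageNet-pretrained backbone weights are available, the usual pattern is to initialize the backbone from those weights via `init_cfg` and train the segmentation heads from scratch. A minimal config sketch; the registered type name and checkpoint path are hypothetical:

```python
# Hypothetical config fragment: ImageNet-pretrained backbone, heads trained from scratch.
model = dict(
    backbone=dict(
        type='BiFormer',  # assumed registered name; check the repository
        init_cfg=dict(
            type='Pretrained',
            checkpoint='checkpoints/biformer_base_in1k.pth',  # hypothetical path
        ),
    ),
)
```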