
An MMSegmentation extension library containing code from recent papers.

Primary language: Python. License: Apache-2.0.

mmseg-extension


Introduction

mmseg-extension is a comprehensive extension of the MMSegmentation library (version 1.x), designed to provide a more versatile and up-to-date framework for semantic segmentation. This repository consolidates the latest advancements in semantic segmentation by integrating and unifying various models and codes within the MMSegmentation ecosystem. Users benefit from a consistent and streamlined training and testing process, significantly reducing the learning curve and enhancing productivity.

The main branch works with PyTorch 2.0 or higher (we recommend PyTorch 2.3). PyTorch 1.x may also work, but it has not been tested.
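A quick way to confirm that an environment meets this requirement is to compare the installed version against the 2.0 minimum. The following is a minimal sketch: the helper name `meets_minimum` is hypothetical and not part of mmseg-extension, and in practice the version string would come from `torch.__version__`.

```python
import re


def meets_minimum(version, minimum=(2, 0)):
    """Return True if a version string such as '2.3.0+cu121' is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor >= minimum


# In a real environment, pass torch.__version__ instead of these sample strings.
for candidate in ("2.3.0+cu121", "2.0.0", "1.13.1"):
    status = "ok" if meets_minimum(candidate) else "untested (PyTorch 1.x)"
    print(candidate, status)
```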

Features and Objectives

  • MMSegmentation Extension

    This repository extends the capabilities of MMSegmentation 1.x, leveraging its robust framework for semantic segmentation tasks.

  • Model Migration

    Models from MMSegmentation 0.x are migrated to be compatible with MMSegmentation 1.x.

  • Integration of External Code

    Code and models not originally developed with MMSegmentation can be adapted to use MMSegmentation's data loading, training, and validation mechanisms.

  • Model Weights Compatibility

    Models trained in their original repositories can be used directly for training and inference in mmseg-extension without the need for retraining.

  • Tracking Latest Models

    The repository stays updated with the latest research and models in semantic segmentation.

  • Minimal Changes

    Config file names remain the same as in the original repositories, so developers familiar with the original code can get started quickly.
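The list above does not describe how weight compatibility is achieved. One common technique when a checkpoint's parameter names differ from those of the target model is to rename the keys before loading. The sketch below illustrates only that general idea: the function `remap_state_dict`, the prefixes, and the sample keys are all hypothetical, and plain strings stand in for tensors so the example runs without PyTorch.

```python
from collections import OrderedDict


def remap_state_dict(state_dict, strip_prefix="module.", add_prefix="backbone."):
    """Return a copy of `state_dict` with one key prefix stripped and another added."""
    remapped = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(strip_prefix):
            key = key[len(strip_prefix):]
        remapped[add_prefix + key] = value
    return remapped


# Hypothetical checkpoint keys; real checkpoints map names to tensors.
original = OrderedDict(
    [
        ("module.patch_embed.proj.weight", "W0"),
        ("module.blocks.0.norm1.bias", "b0"),
    ]
)
converted = remap_state_dict(original)
print(list(converted))
```

Returning a new dictionary rather than editing in place keeps the original checkpoint intact, which makes it easier to compare key sets when a load reports missing or unexpected keys.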

Addressing Key Issues
  • Staying Current with Latest Models

    mmseg-extension addresses the delay in MMSegmentation's inclusion of the latest models by continuously integrating the newest research.

  • Standardizing Disparate Codebases

    By providing a unified framework, mmseg-extension solves the problem of inconsistent data loading, training, and validation scripts across different research papers.

  • Utilizing Pre-trained Weights

    mmseg-extension ensures compatibility with pre-trained weights from various repositories, so models can be integrated without retraining.

Installation and Usage

Overview of Model Zoo

| Name | Year | Publication | Paper | Code |
| --- | --- | --- | --- | --- |
| ViT-Adapter | 2023 | ICLR | Arxiv | Code |
| ViT-CoMer | 2024 | CVPR | Arxiv | Code |
| TransNeXt | 2024 | CVPR | Arxiv | Code |
| UniRepLKNet | 2024 | CVPR | Arxiv | Code |
| BiFormer | 2023 | CVPR | Arxiv | Code |
| ConvNeXt V2 | 2023 | CVPR | Arxiv | Code |
| InternImage | 2023 | CVPR | Arxiv | Code |
| FlashInternImage | 2024 | CVPR | Arxiv | Code |

Loss Function

| Name | Year | Publication | Paper | Code |
| --- | --- | --- | --- | --- |
| Balanced Softmax Loss | 2020 | NeurIPS | Arxiv | Code |

Metric

| Metrics | Year | Publication | Paper | Code | Single GPU | Multi GPU |
| --- | --- | --- | --- | --- | --- | --- |
| $\text{Acc}$, $\text{mAcc}^\text{D, I, C}$, $\text{mIoU}^\text{D, I, C}$, $\text{mDice}^\text{D, I, C}$, Worst-case metrics | 2023 | NeurIPS | Paper | Code |  |  |
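To illustrate why the averaging level matters, the sketch below contrasts dataset-wise and image-wise mIoU on two tiny flattened label maps. It assumes the superscripts D and I denote dataset-level and image-level averaging; the function names are hypothetical and this is not the repository's metric implementation. Classes with an empty union are skipped, which is also an assumption, since the cited paper may treat classes absent from an image differently.

```python
def iou_counts(pred, gt, num_classes):
    """Per-class intersection and union pixel counts for one flattened image."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[g] += 1  # false negative for the true class
    return inter, union


def miou_dataset(images, num_classes):
    """Accumulate counts over all images, then average per-class IoU."""
    total_inter = [0] * num_classes
    total_union = [0] * num_classes
    for pred, gt in images:
        inter, union = iou_counts(pred, gt, num_classes)
        for c in range(num_classes):
            total_inter[c] += inter[c]
            total_union[c] += union[c]
    ious = [total_inter[c] / total_union[c] for c in range(num_classes) if total_union[c] > 0]
    return sum(ious) / len(ious)


def miou_image(images, num_classes):
    """Compute mIoU per image, then average over images."""
    per_image = []
    for pred, gt in images:
        inter, union = iou_counts(pred, gt, num_classes)
        ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
        per_image.append(sum(ious) / len(ious))
    return sum(per_image) / len(per_image)


# Each entry is (prediction, ground truth). Image 1 misses a one-pixel object.
images = [
    ([0, 0, 0, 0], [0, 0, 0, 1]),
    ([1, 1, 1, 1], [1, 1, 1, 1]),
]
print("dataset-wise mIoU:", miou_dataset(images, num_classes=2))
print("image-wise mIoU:", miou_image(images, num_classes=2))
```

In this toy case the missed small object lowers the image-wise score far more than the dataset-wise score, because the dataset-wise count pools it with the many correctly labeled pixels of the same class in the second image.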

Completed Work Results

Identifier Description

| Identifier | Description |
| --- | --- |
|  | Supported |
|  | Not supported, but may be supported in future versions |
| - | Not tested |

You can find detailed information about ViT-Adapter in its README.md.

ViT-Adapter Pretraining Sources

| Name | Year | Type | Data | Repo | Paper | Support? |
| --- | --- | --- | --- | --- | --- | --- |
| DeiT | 2021 | Supervised | ImageNet-1K | repo | paper |  |
| AugReg | 2021 | Supervised | ImageNet-22K | repo | paper | - |
| BEiT | 2021 | MIM | ImageNet-22K | repo | paper | - |
| Uni-Perceiver | 2022 | Supervised | Multi-Modal | repo | paper |  |
| BEiTv2 | 2022 | MIM | ImageNet-22K | repo | paper | - |

ViT-Adapter ADE20K val

| Method | Backbone | Pretrain | Lr schd | Crop Size | mIoU (SS/MS) | #Param | Config | Download | Support? | our mIoU (SS/MS) | our config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UperNet | ViT-Adapter-T | DeiT-T | 160k | 512 | 42.6 / 43.6 | 36M | config | ckpt, log |  | -/- | config |
| UperNet | ViT-Adapter-S | DeiT-S | 160k | 512 | 46.2 / 47.1 | 58M | config | ckpt |  | 46.09/46.48 | config |
| UperNet | ViT-Adapter-B | DeiT-B | 160k | 512 | 48.8 / 49.7 | 134M | config | ckpt, log |  | 48.00/49.21 | config |
| UperNet | ViT-Adapter-T | AugReg-T | 160k | 512 | 43.9 / 44.8 | 36M | config | ckpt, log |  | -/- | config |
| UperNet | ViT-Adapter-B | AugReg-B | 160k | 512 | 51.9 / 52.5 | 134M | config | ckpt, log |  | -/- | config |
| UperNet | ViT-Adapter-L | AugReg-L | 160k | 512 | 53.4 / 54.4 | 364M | config | ckpt, log |  | -/- | config |
| UperNet | ViT-Adapter-L | Uni-Perceiver-L | 160k | 512 | 55.0 / 55.4 | 364M | config | ckpt, log |  |  |  |
| UperNet | ViT-Adapter-L | BEiT-L | 160k | 640 | 58.0 / 58.4 | 451M | config | ckpt, log |  | 58.08/58.16 | config |

ViT-CoMer ADE20K val

| Method | Backbone | Pretrain | Lr schd | Crop Size | mIoU (SS/MS) | #Param | Config | Ckpt | Log | Support? | our mIoU (SS/MS) | our config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UperNet | ViT-CoMer-T | DeiT-T | 160k | 512 | 43.5/- | 38.7M | config | ckpt | log |  | 43.66/- | config |
| UperNet | ViT-CoMer-S | DeiT-S | 160k | 512 | 46.5/- | 61.4M | config | ckpt | log |  | 46.09/46.23 | config |
| UperNet | ViT-CoMer-B | DeiT-S | 160k | 512 | 48.8/- | 144.7M | - | - | - |  | -/- | config |

InternImage ADE20K Semantic Segmentation

| backbone | method | resolution | mIoU (ss/ms) | #param | FLOPs | download | Support? | our mIoU (SS/MS) | our config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| InternImage-T | UperNet | 512x512 | 47.9 / 48.1 | 59M | 944G | ckpt, cfg |  | 47.60/- | config |
| InternImage-S | UperNet | 512x512 | 50.1 / 50.9 | 80M | 1017G | ckpt, cfg |  | 49.77/- | config |
| InternImage-B | UperNet | 512x512 | 50.8 / 51.3 | 128M | 1185G | ckpt, cfg |  | 50.46/51.05 | config |
| InternImage-L | UperNet | 640x640 | 53.9 / 54.1 | 256M | 2526G | ckpt, cfg |  | 53.39/- | config |
| InternImage-XL | UperNet | 640x640 | 55.0 / 55.3 | 368M | 3142G | ckpt, cfg |  | 54.4/- | config |
| InternImage-H | UperNet | 896x896 | 59.9 / 60.3 | 1.12B | 3566G | ckpt, cfg |  | 59.49/- | config |

FlashInternImage ADE20K Semantic Segmentation

| backbone | method | resolution | mIoU (ss/ms) | Config | Download | Support? | our mIoU (SS/MS) | our config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FlashInternImage-T | UperNet | 512x512 | 49.3 / 50.3 | config | ckpt, log |  | -/- | - |
| FlashInternImage-S | UperNet | 512x512 | 50.6 / 51.6 | config | ckpt, log |  | -/- | - |
| FlashInternImage-B | UperNet | 512x512 | 52.0 / 52.6 | config | ckpt, log |  | 51.22/- | config |
| FlashInternImage-L | UperNet | 640x640 | 55.6 / 56.0 | config | ckpt, log |  | -/- | - |

TransNeXt ADE20K Semantic Segmentation using the UPerNet method

| Backbone | Pretrained Model | Crop Size | Lr Schd | mIoU | mIoU (ms+flip) | #Params | Download | Config | Log | Support? | our mIoU (SS/MS) | our config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TransNeXt-Tiny | ImageNet-1K | 512x512 | 160K | 51.1 | 51.5/51.7 | 59M | model | config | log |  | 53.02/- | config |
| TransNeXt-Small | ImageNet-1K | 512x512 | 160K | 52.2 | 52.5/52.8 | 80M | model | config | log |  | 52.15/- | config |
| TransNeXt-Base | ImageNet-1K | 512x512 | 160K | 53.0 | 53.5/53.7 | 121M | model | config | log |  | 51.11/- | config |

  • In the context of multi-scale evaluation, TransNeXt reports test results under two distinct scenarios: interpolation and extrapolation of relative position bias.

TransNeXt ADE20K Semantic Segmentation using the Mask2Former method

| Backbone | Pretrained Model | Crop Size | Lr Schd | mIoU | #Params | Download | Config | Log | Support? | our mIoU (SS/MS) | our config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TransNeXt-Tiny | ImageNet-1K | 512x512 | 160K | 53.4 | 47.5M | model | config | log |  | 53.43/- | config |
| TransNeXt-Small | ImageNet-1K | 512x512 | 160K | 54.1 | 69.0M | model | config | log |  | 54.06/- | config |
| TransNeXt-Base | ImageNet-1K | 512x512 | 160K | 54.7 | 109M | model | config | log |  | 54.68/- | config |

UniRepLKNet ADE20K Semantic Segmentation

| name | resolution | mIoU (ss/ms) | #params | FLOPs | Weights | Support? | our mIoU (SS/MS) | our config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UniRepLKNet-T | 512x512 | 48.6/49.1 | 61M | 946G | ckpt |  | 47.94/- | config |
| UniRepLKNet-S | 512x512 | 50.5/51.0 | 86M | 1036G | ckpt |  | -/- | config |
| UniRepLKNet-S_22K | 512x512 | 51.9/52.7 | 86M | 1036G | ckpt |  | -/- | config |
| UniRepLKNet-S_22K | 640x640 | 52.3/52.7 | 86M | 1618G | ckpt |  | -/- | config |
| UniRepLKNet-B_22K | 640x640 | 53.5/53.9 | 130M | 1850G | ckpt |  | 52.89/- | config |
| UniRepLKNet-L_22K | 640x640 | 54.5/55.0 | 254M | 2507G | ckpt |  | -/- | config |
| UniRepLKNet-XL_22K | 640x640 | 55.2/55.6 | 425M | 3420G | ckpt |  | -/- | - |

NOTE: Checkpoints have already been released on Hugging Face. You can download them from https://huggingface.co/DingXiaoH/UniRepLKNet/tree/main.
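For scripted downloads, a direct file URL can be derived from the repository ID. The sketch below relies on the Hugging Face Hub's `resolve` URL pattern; the helper `hf_file_url` is hypothetical, and `CHECKPOINT_NAME.pth` is a placeholder that must be replaced with a file name actually listed on the repository page.

```python
from urllib.parse import quote


def hf_file_url(repo_id, filename, revision="main"):
    """Build the direct-download URL for one file in a Hugging Face model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


# Placeholder file name; substitute a name listed on the repository page.
url = hf_file_url("DingXiaoH/UniRepLKNet", "CHECKPOINT_NAME.pth")
print(url)
```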

BiFormer Semantic Segmentation

NOTE: The official repository has not released semantic segmentation weights. This repository can load the backbone weights pre-trained on the ImageNet-1K dataset. You can find the weights at the URL

ConvNeXt-V2 Semantic Segmentation

NOTE: The official repository has not released semantic segmentation weights. This repository can load the backbone weights pre-trained on the ImageNet-1K or ImageNet-22K dataset. You can find the weights at the URL
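When only ImageNet-pretrained backbone weights are available, as for BiFormer and ConvNeXt V2, OpenMMLab-style configs usually point the backbone at a checkpoint through `init_cfg`. The fragment below is a sketch under that assumption: the checkpoint path is a placeholder, and the actual configs in this repository may be organized differently.

```python
# Placeholder path; replace with the downloaded ImageNet-pretrained backbone checkpoint.
pretrained = "path/to/backbone_checkpoint.pth"

# Only the backbone is initialized from the checkpoint; the segmentation
# head still has to be trained, since no segmentation weights were released.
model = dict(
    backbone=dict(
        init_cfg=dict(type="Pretrained", checkpoint=pretrained),
    ),
)

print(model["backbone"]["init_cfg"])
```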