mmseg-extension is a comprehensive extension of the MMSegmentation library (version 1.x), designed to provide a more versatile and up-to-date framework for semantic segmentation. This repository consolidates recent advances in semantic segmentation by integrating and unifying various models and codebases within the MMSegmentation ecosystem. Users benefit from a consistent, streamlined training and testing process, which significantly reduces the learning curve and improves productivity.

The main branch works with PyTorch 2.0 or higher (we recommend PyTorch 2.3). PyTorch 1.x may still work, but it has not been tested.
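As a quick environment check, the snippet below verifies that the installed PyTorch meets this requirement (a trivial sketch; adjust as needed):

```python
import torch

# mmseg-extension targets PyTorch >= 2.0 (2.3 recommended); 1.x is untested.
major = int(torch.__version__.split('.')[0])
if major < 2:
    print(f'Warning: found PyTorch {torch.__version__}; version 2.0+ is recommended.')
```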
- **MMSegmentation Extension**: This repository extends the capabilities of MMSegmentation 1.x, leveraging its robust framework for semantic segmentation tasks.
- **Model Migration**: Models from MMSegmentation 0.x are migrated to be compatible with MMSegmentation 1.x.
- **Integration of External Code**: Code and models not originally developed with MMSegmentation can be adapted to use MMSegmentation's data loading, training, and validation mechanisms (see the sketch after this list).
- **Model Weights Compatibility**: Models trained in their original repositories can be used directly for training and inference in mmseg-extension, without retraining.
- **Tracking the Latest Models**: The repository stays updated with the latest research and models in semantic segmentation.
- **Minimal Changes**: Config file names remain the same as in the original repositories, so developers familiar with those repositories can get started without much hassle.
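To illustrate the external-code integration described above, here is a minimal sketch of wrapping a third-party network behind MMSegmentation's model registry so that the unified data loading, training, and validation pipeline can drive it. `ExternalBackbone` and its internals are hypothetical placeholders, not mmseg-extension's actual code:

```python
import torch.nn as nn
from mmengine.model import BaseModule
from mmseg.registry import MODELS


@MODELS.register_module()
class ExternalBackbone(BaseModule):
    """Hypothetical wrapper exposing a third-party network as an mmseg backbone."""

    def __init__(self, embed_dim=96, out_indices=(0, 1, 2, 3), init_cfg=None):
        super().__init__(init_cfg=init_cfg)
        self.out_indices = out_indices
        # Stand-in for the original repository's network; replace this with the
        # actual model construction from the external codebase.
        dims = [embed_dim * 2 ** i for i in range(4)]
        self.stages = nn.ModuleList(
            nn.Conv2d(3 if i == 0 else dims[i - 1], dims[i], 3, stride=2, padding=1)
            for i in range(4)
        )

    def forward(self, x):
        # MMSegmentation decode heads expect a tuple of multi-scale feature maps.
        outs = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i in self.out_indices:
                outs.append(x)
        return tuple(outs)
```

Once registered, the wrapper is referenced from a config like any built-in backbone, e.g. `backbone=dict(type='ExternalBackbone', embed_dim=96)`.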
Addressing Key Issues
- **Staying Current with the Latest Models**: mmseg-extension addresses the delay in MMSegmentation's adoption of new models by continuously integrating the newest research.
- **Standardizing Disparate Codebases**: By providing a unified framework, mmseg-extension solves the problem of inconsistent data loading, training, and validation scripts across different research papers.
- **Utilizing Pre-trained Weights**: mmseg-extension ensures compatibility with pre-trained weights from various repositories, enabling seamless model integration without retraining (a loading sketch appears below, after the ViT-Adapter table).
- **Installation**: Please refer to get_started.md for installation instructions.
- If you are not familiar with mmseg v1.x, please read the MMSegmentation documentation first.

The following models are currently integrated:
Name | Year | Publication | Paper | Code |
---|---|---|---|---|
ViT-Adapter | 2023 | ICLR | Arxiv | Code |
ViT-CoMer | 2024 | CVPR | Arxiv | Code |
TransNeXt | 2024 | CVPR | Arxiv | Code |
UniRepLKNet | 2024 | CVPR | Arxiv | Code |
BiFormer | 2023 | CVPR | Arxiv | Code |
ConvNeXt V2 | 2023 | CVPR | Arxiv | Code |
InternImage | 2023 | CVPR | Arxiv | Code |
FlashInternImage | 2024 | CVPR | Arxiv | Code |
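Because config file names mirror the original repositories (see "Minimal Changes" above), training any of the integrated models follows the standard MMSegmentation 1.x workflow. A minimal sketch using MMEngine's `Runner`, with a hypothetical config path:

```python
from mmengine.config import Config
from mmengine.runner import Runner

# Hypothetical config path; actual names mirror each model's original repository.
cfg = Config.fromfile('configs/vit_adapter/upernet_deit_adapter_tiny_512_160k_ade20k.py')
cfg.work_dir = './work_dirs/upernet_deit_adapter_tiny'

runner = Runner.from_cfg(cfg)
runner.train()  # tools/train.py wraps essentially this flow
```

The repository also integrates standalone loss functions: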
Name | Year | Publication | Paper | Code |
---|---|---|---|---|
Balanced Softmax Loss | 2020 | NeurIPS | Arxiv | Code |
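A loss like this is typically plugged into a decode head via the config. A minimal sketch, assuming the loss is registered under the name `BalancedSoftmaxLoss` (check the source for the actual registered name):

```python
# Hypothetical config fragment: use Balanced Softmax Loss in the decode head.
model = dict(
    decode_head=dict(
        loss_decode=dict(type='BalancedSoftmaxLoss', loss_weight=1.0)))
```

Integrated evaluation metrics: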
| Metric | Year | Publication | Paper | Code | Single GPU | Multi GPU |
| --- | --- | --- | --- | --- | --- | --- |
|  | 2023 | NeurIPS | Paper | Code | ✓ | ✓ |
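Metrics plug into the same unified evaluation loop through the evaluator config; for comparison, mmseg 1.x's built-in `IoUMetric` is configured as follows:

```python
# Config fragment: evaluators used by the validation and test loops.
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
test_evaluator = val_evaluator
```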
Identifier Description
Identifier | Description |
---|---|
✔ | Supported |
✖ | Not supported, but may be supported in future versions |
- | Not tested |
You can find detailed information about ViT-Adapter in README.md.
ViT-Adapter Pretraining Sources
ViT-Adapter ADE20K val
Method | Backbone | Pretrain | Lr schd | Crop Size | mIoU (SS/MS) | #Param | Config | Ckpt | Log | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|---|---|---|
UperNet | ViT-Adapter-T | DeiT-T | 160k | 512 | 42.6 / 43.6 | 36M | config | ckpt | log | ✔ | -/- | config |
UperNet | ViT-Adapter-S | DeiT-S | 160k | 512 | 46.2 / 47.1 | 58M | config | ckpt | - | ✔ | 46.09/46.48 | config |
UperNet | ViT-Adapter-B | DeiT-B | 160k | 512 | 48.8 / 49.7 | 134M | config | ckpt | log | ✔ | 48.00/49.21 | config |
UperNet | ViT-Adapter-T | AugReg-T | 160k | 512 | 43.9 / 44.8 | 36M | config | ckpt | log | ✔ | -/- | config |
UperNet | ViT-Adapter-B | AugReg-B | 160k | 512 | 51.9 / 52.5 | 134M | config | ckpt | log | ✔ | -/- | config |
UperNet | ViT-Adapter-L | AugReg-L | 160k | 512 | 53.4 / 54.4 | 364M | config | ckpt | log | ✔ | -/- | config |
UperNet | ViT-Adapter-L | Uni-Perceiver-L | 160k | 512 | 55.0 / 55.4 | 364M | config | ckpt | log | ✖ | ✖ | ✖ |
UperNet | ViT-Adapter-L | BEiT-L | 160k | 640 | 58.0 / 58.4 | 451M | config | ckpt | log | ✔ | 58.08/58.16 | config |
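The "Support?" column indicates whether checkpoints released by the original repository load directly in mmseg-extension. A minimal loading sketch with hypothetical paths; `revise_keys` remaps checkpoint key prefixes, which often differ between codebases:

```python
from mmengine.config import Config
from mmengine.runner import load_checkpoint
from mmseg.registry import MODELS

# Hypothetical config and checkpoint paths.
cfg = Config.fromfile('configs/vit_adapter/upernet_deit_adapter_base_512_160k_ade20k.py')
model = MODELS.build(cfg.model)

# Reuse weights exported by the original repository without retraining.
load_checkpoint(
    model,
    'checkpoints/upernet_deit_adapter_base_512_160k_ade20k.pth',
    map_location='cpu',
    revise_keys=[(r'^module\.', ''), (r'^model\.', '')],  # strip common prefixes
)
```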
ViT-CoMer ADE20K val
Method | Backbone | Pretrain | Lr schd | Crop Size | mIoU(SS/MS) | #Param | Config | Ckpt | Log | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|---|---|---|
UperNet | ViT-CoMer-T | DeiT-T | 160k | 512 | 43.5/- | 38.7M | config | ckpt | log | ✔ | 43.66/- | config |
UperNet | ViT-CoMer-S | DeiT-S | 160k | 512 | 46.5/- | 61.4M | config | ckpt | log | ✔ | 46.09/46.23 | config |
UperNet | ViT-CoMer-B | DeiT-B | 160k | 512 | 48.8/- | 144.7M | - | - | - | ✔ | -/- | config |
InternImage ADE20K Semantic Segmentation
Backbone | Method | Resolution | mIoU (ss/ms) | #Param | FLOPs | Ckpt | Cfg | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|---|
InternImage-T | UperNet | 512x512 | 47.9 / 48.1 | 59M | 944G | ckpt | cfg | ✔ | 47.60/- | config |
InternImage-S | UperNet | 512x512 | 50.1 / 50.9 | 80M | 1017G | ckpt | cfg | ✔ | 49.77/- | config |
InternImage-B | UperNet | 512x512 | 50.8 / 51.3 | 128M | 1185G | ckpt | cfg | ✔ | 50.46/51.05 | config |
InternImage-L | UperNet | 640x640 | 53.9 / 54.1 | 256M | 2526G | ckpt | cfg | ✔ | 53.39/- | config |
InternImage-XL | UperNet | 640x640 | 55.0 / 55.3 | 368M | 3142G | ckpt | cfg | ✔ | 54.4/- | config |
InternImage-H | UperNet | 896x896 | 59.9 / 60.3 | 1.12B | 3566G | ckpt | cfg | ✔ | 59.49/- | config |
FlashInternImage ADE20K Semantic Segmentation
Backbone | Method | Resolution | mIoU (ss/ms) | Config | Ckpt | Log | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|
FlashInternImage-T | UperNet | 512x512 | 49.3 / 50.3 | config | ckpt | log | ✔ | -/- | - |
FlashInternImage-S | UperNet | 512x512 | 50.6 / 51.6 | config | ckpt | log | ✔ | -/- | - |
FlashInternImage-B | UperNet | 512x512 | 52.0 / 52.6 | config | ckpt | log | ✔ | 51.22/- | config |
FlashInternImage-L | UperNet | 640x640 | 55.6 / 56.0 | config | ckpt | log | ✔ | -/- | - |
TransNeXt ADE20K Semantic Segmentation using the UPerNet method
Backbone | Pretrained Model | Crop Size | Lr Schd | mIoU | mIoU (ms+flip) | #Params | Download | Config | Log | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|---|---|---|
TransNeXt-Tiny | ImageNet-1K | 512x512 | 160K | 51.1 | 51.5/51.7 | 59M | model | config | log | ✔ | 53.02/- | config |
TransNeXt-Small | ImageNet-1K | 512x512 | 160K | 52.2 | 52.5/52.8 | 80M | model | config | log | ✔ | 52.15/- | config |
TransNeXt-Base | ImageNet-1K | 512x512 | 160K | 53.0 | 53.5/53.7 | 121M | model | config | log | ✔ | 51.11/- | config |
- In multi-scale evaluation, TransNeXt reports results under two distinct scenarios, interpolation and extrapolation of relative position bias, which is why the mIoU (ms+flip) column lists two values.
TransNeXt ADE20K Semantic Segmentation using the Mask2Former method
Backbone | Pretrained Model | Crop Size | Lr Schd | mIoU | #Params | Download | Config | Log | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|---|---|---|
TransNeXt-Tiny | ImageNet-1K | 512x512 | 160K | 53.4 | 47.5M | model | config | log | ✔ | 53.43/- | config |
TransNeXt-Small | ImageNet-1K | 512x512 | 160K | 54.1 | 69.0M | model | config | log | ✔ | 54.06/- | config |
TransNeXt-Base | ImageNet-1K | 512x512 | 160K | 54.7 | 109M | model | config | log | ✔ | 54.68/- | config |
UniRepLKNet ADE20K Semantic Segmentation
name | resolution | mIoU (ss/ms) | #params | FLOPs | Weights | Support? | our mIoU (SS/MS) | our config |
---|---|---|---|---|---|---|---|---|
UniRepLKNet-T | 512x512 | 48.6/49.1 | 61M | 946G | ckpt | ✔ | 47.94/- | config |
UniRepLKNet-S | 512x512 | 50.5/51.0 | 86M | 1036G | ckpt | ✔ | -/- | config |
UniRepLKNet-S_22K | 512x512 | 51.9/52.7 | 86M | 1036G | ckpt | ✔ | -/- | config |
UniRepLKNet-S_22K | 640x640 | 52.3/52.7 | 86M | 1618G | ckpt | ✔ | -/- | config |
UniRepLKNet-B_22K | 640x640 | 53.5/53.9 | 130M | 1850G | ckpt | ✔ | 52.89/- | config |
UniRepLKNet-L_22K | 640x640 | 54.5/55.0 | 254M | 2507G | ckpt | ✔ | -/- | config |
UniRepLKNet-XL_22K | 640x640 | 55.2/55.6 | 425M | 3420G | ckpt | ✖ | -/- | - |
NOTE: Checkpoints have been released on Hugging Face; you can download them from https://huggingface.co/DingXiaoH/UniRepLKNet/tree/main.
BiFormer Semantic Segmentation
NOTE: The official repository does not release semantic segmentation weights. This repository can load the backbone weights pre-trained on ImageNet-1K; you can find the weights at the URL.
ConvNeXt-V2 Semantic Segmentation
NOTE: The official repository does not release semantic segmentation weights. This repository can load the backbone weights pre-trained on ImageNet-1K or ImageNet-22K; you can find the weights at the URL.
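For backbones such as BiFormer and ConvNeXt-V2, where only ImageNet-pretrained backbone weights are available, the usual pattern is to initialize the backbone from those weights via `init_cfg` and train the segmentation heads from scratch. A minimal config sketch; the registered type name and checkpoint path are hypothetical:

```python
# Hypothetical config fragment: ImageNet-pretrained backbone, heads trained from scratch.
model = dict(
    backbone=dict(
        type='BiFormer',  # assumed registered name; check the repository
        init_cfg=dict(
            type='Pretrained',
            checkpoint='checkpoints/biformer_base_in1k.pth',  # hypothetical path
        ),
    ),
)
```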