MSVMamba

Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model

Paper: arXiv:2405.14174 (https://arxiv.org/abs/2405.14174)

Updates

  • May 23, 2024: We released the code, logs, and checkpoints for MSVMamba.

Introduction

MSVMamba is a visual state space model that introduces a hierarchy-in-hierarchy design into the VMamba architecture. This repository contains the code for training and evaluating MSVMamba models on the ImageNet-1K dataset for image classification, the COCO dataset for object detection, and the ADE20K dataset for semantic segmentation. For more information, please refer to our paper.
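
The core idea is to run the costly selective scan at more than one spatial resolution inside a block: a fraction of the channels is mixed at full resolution while the rest is mixed on a downsampled map and upsampled back. Below is a minimal conceptual sketch of that split in PyTorch; the class name, argument names, and channel split are illustrative assumptions, not the repository's actual MS2D implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleMixSketch(nn.Module):
    """Illustrative only: apply an expensive token mixer at full resolution on
    a quarter of the channels and at half resolution on the remainder."""
    def __init__(self, dim, mixer_full, mixer_coarse):
        super().__init__()
        self.dim_full = dim // 4          # channels kept at full resolution
        self.mixer_full = mixer_full      # e.g. a selective-scan (SSM) op
        self.mixer_coarse = mixer_coarse  # same kind of op on a 2x-smaller map

    def forward(self, x):                 # x: (B, C, H, W)
        x_full = x[:, :self.dim_full]
        x_coarse = x[:, self.dim_full:]
        y_full = self.mixer_full(x_full)                         # fine branch
        y_coarse = self.mixer_coarse(F.avg_pool2d(x_coarse, 2))  # coarse branch
        y_coarse = F.interpolate(y_coarse, size=x.shape[-2:], mode="nearest")
        return torch.cat([y_full, y_coarse], dim=1)

# Smoke test with identity mixers standing in for the real scans.
block = MultiScaleMixSketch(64, nn.Identity(), nn.Identity())
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```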

Main Results

Classification on ImageNet-1K

| name | pretrain | resolution | acc@1 | #params | FLOPs | logs&ckpts |
| --- | --- | --- | --- | --- | --- | --- |
| MSVMamba-Nano | ImageNet-1K | 224x224 | 77.3 | 7M | 0.9G | log&ckpt |
| MSVMamba-Micro | ImageNet-1K | 224x224 | 79.8 | 12M | 1.5G | log&ckpt |
| MSVMamba-Tiny | ImageNet-1K | 224x224 | 82.8 | 33M | 4.6G | log&ckpt |

Object Detection on COCO

| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs&ckpts |
| --- | --- | --- | --- | --- | --- | --- |
| MSVMamba-Micro | 32M | 201G | MaskRCNN@1x | 43.8 | 39.9 | log&ckpt |
| MSVMamba-Tiny | 53M | 252G | MaskRCNN@1x | 46.9 | 42.2 | log&ckpt |
| MSVMamba-Micro | 32M | 201G | MaskRCNN@3x | 46.3 | 41.8 | log&ckpt |
| MSVMamba-Tiny | 53M | 252G | MaskRCNN@3x | 48.3 | 43.2 | log&ckpt |

Semantic Segmentation on ADE20K

| Backbone | Input | #params | FLOPs | Segmentor | mIoU (SS) | mIoU (MS) | logs&ckpts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MSVMamba-Micro | 512x512 | 42M | 875G | UperNet@160k | 45.1 | 45.4 | log&ckpt |
| MSVMamba-Tiny | 512x512 | 65M | 942G | UperNet@160k | 47.8 | - | log&ckpt |

Getting Started

Environment creation, training, and evaluation for MSVMamba follow the same steps as VMamba.

Installation

Step 1: Clone the MSVMamba repository:

git clone https://github.com/YuHengsss/MSVMamba.git
cd MSVMamba

Step 2: Environment Setup:

Create and activate a new conda environment

conda create -n msvmamba
conda activate msvmamba

Install Dependencies

pip install -r requirements.txt
cd kernels/selective_scan && pip install .
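
The selective-scan kernel is a CUDA extension, so the build requires PyTorch with CUDA support and a compatible local toolkit. A quick sanity check before running the install (this assumes nothing about the kernel module itself):

```python
# PyTorch must see a GPU and report the CUDA version it was built against;
# that version should be compatible with your local nvcc.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("built with CUDA:", torch.version.cuda)
```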

Dependencies for Detection and Segmentation (optional)

pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
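
A short check that the OpenMMLab stack resolved to the pinned versions (note that mmsegmentation imports as mmseg):

```python
# Print installed OpenMMLab versions; they should match the pins above.
import mmengine, mmcv, mmdet, mmseg, mmpretrain

for m in (mmengine, mmcv, mmdet, mmseg, mmpretrain):
    print(m.__name__, m.__version__)
```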

Quick Start

Classification

To train MSVMamba models for classification on ImageNet, use the following command with the desired configuration:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
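
Note that --batch-size is the per-GPU batch size in this Swin-style launcher, so the command above uses an effective global batch of 8 × 128 = 1024 images.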

If you only want to evaluate a trained model (this also reports the parameter count and FLOPs):

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
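
If you only need a standalone parameter count from a released checkpoint, a minimal sketch follows; the file name is hypothetical, and the top-level "model" key follows the Swin-style checkpoint convention this codebase inherits, so adjust if your file differs.

```python
# Count parameters stored in a checkpoint without building the model.
import torch

ckpt = torch.load("msvmamba_tiny.pth", map_location="cpu")  # hypothetical path
state = ckpt.get("model", ckpt)  # Swin-style checkpoints nest weights under "model"
n_params = sum(v.numel() for v in state.values()
               if torch.is_tensor(v) and v.is_floating_point())
print(f"params: {n_params / 1e6:.1f}M")
```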

Detection and Segmentation

To evaluate with mmdetection or mmsegmentation (the trailing argument is the number of GPUs):

bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1

Use --tta to obtain the multi-scale mIoU (MS) for segmentation.

To train with mmdetection or mmsegmentation on 8 GPUs:

bash ./tools/dist_train.sh </path/to/config> 8

Citation

If MSVMamba is helpful for your research, please cite the following paper:

@article{shi2024multiscale,
  title={Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model},
  author={Yuheng Shi and Minjing Dong and Chang Xu},
  journal={arXiv preprint arXiv:2405.14174},
  year={2024}
}

Acknowledgment

This project is based on VMamba (paper, code), Mamba (paper, code), Swin-Transformer (paper, code), ConvNeXt (paper, code), and OpenMMLab. Thanks for their excellent works.