[Official Repo] A Survey on Vision Mamba: Models, Applications and Challenges

Awesome-Vision-Mamba-Models

License: MIT

[NEWS.2024/04/29] Our paper is released!

[NEWS.2024/05/02] 🎉🎉🎉Congratulations to Vision Mamba on being accepted to ICML 2024.

📢NOTE: If you have any questions, please don't hesitate to contact us at any of the following emails: rui.xu@whu.edu.cn, syangcw@connect.ust.hk, ywangrm@connect.ust.hk.

Mamba, a novel state space model, has gained recognition across diverse domains for its strong performance and computational efficiency. By addressing limitations inherent in traditional visual foundation architectures, Mamba has emerged as a promising candidate for advancing the field of computer vision.
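
As a rough illustration of what "state space model" means here, the sketch below runs a scalar discrete state space recurrence over an image flattened into a 1-D sequence. It is a minimal toy under stated assumptions, not Mamba itself: the names `ssm_scan` and `flatten_row_major` are hypothetical, the parameters `a`, `b`, `c` are fixed scalars, and Mamba's input-dependent (selective) parameters and hardware-aware scan are omitted. It only shows why the cost grows linearly with sequence length: each step updates a fixed-size hidden state once.

```python
from typing import List, Sequence


def ssm_scan(a: float, b: float, c: float, xs: Sequence[float]) -> List[float]:
    """Toy scalar state space recurrence (illustrative, not Mamba's kernel).

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    """
    h = 0.0  # hidden state starts at zero
    ys: List[float] = []
    for x in xs:  # one constant-cost update per element: linear time overall
        h = a * h + b * x
        ys.append(c * h)
    return ys


def flatten_row_major(image: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a 2-D grid into a 1-D sequence, row by row.

    Row-major order is an assumption made for this sketch; the vision
    models listed in this repository use a variety of scan orders.
    """
    return [value for row in image for value in row]


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 0.0]]
    print(ssm_scan(0.5, 1.0, 2.0, flatten_row_major(image)))
```

With `a = 0.5`, the contribution of the first pixel decays by half at every later position, which is the sense in which the hidden state carries long-range context along the scan.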

⭐ This repository hosts a curated collection of literature associated with Mamba models in computer vision. Feel free to star and fork. For further details, refer to the following paper:

A Survey on Vision Mamba: Models, Applications and Challenges
Rui Xu, Shu Yang, Yihui Wang, Bo Du, Hao Chen
SMART Lab, The Hong Kong University of Science and Technology

If you find this repository useful, please cite our paper:

@misc{2024vision_mamba,
      title={A Survey on Vision Mamba: Models, Applications and Challenges}, 
      author={Rui Xu and Shu Yang and Yihui Wang and Bo Du and Hao Chen},
      year={2024},
      eprint={2404.18861},
      archivePrefix={arXiv},
      primaryClass={}
}

Contents

Backbone for Representation Learning

Detailed Performance Comparison

Date Paper Figure Link Code
Arxiv 24.01.17 (ICML24) Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model image Link Code
Arxiv 24.01.18 VMamba: Visual State Space Model image image Link Code
Arxiv 24.02.08 Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data image Link Code
Arxiv 24.03.14 LocalMamba: Visual State Space Model with Windowed Selective Scan image Link Code
Arxiv 24.03.15 EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba image Link Code
Arxiv 24.03.22 SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series image Link Code
Arxiv 24.03.26 PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition image Link Code

Related Survey

Date Paper Link
Arxiv 24.04.15 State Space Model for New-Generation Network Alternative to Transformers: A Survey Link
Arxiv 24.04.24 A Survey on Visual Mamba Link
Arxiv 24.04.24 Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges Link
Arxiv 24.05.07 Vision Mamba: A Comprehensive Survey and Taxonomy Link

Vision Application

Image

Natural Image

Date Paper Figure Link Code Task
Arxiv 24.02.06 U-shaped Vision Mamba for Single Image Dehazing image Link Code Dehazing/Low Light Enhancement/Deraining
Arxiv 24.02.23 MambaIR: A Simple Baseline for Image Restoration with State-Space Model image Link Code Super-resolution/Denoising
Arxiv 24.03.04 MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection image Link Code Infrared Image Segmentation
Arxiv 24.03.07 InstructGIE: Towards Generalizable Image Editing image Link Image Editing
Arxiv 24.03.13 Activating Wider Areas in Image Super-Resolution image Link Super-resolution
Arxiv 24.03.20 ZigMa: A DiT-style Zigzag Mamba Diffusion Model image Link Code Generation
Arxiv 24.03.27 Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction image Link 3D Reconstruction
Arxiv 24.04.09 MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection image Link Code Anomaly Detection
Arxiv 24.04.11 DGMamba: Domain Generalization via Generalized State Space Model image Link Code Domain Generalization
Arxiv 24.04.15 FreqMamba: Viewing Mamba from a Frequency Perspective for Image Deraining image Link Deraining
Arxiv 24.04.17 CU-Mamba: Selective State Space Models with Channel Learning for Image Restoration image Link Denoising/Deblurring
Arxiv 24.04.22 MambaUIE: Unraveling the Ocean's Secrets with Only 2.8 FLOPs image Link Code Image Enhancement
Arxiv 24.05.03 FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space image Link Code Emotion recognition & Facial Expression Recognition & Detection
Arxiv 24.05.05 DVMSR: Distillated Vision Mamba for Efficient Super-Resolution image Link Code Super-Resolution
Arxiv 24.05.05 SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion image Link Motion Style Transfer
Arxiv 24.05.06 Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement image Link Code Image Enhancement
Arxiv 24.05.07 VMambaCC: A Visual State Space Model for Crowd Counting image Link Crowd Counting

Remote Sensing Image

Date Paper Figure Link Code Task
Arxiv 24.02.19 Pan-Mamba: Effective pan-sharpening with State Space Model image Link Code Pan-sharpening
Arxiv 24.03.28 RSMamba: Remote Sensing Image Classification with State Space Model image Link Code Remote Sensing Image Classification
Arxiv 24.04.02 Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model image Link Code Semantic Segmentation
Arxiv 24.04.03 RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation image Link Code Semantic Segmentation
Arxiv 24.04.03 RS-Mamba for Large Remote Sensing Image Dense Prediction image Link Code Semantic Segmentation/Change Detection
Arxiv 24.04.04 ChangeMamba: Remote Sensing Change Detection with Spatio-Temporal State Space Model image Link Code Change Detection/Building Damage Assessment
Arxiv 24.04.12 SpectralMamba: Efficient Mamba for Hyperspectral Image Classification image Link Code Hyperspectral Image Classification
Arxiv 24.04.15 HSIDMamba: Exploring Bidirectional State-Space Models for Hyperspectral Denoising image Link Hyperspectral Denoising
Arxiv 24.04.28 S2Mamba: A Spatial-spectral State Space Model for Hyperspectral Image Classification image Link Code Hyperspectral Image Classification
Arxiv 24.04.29 Spectral-Spatial Mamba for Hyperspectral Image Classification image Link Hyperspectral Image Classification
Arxiv 24.05.06 SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising image Link Code Hyperspectral Image Denoising
Arxiv 24.05.06 SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients image Link Code Detection
Arxiv 24.05.08 Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution image Link Super Resolution
Arxiv 24.05.13 GMSR: Gradient-Guided Mamba for Spectral Reconstruction from RGB Images image Link Code Spectral Reconstruction from RGB Images
Arxiv 24.05.14 Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study image Link Semantic Segmentation

Medical Image

Date Paper Figure Link Code Task
Arxiv 24.01.09 U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation image Link Code 2D Medical Segmentation/3D Medical Segmentation
Arxiv 24.01.24 SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation image Link Code 3D Medical Segmentation
Arxiv 24.02.04 VM-UNet: Vision Mamba UNet for Medical Image Segmentation image Link Code 2D Medical Segmentation
Arxiv 24.02.05 nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model image Link Code 3D Medical Segmentation
Arxiv 24.02.05 Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining image Link Code 2D Medical Segmentation
Arxiv 24.02.07 Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation image Link Code 2D Medical Segmentation
Arxiv 24.02.09 FD-Vision Mamba for Endoscopic Exposure Correction image Link Code Endoscopic Exposure Correction
Arxiv 24.02.11 Semi-Mamba-UNet: Pixel-Level Contrastive Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation image Link Code 2D Medical Segmentation
Arxiv 24.02.13 P-Mamba: Marrying Perona Malik Diffusion with Mamba for Efficient Pediatric Echocardiographic Left Ventricular Segmentation image Link 2D Medical Segmentation
Arxiv 24.02.16 Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation image Link Code 2D Medical Segmentation
Arxiv 24.02.28 MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation image Link Code Medical Image Reconstruction/Uncertainty Estimation
Arxiv 24.03.06 MedMamba: Vision Mamba for Medical Image Classification image Link Code 2D Medical Classification
Arxiv 24.03.08 LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation image Link Code 2D Medical Segmentation/3D Medical Segmentation
Arxiv 24.03.08 MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models image Link Cancer Subtyping
Arxiv 24.03.11 MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology image Link Code Cancer Subtyping/Survival Prediction
Arxiv 24.03.12 Large Window-based Mamba UNet for Medical Image Segmentation: Beyond Convolution and Self-attention image Link Code 2D Medical Segmentation/3D Medical Segmentation
Arxiv 24.03.13 MD-Dose: A Diffusion Model based on the Mamba for Radiotherapy Dose Prediction image Link Code Radiation Dose Prediction (Segmentation)
Arxiv 24.03.14 VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation image Link Code 2D Medical Segmentation
Arxiv 24.03.20 H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation image Link Code 2D Medical Segmentation
Arxiv 24.03.20 ProMamba: Prompt-Mamba for polyp segmentation image Link 2D Medical Segmentation
Arxiv 24.03.25 CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification image Link Alzheimer’s disease Classification (CT/MRI)
Arxiv 24.03.26 Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion image Link 2D Medical Segmentation (2D MRI)
Arxiv 24.03.26 Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation image Link 2D Medical Segmentation
Arxiv 24.03.29 UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation image Link Code 2D Medical Segmentation
Arxiv 24.04.01 T-Mamba: Frequency-Enhanced Gated Long-Range Dependency for Tooth 3D CBCT Segmentation image Link Code 3D Medical Segmentation (Tooth)
Arxiv 24.04.10 ViM-UNet: Vision Mamba for Biomedical Segmentation image Link Code 2D Medical Segmentation (Cell/Neurite)
Arxiv 24.04.19 Vim4Path: Self-Supervised Vision Mamba for Histopathology Images image Link Code Cancer Subtyping
Arxiv 24.04.26 Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment image Link Universal Lesion Segmentation
Arxiv 24.04.26 Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model image Link ODT Sparse Reconstruction
Arxiv 24.05.05 AC-MAMBASEG: An adaptive convolution and Mamba-based architecture for enhanced skin lesion segmentation image Link Code Skin Lesion Segmentation
Arxiv 24.05.08 HC-Mamba: Vision MAMBA with Hybrid Convolutional Techniques for Medical Image Segmentation image Link 2D Medical Segmentation
Arxiv 24.05.09 VM-DDPM: Vision Mamba Diffusion for Medical Image Synthesis image Link Medical Image Generation

Video

Date Paper Figure Link Code Task
Arxiv 24.01.25 Vivim: a Video Vision Mamba for Medical Video Object Segmentation image Link Code Medical Video Segmentation
Arxiv 24.03.11 VideoMamba: State Space Model for Efficient Video Understanding image Link Code Action Recognition/Video Understanding/Text-to-video Retrieval
Arxiv 24.03.14 Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding image Link Code Action Recognition/Action Localization/...
Arxiv 24.04.09 RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos image Link Code Remote photoplethysmography Prediction
Arxiv 24.04.11 Simba: Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos image Link Skeleton Action Recognition
Arxiv 24.05.05 Matten: Video Generation with Mamba-Attention image Link Video generation

Point Cloud

Date Paper Figure Link Code Task
Arxiv 24.02.16 PointMamba: A Simple State Space Model for Point Cloud Analysis image Link Code Classification, Part Segmentation
Arxiv 24.03.01 Point Cloud Mamba: Point Cloud Learning via State Space Model image Link Code Classification, Part Segmentation, Semantic Segmentation
Arxiv 24.03.11 Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy image Link Code Classification, Semantic Segmentation
Arxiv 24.04.08 3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering image Link Point Cloud Filtering
Arxiv 24.04.10 3DMambaComplete: Exploring Structured State Space Model for Point Cloud Completion image Link Point Cloud Completion
Arxiv 24.04.23 Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model image Link Classification, Part Segmentation
Arxiv 24.05.09 Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba image Link Classification, Regression
Arxiv 24.05.13 OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition image Link Code LiDAR Place Recognition

Multi-Modal

Date Paper Figure Link Code Task Modality
Arxiv 24.01.25 MambaMorph: a Mamba-based Framework for Medical MR-CT Deformable Registration image Link Code Registration MRI & CT
Arxiv 24.03.12 Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM image Link Code Text-to-Motion Generation Motion & Text
Arxiv 24.03.16 ReMamber: Referring Image Segmentation with Mamba Twister image Link Referring Image Segmentation Image & Text
Arxiv 24.03.20 VL-Mamba: Exploring State Space Models for Multimodal Learning image Link Code MLLM tasks Image & Text
Arxiv 24.03.21 Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference image Link Code MLLM tasks Image & Text
Arxiv 24.04.01 SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding image Link Temporal Video Grounding Video & Text
Arxiv 24.04.05 Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation image Link Code Semantic Segmentation RGB Images & Depth/Thermal Images
Arxiv 24.04.07 VMambaMorph: a Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module image Link Code Registration MRI & CT
Arxiv 24.04.11 SurvMamba: State Space Model with Multi-grained Multi-modal Interaction for Survival Prediction image Link Cancer Subtyping/Survival Prediction WSIs & Gene
Arxiv 24.04.11 FusionMamba: Efficient Image Fusion with State Space Model image Link Pansharpening HISR Images & LRMS Images
Arxiv 24.04.12 MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion image Link Multi-modality Image Fusion RGB & Thermal Images, MRI & CT/PET/SPECT
Arxiv 24.04.14 Fusion-Mamba for Cross-modality Object Detection image Link Visible-infrared Images Fusion RGB Images & Infrared Images
Arxiv 24.04.14 A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion image Link Pansharpening HISR Images & LRMS Images
Arxiv 24.04.15 FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba image Link Code Image Fusion RGB & Infrared Images, MRI & CT/PET/SPECT, PC & GFP
Arxiv 24.04.17 Text-controlled Motion Mamba: Text-Instructed Temporal Grounding of Human Motion image Link Temporal Grounding Motion & Text
Arxiv 24.04.24 CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions image Link Code Visible-infrared Images Fusion RGB Images & Infrared Images
Arxiv 24.04.27 Revisiting Multi-modal Emotion Learning with Broad State Space Models and Probability-guidance Fusion image Link Multi-modal Emotion Recognition Text & Video & Audio
Arxiv 24.04.28 Mamba-FETrack: Frame-Event Tracking via State Space Model image Link Code RGB-Event Tracking RGB Frames & Event
Arxiv 24.04.29 RSCaMa: Remote Sensing Image Change Captioning with State Space Model image Link Code Image Captioning Remote Sensing Image & Text
Arxiv 24.04.30 CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation image Link Code OOD Image & Text
Arxiv 24.05.13 Sakuga-42M Dataset: Scaling Up Cartoon Research image Link Code Cartoon Understanding/Cartoon Generation/... Cartoon Video & Text

Others

Date Paper Figure Link Code Task
Arxiv 24.02.24 Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning image Link Code Food Classification
Arxiv 24.03.08 Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy image Link Code Endoscope Tip Tracking
Arxiv 24.03.14 MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models image Link Gesture Synthesis
Arxiv 24.03.15 On the low-shot transferability of [V]-Mamba? image Link Few-shot Learning
Arxiv 24.03.22 Music to Dance as Language Translation using Sequence Models image Link Code Music-to-Dance
Arxiv 24.05.08 Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models image Link Trajectory Prediction with LLM

Useful Source

Date Paper Link
Arxiv 24.03.03 The Hidden Attention of Mamba Models Link
Arxiv 24.03.16 Understanding Robustness of Visual State Space Models for Image Classification Link

Other Domains

coming soon

Reinforcement Learning

Graph Learning

MOE