
This repository contains the code and models for the following papers:

Baijiong Lin, Weisen Jiang, Pengguang Chen, Yu Zhang, Shu Liu, and Ying-Cong Chen. MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders. In European Conference on Computer Vision, 2024.

Baijiong Lin, Weisen Jiang, Pengguang Chen, Shu Liu, and Ying-Cong Chen. MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders. arXiv preprint arXiv:2408.15101, 2024.


  • PyTorch 2.0.0

  • timm 0.9.16

  • mmsegmentation 1.2.2

  • mamba-ssm 1.1.2

  • CUDA 11.8
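Before training, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of this repository: the helper `check_versions` is hypothetical, and the distribution names (`torch`, `timm`, `mmsegmentation`, `mamba-ssm`) are assumptions about how each package is published. The CUDA toolkit is not a Python package, so it is not checked here.

```python
from importlib import metadata

# Versions taken from the list above. The distribution names are assumptions
# about how each package is named when installed with pip.
EXPECTED = {
    "torch": "2.0.0",
    "timm": "0.9.16",
    "mmsegmentation": "1.2.2",
    "mamba-ssm": "1.1.2",
}


def check_versions(expected=EXPECTED, lookup=metadata.version):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except metadata.PackageNotFoundError:
            have = None
        # A local build tag such as "2.0.0+cu118" counts as a match for "2.0.0".
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in check_versions().items():
        print(f"{name}: expected {want}, found {have or 'not installed'}")
```

An empty result means every listed package is installed at the expected version. Other versions may also work, but this README does not state that.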


  1. Prepare the pretrained Swin-Large checkpoint by running the following command

    cd pretrained_ckpts
    cd ../

  2. Download the datasets (PASCALContext.tar.gz and NYUDv2.tar.gz) and extract them. Then set the db_root variable in configs/ to the directory that contains the extracted datasets.

  3. Train the model. For example, to train on NYUDv2, run the following command:

    python -m torch.distributed.launch --nproc_per_node 8 --run_mode train --config_exp ./configs/mtmamba_nyud.yml 

     You can download the pretrained models: mtmamba_nyud.pth.tar, mtmamba_pascal.pth.tar, mtmamba_plus_nyud.pth.tar, and mtmamba_plus_pascal.pth.tar.

  4. Evaluate the model. You can run the following command:

    python -m torch.distributed.launch --nproc_per_node 1 --run_mode infer --config_exp ./configs/mtmamba_nyud.yml --trained_model ./ckpts/mtmamba_nyud.pth.tar
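The dataset extraction in step 2 can also be scripted. The sketch below is illustrative only and is not part of this repository: the helper `extract_datasets` and the directory names `downloads` and `data` are hypothetical, and the internal layout of the archives is not assumed. The directory passed as `db_root` is the one to set as the db_root variable in configs/.

```python
import tarfile
from pathlib import Path

# Archive names from step 2; where they are stored is up to you.
ARCHIVES = ("PASCALContext.tar.gz", "NYUDv2.tar.gz")


def extract_datasets(archive_dir, db_root, names=ARCHIVES):
    """Extract each archive found in archive_dir into db_root.

    Returns the names of the archives that were extracted.
    """
    archive_dir, db_root = Path(archive_dir), Path(db_root)
    db_root.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in names:
        archive = archive_dir / name
        if not archive.is_file():
            continue  # not downloaded yet, so skip it
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(db_root)
        extracted.append(name)
    return extracted


if __name__ == "__main__":
    # Hypothetical locations: archives in ./downloads, datasets in ./data.
    print(extract_datasets("downloads", "data"))
```

Archives that are missing are skipped rather than treated as errors, so the script can be rerun after downloading the second dataset.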


We would like to thank the authors who released the following public repositories: Multi-Task-Transformer, mamba, and VMamba.


