JCruan519/VM-UNet

Why is VM-UNet's performance lower than a standard UNet?

Opened this issue · 5 comments

Hello, thanks for sharing the code. I found that the results of the other comparative experiments are exactly the same as those reported in some papers, such as MALUNet, TransUNet, and Swin-UNet. Furthermore, when I run UNet on the ISIC18 dataset (same 70/30 data split), its performance (mIoU of 82.2% ± 0.8%) is much higher than VM-UNet's. I don't know if something is wrong? 😢
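Just to make sure we are comparing the same metric, here is a minimal per-image mIoU sketch for binary masks (the 0.5 threshold, helper names, and per-image averaging are my assumptions; some repos instead accumulate a single confusion matrix over the whole test set, which gives slightly different numbers):

```python
import numpy as np

def binary_iou(pred, gt, threshold=0.5, eps=1e-7):
    """IoU for one mask pair; `pred` holds probabilities, `gt` holds {0, 1}."""
    pred_bin = (pred >= threshold).astype(np.uint8)
    gt_bin = (gt >= 0.5).astype(np.uint8)
    intersection = np.logical_and(pred_bin, gt_bin).sum()
    union = np.logical_or(pred_bin, gt_bin).sum()
    return (intersection + eps) / (union + eps)

def mean_iou(pairs):
    """Average IoU over a list of (prediction, ground-truth) pairs."""
    return float(np.mean([binary_iou(p, g) for p, g in pairs]))

# Toy usage with random masks, just to show the call pattern.
rng = np.random.default_rng(0)
pairs = [(rng.random((256, 256)), rng.random((256, 256)) > 0.5) for _ in range(4)]
print(f"mIoU: {mean_iou(pairs):.4f}")
```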

@FengheTan9
Hello, thank you for your interest in this work. I clicked on the link you provided and noticed that the UNet you used has a parameter count of 34.52M, which is significantly larger than the UNet used in the original article (about 7M). It is therefore reasonable that the UNet you used achieves better performance. Moreover, this paper only explores the potential of SSM-based models in medical image segmentation, rather than aiming for state-of-the-art results.
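For anyone who wants to verify this comparison, a quick sketch of counting trainable parameters in PyTorch (the stand-in module below is only a placeholder; substitute the actual UNet or VM-UNet instance):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> float:
    """Return the number of trainable parameters in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Placeholder module; replace with the UNet / VM-UNet you are benchmarking.
toy = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 1, 1))
print(f"{count_parameters(toy):.2f}M trainable parameters")
```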

Thank you for your reply. I agree with some of your points, but some lightweight networks such as UNeXt, CMUNeXt, or the EGE-UNet you previously published achieve higher performance than VM-UNet. Does this indicate that it is difficult for Mamba to model sparse, weak targets?

@FengheTan9
Hello, MALUNet and EGE-UNet are designed specifically for skin lesion segmentation, while VM-UNet is a general medical image segmentation model. You can draw the following analogy: Swin Transformer -> Swin-UNet is analogous to VMamba -> VM-UNet. The core aim of this paper is to demonstrate that, like Swin, Mamba is also feasible for medical image tasks. This is emphasized in Section 4.3 of the paper: "For instance, our model surpasses Swin-UNet, which is the first pure Transformer-based model, by 1.95% and 2.34mm in DSC and HD95 metrics. The results demonstrate the superiority of the SSM-based model in medical image segmentation tasks."
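For completeness, the DSC and HD95 metrics quoted above can be checked with a standard implementation; a minimal sketch using MedPy for binary masks (the empty-mask guard and the toy inputs below are my assumptions, not the paper's evaluation code):

```python
import numpy as np
from medpy.metric.binary import dc, hd95  # pip install medpy

def dsc_and_hd95(pred, gt):
    """Dice coefficient and 95th-percentile Hausdorff distance for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    dice = dc(pred, gt)
    # hd95 is undefined when either mask is empty; guard against that case.
    hausdorff = hd95(pred, gt) if pred.any() and gt.any() else float("nan")
    return dice, hausdorff

# Toy usage with two overlapping squares.
pred = np.zeros((64, 64), dtype=bool); pred[10:40, 10:40] = True
gt = np.zeros((64, 64), dtype=bool); gt[12:42, 12:42] = True
print(dsc_and_hd95(pred, gt))
```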

@FengheTan9 I agree with you

Swin Transformer and Swin-UNet have better performance.