NVlabs/MambaVision

nn.ModuleList

Closed this issue · 8 comments

Hi Ali,
I recently read your paper and code carefully.
Thank you for the well-designed architecture; I would like to use it in my research.
Along those lines, I have a question that may seem very simple to you but is a little troubling to me:
how do I call your nn.ModuleList, or where can I find the file where it is defined?
I would appreciate your reply.

Hi @zhuangzexuan, thanks for raising this issue.

Could you please provide more information about how you would like to use this architecture? For example, if you let me know which features you want to extract, I can help.

I assume you intend to use some or all parts of the network via nn.ModuleList.

In its simplest form, you can add the following as a new file in the same folder where the model is defined and run inference (among other things):

from mamba_vision import *
import torch

# Instantiate the tiny variant, move it to the GPU, and switch to eval mode
model = mamba_vision_T().cuda().eval()

# Run a batch of random images at the default 224x224 resolution
resolution = 224
input_data = torch.randn((128, 3, resolution, resolution)).cuda()
output = model(input_data)
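
To address the nn.ModuleList part of your question directly: there is no separate nn.ModuleList file to look for. nn.ModuleList is a standard PyTorch container, and MambaVision uses it inside the model to hold its stages. Below is a minimal sketch of iterating over those stages yourself; the attribute names patch_embed and levels follow the current mamba_vision.py, so double-check them against the version you are using:

from mamba_vision import *
import torch

model = mamba_vision_T().cuda().eval()
x = torch.randn(1, 3, 224, 224).cuda()

# model.levels is an nn.ModuleList holding the stages of the network
x = model.patch_embed(x)
for level in model.levels:
    x = level(x)
print(x.shape)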

OK, I appreciate your reply.
I am trying to add MambaVisionLayer to my own network structure.
Training works when MambaVisionLayer uses no downsampling and uses conv blocks instead of transformer blocks.
Next, I will try the transformer blocks, but I may not use the yaml files of the different models. I wonder whether it is feasible to take the yaml files and the corresponding parameters of the different models in mamba_vision directly and plug them into my network structure.

@zhuangzexuan In this case, we can simply import any of the layers from the MambaVision architecture and use them alongside other layers. For example, suppose we want to use MambaVisionMixer in a new block together with conv layers:

from mamba_vision import MambaVisionMixer
import torch.nn as nn


class MyMambaVisionModule(nn.Module):

    def __init__(self, dim):
        super().__init__()
        # Depthwise conv followed by an activation; operates on (B, C, H, W)
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, 1, 1,
                      groups=dim, bias=False),
            nn.GELU(),
        )
        # MambaVisionMixer expects a token sequence of shape (B, N, C),
        # so it cannot live inside the same nn.Sequential as the convs
        self.mixer = MambaVisionMixer(d_model=dim, d_state=8, d_conv=3, expand=1)
        self.proj = nn.Conv2d(dim, dim, 1, 1, 0, bias=False)

    def forward(self, x):
        x = self.conv(x)
        B, C, H, W = x.shape
        # Flatten the spatial dims into tokens for the mixer, then restore
        x = self.mixer(x.flatten(2).transpose(1, 2))
        x = x.transpose(1, 2).reshape(B, C, H, W)
        return self.proj(x)
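
For a quick sanity check (the dim and input size below are arbitrary; the mixer needs a GPU since mamba_ssm uses CUDA kernels):

import torch

block = MyMambaVisionModule(dim=64).cuda().eval()
x = torch.randn(2, 64, 14, 14).cuda()
print(block(x).shape)  # torch.Size([2, 64, 14, 14])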

Any other layers or blocks in MambaVision can be imported and used in your architecture in the same way. For the purely convolutional part I prefer nn.Sequential, since it has its own forward method and allows for more flexibility.
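
As for reusing the yaml configs: one possibility is to load the per-model hyperparameters from the yaml and pass them straight into MambaVisionLayer. This is only a sketch; the file name and key names below are hypothetical, and the argument names follow the MambaVisionLayer signature in mamba_vision.py, so double-check both against your setup:

import yaml
from mamba_vision import MambaVisionLayer

# Hypothetical config file and keys; adapt them to your actual yaml layout
with open("mamba_vision_T.yaml") as f:
    cfg = yaml.safe_load(f)

layer = MambaVisionLayer(dim=cfg["dim"],
                         depth=cfg["depth"],
                         num_heads=cfg["num_heads"],
                         window_size=cfg["window_size"],
                         conv=False,
                         downsample=False)

Values read this way behave exactly as if you had hard-coded them, so you can also skip the yaml entirely and pass the numbers directly.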

Thank you very much for your reply.
I am making modifications along the lines you described; the layers in MambaVision are very flexible and easy to use.
Finally, thank you again for your reply and for the code you provided. If I have any further questions, I may bother you again; I hope you understand.

@zhuangzexuan Great to know the provided snippet is somewhat useful. Looking forward to seeing what you build.

Thank you very much for your kind words.

Hello, sorry to bother you again.
I would like to ask you about a problem I ran into recently.
I am running a robotic grasping experiment in which I use MambaVisionLayer with downsample=None.
I trained for a hundred epochs on the Cornell dataset with both the transformer_block and conv settings. Why did I get 98.88 points with conv but only 91 points with transformer_block?