Is it possible to use selective scan in MLLA?
IceClear opened this issue · 4 comments
Hi, thanks for your wonderful work.
I have a small question related to the implementation of MLLA.
If my understanding is correct, MLLA can also be viewed as a kind of Mamba since they share a unified formulation.
So I guess some of the efficient implementations used by Mamba should also work for MLLA? In particular, the selective scan largely improves Mamba's efficiency, so it should benefit MLLA as well. Would it be possible to support it in MLLA?
Thanks and look forward to your reply.
Hi @IceClear, thanks for your insightful attention to our work.
As analyzed in our paper, Mamba has to employ recurrent calculation, which unavoidably reduces model throughput. To address this, Mamba proposes a hardware-aware algorithm to speed up that computation. In contrast, our MLLA preserves parallelizable computation, i.e. matrix multiplication, which is naturally fast at inference, eliminating the need for Mamba's efficient implementations.
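To make the contrast concrete, here is a small, hedged sketch (illustrative only, not the repository's actual code) comparing the sequential scan that a recurrent formulation requires with the matrix-multiplication forms of linear attention that MLLA can use:

```python
import torch

torch.manual_seed(0)
B, N, d = 2, 8, 4  # batch, sequence length, head dimension
q = torch.randn(B, N, d)
k = torch.randn(B, N, d)
v = torch.randn(B, N, d)

def recurrent_scan(q, k, v):
    # Causal/recurrent mode: the hidden state S accumulates k_t v_t^T
    # token by token, so computation is inherently sequential over N.
    # This is the pattern Mamba's hardware-aware scan accelerates.
    B, N, d = q.shape
    S = torch.zeros(B, d, d)
    out = []
    for t in range(N):
        S = S + k[:, t, :, None] * v[:, t, None, :]          # (B, d, d)
        out.append(torch.einsum("bd,bde->be", q[:, t], S))   # (B, d)
    return torch.stack(out, dim=1)                           # (B, N, d)

def masked_parallel(q, k, v):
    # The same causal result expressed as matrix multiplications with a
    # lower-triangular mask -- fully parallel across tokens.
    N = q.shape[1]
    A = q @ k.transpose(-2, -1)                              # (B, N, N)
    A = A * torch.tril(torch.ones(N, N))                     # causal mask
    return A @ v

def noncausal_parallel(q, k, v):
    # Non-causal linear attention: just two matmuls,
    # no scan and no mask needed.
    return q @ (k.transpose(-2, -1) @ v)
```

The recurrent and masked-parallel forms produce identical outputs; the non-causal form simply drops the causal constraint, which is the regime where no scan kernel is needed at all.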
Thanks for your quick and detailed response!
So if my understanding is correct, MLLA is more like a linear transformer, improved by effective designs inspired by Mamba? So it is not actually following the autoregressive paradigm.
That's right.
The equivalent forget gate endows Mamba with its autoregressive nature. We believe that such a causal mode is not very suitable for vision tasks, so we replace the forget gate with proper positional encodings.
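As a hedged illustration of this idea (the function names and the choice of a rotary encoding here are my own simplifications, not the exact MLLA module), non-causal linear attention can carry position information through a rotation of the query/key features instead of a decaying forget gate:

```python
import torch
import torch.nn.functional as F

def rope(x):
    # Minimal rotary positional encoding (illustrative; the positional
    # encodings actually used by MLLA may differ -- see the paper).
    # Assumes the feature dimension d is even.
    B, N, d = x.shape
    half = d // 2
    pos = torch.arange(N, dtype=x.dtype)[:, None]                  # (N, 1)
    freqs = 10000 ** (-torch.arange(half, dtype=x.dtype) / half)   # (half,)
    ang = pos * freqs                                              # (N, half)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def linear_attention_with_pe(q, k, v, eps=1e-6):
    # Non-causal linear attention where rotary position information on q/k
    # stands in for Mamba's input-dependent forget gate (a sketch, not the
    # authors' exact formulation).
    qf = F.elu(q) + 1                                              # positive feature map
    kf = F.elu(k) + 1
    # Normalizer computed from the un-rotated features.
    z = 1.0 / (torch.einsum("bnd,bd->bn", qf, kf.sum(dim=1)) + eps)
    qr, kr = rope(qf), rope(kf)
    kv = torch.einsum("bnd,bne->bde", kr, v)                       # (B, d, d)
    out = torch.einsum("bnd,bde->bne", qr, kv)
    return out * z[..., None]
```

Because every token attends to every other token through plain matrix multiplications, the causal scan disappears entirely, which fits the global, non-sequential nature of image tokens.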
Noted. Thanks for your reply!