Is it possible to use selective scan in MLLA?
IceClear opened this issue · 4 comments
Hi, thanks for your wonderful work.
I have a small question related to the implementation of MLLA.
If my understanding is correct, MLLA can also be viewed as a kind of Mamba since they share a unified formulation.
So I guess some of the efficient implementations used by Mamba should also work for MLLA? In particular, the selective scan largely improves Mamba's efficiency, so it should benefit MLLA as well. Would it be possible to support it in MLLA?
Thanks and look forward to your reply.
Hi @IceClear, thanks for your insightful attention to our work.
As analyzed in our paper, Mamba has to employ recurrent calculation, which unavoidably reduces model throughput. To address this, Mamba proposes a hardware-aware algorithm to speed up that computation. In contrast, our MLLA preserves parallelizable computation, i.e. matrix multiplication, which is naturally fast at inference, eliminating the need for Mamba's efficient implementations.
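To make the contrast concrete, here is a small, hedged sketch (illustrative only, not the repository's actual code) comparing the sequential scan that a recurrent formulation requires with the matrix-multiplication forms of linear attention that MLLA can use:

```python
import torch

torch.manual_seed(0)
B, N, d = 2, 8, 4  # batch, sequence length, head dimension
q = torch.randn(B, N, d)
k = torch.randn(B, N, d)
v = torch.randn(B, N, d)

def recurrent_scan(q, k, v):
    # Causal/recurrent mode: the hidden state S accumulates k_t v_t^T
    # token by token, so computation is inherently sequential over N.
    # This is the pattern Mamba's hardware-aware scan accelerates.
    B, N, d = q.shape
    S = torch.zeros(B, d, d)
    out = []
    for t in range(N):
        S = S + k[:, t, :, None] * v[:, t, None, :]          # (B, d, d)
        out.append(torch.einsum("bd,bde->be", q[:, t], S))   # (B, d)
    return torch.stack(out, dim=1)                           # (B, N, d)

def masked_parallel(q, k, v):
    # The same causal result expressed as matrix multiplications with a
    # lower-triangular mask -- fully parallel across tokens.
    N = q.shape[1]
    A = q @ k.transpose(-2, -1)                              # (B, N, N)
    A = A * torch.tril(torch.ones(N, N))                     # causal mask
    return A @ v

def noncausal_parallel(q, k, v):
    # Non-causal linear attention: just two matmuls,
    # no scan and no mask needed.
    return q @ (k.transpose(-2, -1) @ v)
```

The recurrent and masked-parallel forms produce identical outputs; the non-causal form simply drops the causal constraint, which is the regime where no scan kernel is needed at all.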
Thanks for your quick and detailed response!
So if my understanding is correct, MLLA is more like a linear transformer, improved by effective designs inspired by Mamba? So it is not actually following the autoregressive paradigm.
That's right.
The equivalent forget gate endows Mamba with its autoregressive nature. We believe that such a causal mode is not very suitable for vision tasks, so we replace the forget gate with proper positional encodings.
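As a hedged illustration of this idea (the function names and the choice of a rotary encoding here are my own simplifications, not the exact MLLA module), non-causal linear attention can carry position information through a rotation of the query/key features instead of a decaying forget gate:

```python
import torch
import torch.nn.functional as F

def rope(x):
    # Minimal rotary positional encoding (illustrative; the positional
    # encodings actually used by MLLA may differ -- see the paper).
    # Assumes the feature dimension d is even.
    B, N, d = x.shape
    half = d // 2
    pos = torch.arange(N, dtype=x.dtype)[:, None]                  # (N, 1)
    freqs = 10000 ** (-torch.arange(half, dtype=x.dtype) / half)   # (half,)
    ang = pos * freqs                                              # (N, half)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def linear_attention_with_pe(q, k, v, eps=1e-6):
    # Non-causal linear attention where rotary position information on q/k
    # stands in for Mamba's input-dependent forget gate (a sketch, not the
    # authors' exact formulation).
    qf = F.elu(q) + 1                                              # positive feature map
    kf = F.elu(k) + 1
    # Normalizer computed from the un-rotated features.
    z = 1.0 / (torch.einsum("bnd,bd->bn", qf, kf.sum(dim=1)) + eps)
    qr, kr = rope(qf), rope(kf)
    kv = torch.einsum("bnd,bne->bde", kr, v)                       # (B, d, d)
    out = torch.einsum("bnd,bde->bne", qr, kv)
    return out * z[..., None]
```

Because every token attends to every other token through plain matrix multiplications, the causal scan disappears entirely, which fits the global, non-sequential nature of image tokens.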
Noted. Thanks for your reply!