feizc/Dimba

Question about the model design.

Closed this issue · 2 comments

Hi, this is great work!
I would like to know why you use a bidirectional Mamba. Did a unidirectional Mamba cause any problems in your experiments?

feizc commented

Hi, it is generally believed that a unidirectional Mamba does not perform as well as a bidirectional one. In addition, different scan strategies can further improve generation performance; see the discussion in the ZigMa [1] and DiM [2] papers. For simplicity, we used bidirectional Mamba here.
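To illustrate the intuition, here is a minimal sketch, not the Dimba implementation. It assumes a toy causal recurrence (`causal_scan`, a hypothetical stand-in for a real Mamba block) and assumes the two directions are merged by summation, which may differ from what the repo does. It shows that the first token of a unidirectional scan never sees later tokens, while the bidirectional version does.

```python
import numpy as np

def causal_scan(x, decay=0.9):
    """Toy causal recurrence h_t = decay * h_{t-1} + x_t.

    Hypothetical stand-in for a Mamba block: position t depends only on x[0..t].
    """
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(len(x)):
        h = decay * h + x[t]
        out[t] = h
    return out

def bidirectional_scan(x, decay=0.9):
    """Scan left-to-right and right-to-left, then sum (the merge rule is an assumption)."""
    fwd = causal_scan(x, decay)
    # Flip the sequence, scan, and flip back so positions line up again.
    bwd = causal_scan(x[::-1], decay)[::-1]
    return fwd + bwd

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))   # (sequence length, channels)
x_changed = x.copy()
x_changed[-1] += 1.0          # perturb only the last token

# Does the output at position 0 react to a change in the last token?
uni_sees_future = not np.allclose(causal_scan(x)[0], causal_scan(x_changed)[0])
bi_sees_future = not np.allclose(bidirectional_scan(x)[0], bidirectional_scan(x_changed)[0])
print(uni_sees_future, bi_sees_future)  # False True
```

For image tokens, which have no natural left-to-right order, this kind of global context at every position is one plausible reason a bidirectional scan is preferred.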

However, it is worth noting that there has been an increasing focus on autoregressive approaches, such as LlamaGen [3] and Kaiming He's recent work [4].

[1] ZigMa: A DiT-style Zigzag Mamba Diffusion Model
[2] DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
[3] Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
[4] Autoregressive Image Generation without Vector Quantization