Anybody observe slow training speed for Mamba compared to Transformer model?
vgthengane opened this issue · 2 comments
vgthengane commented
Has anybody observed slow training speed for Mamba compared to a Transformer model?
d62lu commented
Yes, I just made the comparison. The Mamba block is indeed slower than a Transformer under the same input dimension.
d62lu commented
This holds for both model training and inference. I haven't looked into the details of the Mamba block yet; maybe I missed something in the code....
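For reference, here is a minimal timing sketch for this kind of comparison (assuming the official mamba-ssm package and a CUDA GPU; the dimensions and step counts below are illustrative, not the exact settings used above). It times one Mamba block against one PyTorch Transformer encoder layer on identical input, including the backward pass to approximate a training step:

```python
import time

import torch
import torch.nn as nn
from mamba_ssm import Mamba  # official package; requires a CUDA GPU

# Illustrative sizes, not the ones used in the comparison above.
batch, seqlen, d_model = 8, 1024, 512
x = torch.randn(batch, seqlen, d_model, device="cuda")

mamba = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2).cuda()
attn = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True).cuda()

def bench(module, x, steps=50):
    # Warm up so kernel compilation/caching does not skew the timing.
    for _ in range(5):
        module(x).sum().backward()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(steps):
        module(x).sum().backward()  # forward + backward, as in training
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / steps

print(f"Mamba block:       {bench(mamba, x) * 1e3:.2f} ms/step")
print(f"Transformer layer: {bench(attn, x) * 1e3:.2f} ms/step")
```

Note that the outcome depends heavily on sequence length: attention scales quadratically in seqlen while Mamba's selective scan is roughly linear, so at short sequences a highly optimized fused attention kernel can well come out faster.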