
Question about the parameter delta

zzzack66 opened this issue · 1 comments

Thanks for your implementation of mamba-minimal. What a great job!
I'm really confused about the dimensions of the parameter delta. I understand that delta is used to discretize A and B in the SSM. However, I don't understand why delta is first shaped as (b, l, dt_rank) and then projected to (b, l, d_inner), as in Algorithm 2 of the Mamba paper. Why do we need to shape delta as (b, l, dt_rank) and then apply τ_Δ(Parameter + s_Δ(x))?
(delta, B, C) = x_dbl.split(split_size=[self.args.dt_rank, n, n], dim=-1) # delta: (b, l, dt_rank). B, C: (b, l, n)
delta = F.softplus(self.dt_proj(delta)) # (b, l, d_in)
Can you explain the reason for this operation in the code? I'm looking forward to your reply.
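
To make the shapes in question concrete, here is a minimal standalone sketch of the two steps. It assumes small, hand-picked sizes and two plain `nn.Linear` layers named `x_proj` and `dt_proj` (the definition of `x_proj` is an assumption; only `dt_proj` appears in the lines quoted above). It illustrates the shapes and parameter counts only and is not the repository's actual module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes, chosen only for illustration.
b, l, d_inner, n, dt_rank = 2, 5, 64, 16, 4

x = torch.randn(b, l, d_inner)

# Assumed layer definitions, named after the identifiers in the quoted snippet.
x_proj = nn.Linear(d_inner, dt_rank + 2 * n, bias=False)  # d_inner -> dt_rank + 2n
dt_proj = nn.Linear(dt_rank, d_inner, bias=True)          # dt_rank -> d_inner

x_dbl = x_proj(x)  # (b, l, dt_rank + 2n)
delta_low, B, C = x_dbl.split(split_size=[dt_rank, n, n], dim=-1)  # (b, l, dt_rank), (b, l, n), (b, l, n)

# Project the low-rank delta up to one step size per channel, then apply softplus
# so that every step size is positive.
delta = F.softplus(dt_proj(delta_low))  # (b, l, d_inner)

# Parameter count of the two-step (low-rank) path versus a single dense
# d_inner -> d_inner layer with bias.
low_rank_params = d_inner * dt_rank + dt_rank * d_inner + d_inner
full_params = d_inner * d_inner + d_inner

print(delta_low.shape, delta.shape)
print(low_rank_params, full_params)
```

One possible reading, offered as an interpretation rather than a confirmed answer: composing the two linear maps gives a map from d_inner to d_inner of rank at most dt_rank, so each channel still gets its own step size while the number of parameters stays far below that of a dense d_inner-by-d_inner projection.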

I have the same question. Please explain if you have an answer or any advice. Thank you very much.