Experiments with the Mamba model from Gu and Dao, using their implementation.
The first experiment is to train on the synthetic induction heads task and
visualize the
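As a rough illustration of what synthetic induction heads data can look like, here is a minimal sketch in plain Python. The exact format is an assumption made for illustration and is not necessarily what the Mamba implementation generates. It uses a reserved trigger token with id `vocab_size`, places one earlier trigger followed by the answer token, and ends the sequence with a final trigger as the query. The names `make_induction_example` and `make_batch` are hypothetical.

```python
import random


def make_induction_example(seq_len, vocab_size, rng):
    """Return (tokens, target) for one induction-heads example.

    Assumed (hypothetical) format: ordinary tokens are 0..vocab_size-1,
    the trigger token is `vocab_size`, it appears once earlier in the
    sequence followed by the answer, and once more as the final token.
    """
    if seq_len < 3:
        raise ValueError("seq_len must be at least 3")
    special = vocab_size  # reserved trigger id, outside the ordinary vocabulary
    tokens = [rng.randrange(vocab_size) for _ in range(seq_len)]
    # Put the first trigger at a position that leaves room for the answer
    # token and does not collide with the final query position.
    pos = rng.randrange(seq_len - 2)
    target = tokens[pos + 1]  # the token the model must recall
    tokens[pos] = special
    tokens[-1] = special  # query: after this, the model should predict `target`
    return tokens, target


def make_batch(batch_size, seq_len, vocab_size, seed=0):
    """Build a reproducible batch of (inputs, targets) as plain lists."""
    rng = random.Random(seed)
    pairs = [make_induction_example(seq_len, vocab_size, rng) for _ in range(batch_size)]
    inputs = [tokens for tokens, _ in pairs]
    targets = [target for _, target in pairs]
    return inputs, targets
```

Solving this requires the model to find the earlier trigger and copy the token that followed it, which is the behaviour that induction heads implement in attention models. The lists could be converted to tensors before being fed to a model.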