bad result
#12 opened by Doctor-James - 5
attention masks in each layer
#5 opened by jiaosiyu1999 - 2
why drop the last token?
#11 opened by maxin-cn - 1
About bf16
#10 opened by Doctor-James - 1
How to Merge weights?
#9 opened by txchen-USTC - 7
How to load pre-trained model mamba_370m
#8 opened by Doctor-James - 1
about training details
#6 opened by maxin-cn - 1
Positional Encoding
#4 opened by Bezdarnost - 1
How to run it completely on cpu?
#2 opened by Meshwa428