Issues
- 1
Found a way to speed up training another 4x
#21 opened by MizuleGPT - 1
Add the experimental results of Adafactor
#20 opened by jie040109 - 1
- 1
Possible explanation slower early steps
#19 opened by MarktHart - 2
Inference
#14 opened by HaiFengZeng - 3
Error running on 8xH100, works on 4xH100?
#15 opened by swookey-thinky - 2
Add LICENSE
#11 opened by linux-leo - 4
- 1
Refactor of CombinedOptimizer
#9 opened by askerlee - 1
Attention scale
#10 opened by alxndrTL - 5
- 0
Add requirements needed to run scripts
#4 opened by thomasahle - 1
Add optimizer and training steps
#5 opened by Navanit-git - 3
- 1
`rmsnorm` not trainable
#1 opened by ChenLi2049