KellerJordan/modded-nanogpt

NanoGPT (124M) in 5 minutes

Python · MIT

Issues

Found a way to speed up training another 4x
#21 opened 8 days ago by MizuleGPT
1
Add the experimental results of Adafactor
#20 opened 8 days ago by jie040109
1
Model has 162M parameters, not 125M as expected
#26 opened 14 days ago by bluecoconut
1
Can't run any reproductions when using 8xH100 SXM5
#24 opened 15 days ago by bluecoconut
2
We don't have 8xH100, how about opening a new track?
#18 opened 17 days ago by Triang-jyed-driung
3
Possible explanation for slower early steps
#19 opened 18 days ago by MarktHart
1
Inference
#14 opened a month ago by HaiFengZeng
2
Error running on 8xH100, works on 4xH100?
#15 opened 20 days ago by swookey-thinky
3
Add LICENSE
#11 opened 22 days ago by linux-leo
2
I tried it on 8 L40S GPUs (45GB GPU RAM), and it failed with an OOM error.
#12 opened a month ago by jzkunlun
4
Refactor of CombinedOptimizer
#9 opened a month ago by askerlee
1
Attention scale
#10 opened a month ago by alxndrTL
1
Attempted to get this running on AMD MI300X...
#8 opened 2 months ago by jon-hotaisle
5
Add requirements needed to run scripts
#4 opened 3 months ago by thomasahle
0
Add optimizer and training steps
#5 opened 3 months ago by Navanit-git
1
Did you forget to call the torch.distributed.all_reduce function in train_gpt2.py?
#2 opened 5 months ago by zhangfaen
3
`rmsnorm` not trainable
#1 opened 6 months ago by ChenLi2049
1