karpathy/nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python · MIT

Issues

[Question] why bias is init to zero?
#472 opened 2 hours ago by michael8090
1
What does "prioritize teeth over education" even mean?
#489 opened 7 days ago by dw61
2
Hyperparameter Tuning
#484 opened a day ago by SinanCavusoglu
0
[Q] Async prefetch next batch while model is doing forward pass
#486 opened 18 days ago by GM-git-dotcom
1
MFU calculation wrong
#456 opened 3 months ago by lxww302
1
Citing this project in research
#471 opened 2 months ago by davmacario
3
Shouldnt the ddp check be on ZERO instead of -1
#485 opened 20 days ago by sajinpgupta
0
What is the meaning of nh and hs
#482 opened 25 days ago by Bachstelze
1
Index out of range when training on custom dataset
#483 opened 24 days ago by TayTT
0
Torch >= 2.2.0 inference issues on MPS
#458 opened 3 months ago by davmacario
3
Implement multi-token prediction option for models
#479 opened a month ago by tmostak
1
dropout is 0.0
#455 opened 3 months ago by dipsivenkatesh
3
neverMind
#480 opened a month ago by Zemulax
0
[Question] Why use `__call__` to do forward.
#475 opened a month ago by Felix-Zhenghao
2
nanoGPT/model.py where `manual implementation of attention`,Is it correct to modify it like I did？
#478 opened a month ago by wmx-github
1
could nanoGPT be the AI assistant for the development of CAX software?
#474 opened 2 months ago by fengsim
1
Training fails on Python 3.12 on either GPU or CPU
#477 opened a month ago by tigran123
3
Recommendation for something smaller
#476 opened 2 months ago by diamondfishtools
0
MFU too low in custom GPT-2 training
#466 opened 2 months ago by eonurk
2
Is this loss curve normal
#468 opened 2 months ago by banyan-god
15
[Question] The mask size seems wrong?
#473 opened 2 months ago by Felix-Zhenghao
0
nothing has been written into???
#440 opened 4 months ago by BeimingCharles
1
CUDA error: device-side assert triggered
#470 opened 2 months ago by ecsfu
0
Why don't we crop attn.weight as well?
#447 opened 3 months ago by muerghq
1
How to Set "vocab_size" and "block_size" for Word Embedding?
#469 opened 2 months ago by haibao-yu
1
Resume Training
#467 opened 2 months ago by tiredsoul21
3
nano_gpt
#465 opened 3 months ago by Mihir0567
0
Why do we need further pretrain given the loss is already converged
#457 opened 3 months ago by BiEchi
1
Sample from a subset of the token_embedding_table
#442 opened 4 months ago by PLarsen79
3
Training loss converges much earlier compared to max_iters
#461 opened 3 months ago by goswamig
1
no cuda training does not work.
#460 opened 3 months ago by BurkenDev
1
Where are the correspondent codes to the alpha, beta, miu and sigma?
#417 opened 5 months ago by fishfree
1
To reduce GPU memory usage & found a bug
#436 opened 4 months ago by cooper-him
3
Why is there no mask when using flash attention?
#451 opened 3 months ago by bruce2233
2
Would like to contribute FSDP functionality
#448 opened 3 months ago by calmitchell617
2
Question about causal masking vs full-context auto-regressive masking
#445 opened 3 months ago by pi-tau
0
get_lr needs to handle iter_num initialized to 0
#443 opened 3 months ago by yiphei
0
AssertionError when trying to run sample.py
#439 opened 4 months ago by RexNecross
1
i am getting encoding errors when i run the sample.py with any start contexts
#434 opened 4 months ago by danyuexiao
1
Which Python version can be used
#438 opened 4 months ago by denghuilong-sir
1
How to train nanoGPT using TPU's?
#435 opened 4 months ago by kathir-ks
1
Question about how GPT learns to be "generative"
#432 opened 4 months ago by metalwhale
5
position-wise mlp
#433 opened 4 months ago by amitlevy
1
The function get_lr is not being utilized in the train.py
#425 opened 4 months ago by aistream69
2
Setting RNG state while looping through model generate, Reproducibility.
#426 opened 5 months ago by ArtHughes
2
Question about vocab size
#421 opened 5 months ago by ArtHughes
1
My own tokenizer
#422 opened 5 months ago by spcrobocar
1
Meaning of teeth over education?
#424 opened 5 months ago by SmartManoj
1
16 GPU per node
#423 opened 5 months ago by spcrobocar
3
Exploding Gradient
#418 opened 5 months ago by huyz2023
1