Issues
[Question] Why is the bias initialized to zero?
#472 opened by michael8090 - 2
What does "prioritize teeth over education" even mean?
#489 opened by dw61 - 0
Hyperparameter Tuning
#484 opened by SinanCavusoglu - 1
MFU calculation wrong
#456 opened by lxww302 - 3
Citing this project in research
#471 opened by davmacario - 0
Shouldn't the DDP check be on ZERO instead of -1?
#485 opened by sajinpgupta - 1
What is the meaning of nh and hs?
#482 opened by Bachstelze - 0
Index out of range when training on custom dataset
#483 opened by TayTT - 3
Torch >= 2.2.0 inference issues on MPS
#458 opened by davmacario - 1
Implement multi-token prediction option for models
#479 opened by tmostak - 3
dropout is 0.0
#455 opened by dipsivenkatesh - 0
[Question] Why use `__call__` to do the forward pass?
#475 opened by Felix-Zhenghao - 1
nanoGPT/model.py, where `manual implementation of attention`: is it correct to modify it like I did?
#478 opened by wmx-github - 1
Training fails on Python 3.12 on either GPU or CPU
#477 opened by tigran123 - 0
Recommendation for something smaller
#476 opened by diamondfishtools - 2
MFU too low in custom GPT-2 training
#466 opened by eonurk - 15
Is this loss curve normal?
#468 opened by banyan-god - 0
[Question] The mask size seems wrong?
#473 opened by Felix-Zhenghao - 1
nothing has been written into???
#440 opened by BeimingCharles - 0
CUDA error: device-side assert triggered
#470 opened by ecsfu - 1
Why don't we crop attn.weight as well?
#447 opened by muerghq - 1
Resume Training
#467 opened by tiredsoul21 - 0
Sample from a subset of the token_embedding_table
#442 opened by PLarsen79 - 1
No-CUDA training does not work.
#460 opened by BurkenDev - 1
To reduce GPU memory usage & found a bug
#436 opened by cooper-him - 2
Why is there no mask when using flash attention?
#451 opened by bruce2233 - 2
Would like to contribute FSDP functionality
#448 opened by calmitchell617 - 0
get_lr needs to handle iter_num initialized to 0
#443 opened by yiphei - 1
AssertionError when trying to run sample.py
#439 opened by RexNecross - 1
I am getting encoding errors when I run sample.py with any start context
#434 opened by danyuexiao - 1
Which Python version can be used?
#438 opened by denghuilong-sir - 1
How to train nanoGPT using TPUs?
#435 opened by kathir-ks - 5
Question about how GPT learns to be "generative"
#432 opened by metalwhale - 1
position-wise mlp
#433 opened by amitlevy - 2
Question about vocab size
#421 opened by ArtHughes - 1
My own tokenizer
#422 opened by spcrobocar - 1
Meaning of teeth over education?
#424 opened by SmartManoj - 3
16 GPUs per node
#423 opened by spcrobocar - 1
Exploding Gradient
#418 opened by huyz2023