karpathy/build-nanogpt

Video+code lecture on building nanoGPT from scratch

Python

Issues

How can I extract the last-layer representation?
#86 opened 2 months ago by shantanu778
0
How is the autoregressive loss handled?
#82 opened 3 months ago by BabyCNM
2
NO dropout in MLP and CausalSelfAttention
#29 opened 4 months ago by peter-ni-noob
2
Avoid tiktoken.decode panic on unknown tokens.
#81 opened 4 months ago by IggShaman
0
torch.compile-d models do not work with example generation and hellaswag eval
#79 opened 4 months ago by IggShaman
0
TTS
#67 opened 5 months ago by yukiarimo
8
Cannot get the log file "log124M_40B/log.txt"?
#47 opened 6 months ago by dtdo90
5
Fix torch.compile Issue - Error with HellaSwag eval and Generation
#60 opened 5 months ago by ML-Guy
0
Text generation can use raw_model instead of model
#56 opened 5 months ago by sapphire008
0
Issues running the code on Windows
#45 opened 6 months ago by gerardaristizabalpla4
2
Sharding the dataset not completing?
#25 opened 6 months ago by dustinwloring1988
7
How to support padding in the training dataset?
#49 opened 6 months ago by mrhimanshu
2
Integrating GPT-2 with deepspeed Zero-1, Zero-2 and Zero-3
#48 opened 6 months ago by Devadeut
1
Different inference results between flash attention and manually implemented attention
#50 opened 6 months ago by Jaeckel-d
0
RuntimeError: User specified an unsupported autocast device_type 'cuda:0'
#44 opened 6 months ago by 0smboy
1
Executing with 1 GPU raises "OutOfMemory Exception"; executing with 2 GPUs raises "RuntimeError: CUDA error: invalid device ordinal"
#41 opened 6 months ago by nmerkle
2
Is dataloader making optimal batches?
#31 opened 6 months ago by paraschopra
1
Implement tensor parallelism
#17 opened 6 months ago by marib00
4
Consider using `torch.compile(model, fullgraph=True, mode="reduce-overhead")`
#6 opened 6 months ago by lezcano
11
Chunking method in the original GPT-2 training dataset
#22 opened 6 months ago by rasbt
2
Embeddings are initialized with std of 0.02
#18 opened 6 months ago by eryk-mazus
2