Issues
2D and 3D tile divisions so that permutation coordinates can be read from threadIdx and blockIdx
#406 opened by ChrisDryden - 12
Windows Github actions / workflow is successfully building including Cuda 12.4 builds
#229 opened by rosslwheeler - 5
Mismatch of dweight at layernorm_backward.cu
#428 opened by foreverpiano - 1
WikiText 103 evaluation
#246 opened by karpathy - 17
ThunderKittens Backend
#407 opened by AndreSlavescu - 3
Broader vendor support for hardware acceleration
#400 opened by ttraenkler - 1
compute sanitizers
#393 opened by ngc92 - 2
`make` fails to autodetect GPU compute capability
#387 opened by akulchik - 1
Is there a plan to support 8bits (FP8 or INT8)?
#391 opened by ifromeast - 9
MultiGPU training hangs
#369 opened by chinthysl - 0
Llm on small models
#379 opened by Mihir0567 - 3
more detailed explanation of Multi GPU
#373 opened by hafezmg48 - 15
How to do Inference on the trained weight of GPT 2 model after finishing the training on CPU using train_gpt2.py and train_gpt2 ?
#372 opened by asifshaikat - 14
Hardcoded block_size in kernels
#261 opened by azret - 6
What would be the main design trade-offs when re-implementing in clean modern C++?
#354 opened by mikeroberts3000 - 0
About pull request of custom kernel implementation
#356 opened by KarhouTam - 1
When will llama and other frameworks be supported?
#362 opened by MRQJsfhf - 3
delete use of cooperative groups in kernels
#292 opened by karpathy - 0
Possible bugs in the data loading functions
#321 opened by PeterZhizhin - 5
CI Mac issue with resources for Python?
#242 opened by rosslwheeler - 1
void tokenizer_init failed
#312 opened by Bing1002 - 2
Possible NULL Pointer Dereference
#308 opened by RootUp - 3
init from scratch
#243 opened by karpathy - 0
inf loss at big batch
#263 opened by karpathy - 6
bug: something goes wrong at larger batch sizes
#212 opened by karpathy - 0
cuda code that approaches cublas performance
#255 opened by nyck33 - 0
Refactoring all of the shared cuda helper methods to the shared common file
#245 opened by ChrisDryden - 3
[todo] Accumulate in double instead of float
#144 opened by karpathy - 1
from-scratch init the model
#154 opened by karpathy - 0
test_gpt2.cu correctness bounds tune per-parameter
#223 opened by karpathy - 0
Splitting cuda dev files to use smaller sizes for cpu validation compared to profiling
#244 opened by ChrisDryden - 2
Input token length question
#205 opened by kaizizzzzzz - 4
Suddenly "Out of memory" on train/python and train, test/CUDA on 4090
#206 opened by StoyanStAtanasov - 3
CPU: atomicAdd(float*)
#176 opened by azret - 1
use const properly, esp in function signatures
#147 opened by karpathy - 0
layernorm_backward.cu: atomicAdd
#190 opened by azret - 3
Understand this entire codebase with the help of a custom GPT [LLM-C by Andrej Karpathy]
#157 opened by ehzawad - 1
bt-invariant inference
#146 opened by karpathy - 0
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
#136 opened by xuanyuandy