karpathy/llm.c

LLM training in simple, raw C/CUDA

Cuda · MIT

Issues

Token out of vocabulary at train_gpt2.cu:675
#786 opened a month ago by aidando73
1
hellaswag.py - Github Access to this site has been restricted.
#785 opened a month ago by aidando73
1
MPI run with 8 GPU fails
#727 opened 5 months ago by msharmavikram
2
[cudnn_frontend] Error: No execution plans support the graph.
#761 opened 3 months ago by Necktwi
2
Is there any way to make a customized dataset?
#777 opened 2 months ago by dongrixinyu
0
Online Softmax is wrong
#776 opened 3 months ago by NoSavedDATA
0
Makefile incorrectly finds that `nccl` is installed for Linux systems with `libvncclient`
#774 opened 3 months ago by leiDnedyA
0
Problem when debugging CUDA kernel functions
#768 opened 3 months ago by dongrixinyu
3
Question concerning `float4`
#767 opened 3 months ago by dongrixinyu
0
Will this repo get new documentation later?
#759 opened 3 months ago by dongrixinyu
0
llm.c for inference
#752 opened 4 months ago by ztachip
2
Error no instance of overloaded function "..." matches the argument list
#749 opened 4 months ago by drzsdrtfg
1
Suggestion: Test more Activation Functions
#739 opened 4 months ago by linux-leo
0
Can't train in FP16 on Turing
#747 opened 4 months ago by jafioti
1
MPI run error
#729 opened 4 months ago by wzzanthony
0
TypeError: normal_() got an unexpected keyword argument 'generator'
#723 opened 5 months ago by StarHtimE
1
Pretraining (with CPUs)
#660 opened 6 months ago by bitmarkcc
5
How to run inference with the trained GPT-2 weights after finishing training on CPU using train_gpt2.py and train_gpt2?
#372 opened 8 months ago by asifshaikat
1
Suggestion: Use smollm corpus
#695 opened 5 months ago by linux-leo
3
cuDNN error in cudnn_att.cpp on train_gpt2.cu
#492 opened 6 months ago by maderix
5
Windows issue with Cuda Toolkit 12.5 and latest MSVC compiler 17.10
#642 opened 5 months ago by rosslwheeler
2
Different batch_size results in different evaluation loss.
#710 opened 5 months ago by iminfine
0
Larger Tokenizers
#701 opened 5 months ago by dustinwloring1988
0
Getting "Floating point exception (core dumped)" Error
#687 opened 5 months ago by alvins82
4
image-gpt
#697 opened 5 months ago by bil-ash
1
Is Multi-GPU config enabled even when I'm using one GPU?
#692 opened 5 months ago by BlaiseMuhirwa
1
Specify torch version number in requirements.txt?
#656 opened 6 months ago by Phil-U-U
2
llm.c in Google Colab
#562 opened 6 months ago by Eliah7
1
Modal script - benchmarking, profiling and libraries
#504 opened 7 months ago by vyom1611
6
seL4 + llm.c: a path to putting these LLMs in any mission-critical system
#622 opened 6 months ago by torrmal
0
BitNet (b1.58) support
#485 opened 7 months ago by EwoutH
2
Broader vendor support for hardware acceleration
#400 opened 7 months ago by ttraenkler
5
Is there a plan to support 8-bit precision (FP8 or INT8)?
#391 opened 7 months ago by ifromeast
2
Is max_seq_len a configurable or a hardcoded parameter?
#569 opened 6 months ago by morphpiece
2
OSError: Memory mapping file failed: Cannot allocate memory
#566 opened 6 months ago by antonkratz
1
More detailed explanation of multi-GPU
#373 opened 7 months ago by hafezmg48
4
`make` fails to autodetect GPU compute capability
#387 opened 7 months ago by aaakulchyk
3
Running `quick start on CPU` on Macbook Pro M2
#563 opened 6 months ago by full-stack-ai
7
Apparent compatibility issues with earlier C++ versions after recent pushes
#555 opened 7 months ago by hafezmg48
3
I cannot understand the `cublasGemmStridedBatchedEx` call in `attention_forward`
#557 opened 7 months ago by echosprint
0
ERROR on the AMD GPU
#527 opened 7 months ago by Lookforworld
4
Model Export & Inference
#502 opened 7 months ago by karpathy
3
Recalculating the activations in the backwards pass to conserve memory
#478 opened 7 months ago by ChrisDryden
3
Mismatch of dweight at layernorm_backward.cu
#428 opened 7 months ago by foreverpiano
0
Deleting Conda/Python as a dependency entirely to dramatically decrease "latency to step"
#482 opened 7 months ago by karpathy
4
python dev/data/fineweb.py --version 10B
#484 opened 7 months ago by bigsnarfdude
2
2D and 3D tile divisions so that permutation coordinates can be read from threadIdx and blockIdx
#406 opened 7 months ago by ChrisDryden
3
ThunderKittens Backend
#407 opened 7 months ago by AndreSlavescu
1
Compute sanitizers
#393 opened 7 months ago by ngc92
1
LLM on small models
#379 opened 7 months ago by Mihir0567
0