Issues
Token out of vocabulary at train_gpt2.cu:675
#786 opened by aidando73 - 1
MPI run with 8 GPU fails
#727 opened by msharmavikram - 2
Is there any way to make customized dataset?
#777 opened by dongrixinyu - 0
Online Softmax is wrong
#776 opened by NoSavedDATA - 0
Makefile incorrectly finds that `nccl` is installed for Linux systems with `libvncclclient`
#774 opened by leiDnedyA - 3
Problem when debugging cuda kernel functions.
#768 opened by dongrixinyu - 0
Question concerning `float4`
#767 opened by dongrixinyu - 0
Will this repo update new documentation later?
#759 opened by dongrixinyu - 2
llm.c for inference
#752 opened by ztachip - 1
Suggestion: Test more Activation Functions
#739 opened by linux-leo - 1
Can't train in FP16 on Turing
#747 opened by jafioti - 0
MPI run error
#729 opened by wzzanthony - 1
Pretraining (with CPUs)
#660 opened by bitmarkcc - 1
How to do Inference on the trained weight of GPT 2 model after finishing the training on CPU using train_gpt2.py and train_gpt2 ?
#372 opened by asifshaikat - 3
Suggestion: Use smollm corpus
#695 opened by linux-leo - 5
Cudnn error cudnn_att.cpp on train_gptcu
#492 opened by maderix - 2
Larger Tokenizers
#701 opened by dustinwloring1988 - 4
Specify torch version number in requirements.txt ?
#656 opened by Phil-U-U - 1
LLM.c in google colab
#562 opened by Eliah7 - 6
Modal script - benchmarking, profiling and libraries
#504 opened by vyom1611 - 0
BitNet (b1.58) support
#485 opened by EwoutH - 5
Broader vendor support for hardware acceleration
#400 opened by ttraenkler - 2
Is there a plan to support 8bits (FP8 or INT8)?
#391 opened by ifromeast - 2
more detailed explanation of Multi GPU
#373 opened by hafezmg48 - 3
`make` fails to autodetect GPU compute capability
#387 opened by aaakulchyk - 7
Running `quick start on CPU` on Macbook Pro M2
#563 opened by full-stack-ai - 3
I can not understand the `cublasGemmStridedBatchedEx` call in the `attention_forward`
#557 opened by echosprint - 4
ERROR on the AMD GPU
#527 opened by Lookforworld - 3
Model Export & Inference
#502 opened by karpathy - 3
Mismatch of dweight at layernorm_backward.cu
#428 opened by foreverpiano - 4
Deleting Conda/Python as a dependency entirely to dramatically decrease "latency to step"
#482 opened by karpathy - 2
python dev/data/fineweb.py --version 10B
#484 opened by bigsnarfdude - 3
2D and 3D tile divisions so that permutation coordinates can be read from threadIdx and blockIdx
#406 opened by ChrisDryden - 1
ThunderKittens Backend
#407 opened by AndreSlavescu - 1
compute sanitizers
#393 opened by ngc92 - 0
Llm on small models
#379 opened by Mihir0567