Simple LLM in pure CUDA C

To Do

Softmax
[] Attention
[] Linear Layer
[] RMSNorm
[] GELU

Learning Material

The materials in this repository accompany the CUDA Training Series presented at ORNL and NERSC.

You can find the slides and presentation recordings at https://www.olcf.ornl.gov/cuda-training-series/

**1. Compile Code**

nvcc -arch=sm_75 --allow-unsupported-compiler vector_add.cu -o test && ./test
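For reference, a file such as vector_add.cu could look roughly like the sketch below. This is an illustration only, not necessarily the file in this repository: the kernel name `vector_add`, the helper `run_vector_add`, and the block size of 256 are all assumptions.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread adds one pair of elements. The bounds check covers the last,
// possibly partially filled, block.
__global__ void vector_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper: runs the kernel on n elements and returns the largest
// absolute error against the CPU result, or -1.0f if a CUDA call fails.
float run_vector_add(int n) {
    std::vector<float> h_a(n), h_b(n), h_c(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        h_a[i] = float(i);
        h_b[i] = 2.0f * float(i);
    }

    size_t bytes = size_t(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess &&
              cudaMalloc(&d_c, bytes) == cudaSuccess &&
              cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threads = 256;                         // assumed block size
        int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
        vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_c.data(), d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    if (!ok) return -1.0f;

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i)
        max_err = fmaxf(max_err, fabsf(h_c[i] - (h_a[i] + h_b[i])));
    return max_err;
}

int main() {
    float err = run_vector_add(1 << 20);
    printf("max error: %g\n", err);
    return err == 0.0f ? 0 : 1;
}
```

The `-arch=sm_75` flag in the command above targets compute capability 7.5 (Turing); adjust it to match your GPU.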

**2. Profiling Experiments**

nvcc -arch=sm_75 --allow-unsupported-compiler vector_add.cu -o test && sudo /usr/local/cuda/bin/ncu --section SpeedOfLight --section MemoryWorkloadAnalysis ./test

nvcc -arch=sm_75 --allow-unsupported-compiler matrix_sums.cu -o test && sudo /usr/local/cuda/bin/ncu --metrics l1tex__t_sectors_pipe_lsu_mem_global_op_ld.sum,l1tex__t_requests_pipe_lsu_mem_global_op_ld.sum --section SpeedOfLight --section MemoryWorkloadAnalysis ./test
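The two `l1tex` metrics in the last command count global-load sectors and global-load requests; their ratio (sectors per request) indicates how well a kernel's loads are coalesced. The sketch below shows the kind of kernel pair where that ratio differs. It is an assumption about what a matrix_sums.cu might contain, not the file in this repository; `row_sums`, `column_sums`, and `run_matrix_sums` are illustrative names.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per row: neighbouring threads read addresses ds floats apart,
// so a single warp-level load request touches many separate sectors.
__global__ void row_sums(const float *A, float *sums, int ds) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < ds) {
        float sum = 0.0f;
        for (int col = 0; col < ds; ++col) sum += A[size_t(row) * ds + col];
        sums[row] = sum;
    }
}

// One thread per column: neighbouring threads read adjacent floats,
// so each warp-level load request is coalesced into a few sectors.
__global__ void column_sums(const float *A, float *sums, int ds) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col < ds) {
        float sum = 0.0f;
        for (int row = 0; row < ds; ++row) sum += A[size_t(row) * ds + col];
        sums[col] = sum;
    }
}

// Hypothetical helper: fills a ds x ds matrix with ones, runs both kernels,
// and returns true only if every row sum and column sum equals ds.
bool run_matrix_sums(int ds) {
    size_t mat_bytes = size_t(ds) * ds * sizeof(float);
    size_t vec_bytes = size_t(ds) * sizeof(float);
    std::vector<float> h_A(size_t(ds) * ds, 1.0f), h_rows(ds), h_cols(ds);

    float *d_A = nullptr, *d_sums = nullptr;
    bool ok = cudaMalloc(&d_A, mat_bytes) == cudaSuccess &&
              cudaMalloc(&d_sums, vec_bytes) == cudaSuccess &&
              cudaMemcpy(d_A, h_A.data(), mat_bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    int threads = 256;  // assumed block size
    int blocks = (ds + threads - 1) / threads;
    if (ok) {
        row_sums<<<blocks, threads>>>(d_A, d_sums, ds);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_rows.data(), d_sums, vec_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) {
        column_sums<<<blocks, threads>>>(d_A, d_sums, ds);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_cols.data(), d_sums, vec_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_A);
    cudaFree(d_sums);

    for (int i = 0; ok && i < ds; ++i)
        ok = (h_rows[i] == float(ds)) && (h_cols[i] == float(ds));
    return ok;
}

int main() {
    bool ok = run_matrix_sums(1024);
    printf("%s\n", ok ? "sums match" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Under these assumptions, the profiler should report a noticeably higher sectors-per-request ratio for `row_sums` than for `column_sums`, since the strided reads in `row_sums` are not coalesced.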
