- Softmax
- [ ] Attention
- [ ] Linear Layer
- [ ] RMSNorm
- [ ] GELU

The materials in this repository accompany the CUDA Training Series presented at ORNL and NERSC.
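You can find the slides and presentation recordings at https://www.olcf.ornl.gov/cuda-training-series/

Build and run the vector addition example (-arch=sm_75 targets compute capability 7.5 GPUs; change it to match your device):

    nvcc -arch=sm_75 --allow-unsupported-compiler vector_add.cu -o test && ./test

Build the same example and profile it with Nsight Compute (ncu), collecting the SpeedOfLight and MemoryWorkloadAnalysis sections:

    nvcc -arch=sm_75 --allow-unsupported-compiler vector_add.cu -o test && sudo /usr/local/cuda/bin/ncu --section SpeedOfLight --section MemoryWorkloadAnalysis ./test

Build the matrix sums example and profile it, additionally collecting the global-load sector and request counts:

    nvcc -arch=sm_75 --allow-unsupported-compiler matrix_sums.cu -o test && sudo /usr/local/cuda/bin/ncu --metrics l1tex__t_sectors_pipe_lsu_mem_global_op_ld.sum,l1tex__t_requests_pipe_lsu_mem_global_op_ld.sum --section SpeedOfLight --section MemoryWorkloadAnalysis ./test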
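
The two metrics requested for matrix_sums.cu count the sectors and the requests issued for global loads. Dividing sectors by requests gives a rough measure of how well a kernel's loads are coalesced. The host-only C++ sketch below is a simplified model of that ratio, not code from this repository. It assumes 32 lanes per warp, 32-byte sectors, 4-byte float loads, and a sector-aligned base address. The function name sectors_per_request is hypothetical.

```cpp
#include <cstddef>
#include <set>

// Simplified model (an assumption, not taken from the repository):
// count the distinct 32-byte sectors touched by one warp-wide load request
// in which lane i reads a 4-byte float at element index i * stride_elems.
// Assumes 32 lanes per warp and a base address aligned to a sector boundary.
std::size_t sectors_per_request(std::size_t stride_elems) {
    const std::size_t kWarpSize = 32;     // lanes per warp
    const std::size_t kSectorBytes = 32;  // bytes per memory sector
    std::set<std::size_t> sectors;        // distinct sector indices touched
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        const std::size_t byte_addr = lane * stride_elems * sizeof(float);
        sectors.insert(byte_addr / kSectorBytes);
    }
    return sectors.size();
}
```

Under these assumptions, sectors_per_request(1) returns 4 (adjacent lanes read adjacent floats, so 128 bytes span four sectors), while any stride of 8 floats or more returns 32, because every lane lands in its own sector. When comparing kernels in the ncu output, a sectors-to-requests ratio near 4 suggests coalesced float loads, and a ratio near 32 suggests strided access.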