Helpful resources:
- https://github.com/tpn/cuda-by-example/blob/master/chapter05/dot.cu (dot product example from the *CUDA by Example* book)
- PMPP textbook (*Programming Massively Parallel Processors*)
- CUDA reference PDF
- https://siboehm.com/articles/22/CUDA-MMM
- vLLM codebase
- PyTorch documentation page on creating CUDA kernels with bindings