Triton is a language for writing GPU kernels. It's easier to use than CUDA, and interoperates well with PyTorch.
If you want to speed up PyTorch training or inference, you can try writing Triton kernels for the heavier operations. (FlashAttention is a good example of a custom GPU kernel that speeds up training.)
This repo has my notes as I learn to use Triton. They include a lot of code, and some discussion of the key concepts. They're geared towards people new to GPU programming and Triton.
Hopefully you will find them useful.
- GPU Basics
- Vector Addition
- Matrix Multiplication
- Softmax forward and backward
- Block matmul
- Matmul forward and backward
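As a plain-CPU reference for the softmax topic above, here is what the forward and backward passes compute, sketched in NumPy (this is the math the Triton kernels implement, not code from the notebooks; the function names are my own):

```python
import numpy as np

def softmax(x):
    # subtract the row max for numerical stability before exponentiating
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_backward(y, grad_out):
    # y is the forward output; the Jacobian-vector product simplifies to
    # dx = y * (g - sum(g * y)), so the full Jacobian never needs materializing
    dot = (grad_out * y).sum(axis=-1, keepdims=True)
    return y * (grad_out - dot)
```

The backward simplification is the reason a fused softmax backward kernel only needs one reduction per row.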
To install Triton, run `pip install triton`. You need a CUDA-compatible GPU with the CUDA toolkit installed to use it.
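Once installed, a minimal Triton kernel looks like the one below: the canonical vector addition from the official Triton tutorials (also the subject of the Vector Addition notes). Each program instance handles one block of elements, with a mask guarding the ragged last block. Running it requires a CUDA GPU.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # each program instance handles one BLOCK_SIZE-long chunk of the vectors
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # mask out-of-bounds lanes so the last block doesn't read/write past the end
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    # host-side launcher: one program per BLOCK_SIZE elements
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Calling `add(a, b)` on two CUDA tensors should match `a + b`.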
The material in these notebooks draws on the following sources, which are generally good documentation: