Triton tutorials

Triton is a language for writing GPU kernels. It's easier to use than CUDA, and interoperates well with PyTorch.

If you want to speed up PyTorch training or inference, you can try writing Triton kernels for the heavier operations. (FlashAttention is a good example of a custom GPU kernel that speeds up training.)
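To get a feel for how Triton kernels are organized before diving into the notebooks, here is a pure-NumPy sketch (not real Triton code) of the programming model: a grid of "programs" is launched, each one identified by an id, and each program loads, computes, and stores one block of the data, masking out-of-bounds elements in the final partial block. The function names below are made up for illustration; they mimic the roles of Triton's `tl.program_id`, offset arithmetic, and masked loads/stores.

```python
import numpy as np

def vector_add_block(x, y, out, pid, BLOCK_SIZE):
    # One "program" (analogous to one value of tl.program_id)
    # handles one contiguous block of BLOCK_SIZE elements.
    offsets = pid * BLOCK_SIZE + np.arange(BLOCK_SIZE)
    mask = offsets < x.shape[0]      # guard the final, partial block
    valid = offsets[mask]
    out[valid] = x[valid] + y[valid] # masked load, add, masked store

def vector_add(x, y, BLOCK_SIZE=4):
    out = np.empty_like(x)
    n = x.shape[0]
    grid = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceil-div, like a Triton launch grid
    for pid in range(grid):                    # on a GPU, these run in parallel
        vector_add_block(x, y, out, pid, BLOCK_SIZE)
    return out

x = np.arange(10, dtype=np.float32)
y = np.ones(10, dtype=np.float32)
print(vector_add(x, y))
```

In real Triton the loop over `pid` disappears: the GPU launches all programs concurrently, and the block body is compiled to efficient device code. The vector-addition notebook covers the genuine Triton version of this kernel.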

This repo has my notes as I learn to use Triton. They include a lot of code, and some discussion of the key concepts. They're geared towards people new to GPU programming and Triton.

Hopefully you will find them useful.

Contents

  1. GPU Basics
  2. Vector Addition
  3. Matrix Multiplication
  4. Softmax forward and backward
  5. Block matmul
  6. Matmul forward and backward

Install

To install Triton, run `pip install triton`. You need a CUDA-capable GPU with the CUDA toolkit installed to use it.

References

Material in these notebooks draws on the following sources, which are also good documentation in their own right: