This repository contains Kyungmin's solutions to the Triton Puzzles. I worked through the puzzles in Google Colab.
I encountered an error when using tl.dot() in Puzzle 11 and Puzzle 12. This may be due to a version mismatch between the latest Triton release and the puzzles, so I recommend using the Triton version pinned in the Jupyter notebook (a quick version check is sketched below).
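To confirm which Triton build your Colab runtime is actually using before running the puzzles, a minimal sketch (the exact version pin lives in the notebook, so it is not repeated here):

```python
import triton

# Print the installed Triton version; if it differs from the one pinned
# in the notebook, reinstall with `pip install triton==<pinned version>`.
print(triton.__version__)
```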
With Tejas Ramesh and Keren Zhou, based on Triton-Viz.
Programming for accelerators such as GPUs is critical for modern AI systems. This often means programming directly in proprietary low-level languages such as CUDA. Triton is an alternative open-source language that lets you code at a higher level and compile to accelerators like GPUs.
Coding in Triton is very similar to NumPy and PyTorch in both syntax and semantics. However, as a lower-level language, there are many details you need to keep track of. In particular, one area learners struggle with is memory loading and storing, which is critical for speed on these devices; the sketch below shows the basic pattern.
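As an illustration of that load/store pattern, here is a minimal vector-add kernel. It is a sketch, not part of the puzzles themselves; the kernel name, block size, and launch grid are my own choices, and the usage lines assume a CUDA device (with the interpreter, CPU tensors work too):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized chunk of the vectors.
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n  # guard against reading/writing past the end
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

# Hypothetical usage: one program per 256-element block.
x = torch.randn(1000, device="cuda")
y = torch.randn(1000, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(1000, 256),)
add_kernel[grid](x, y, out, 1000, BLOCK=256)
```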
This set of puzzles is meant to teach you how to use Triton from first principles in an interactive fashion. You will start with trivial examples and build your way up to real algorithms like Flash Attention and quantized neural networks. These puzzles do not need to run on a GPU since they use a Triton interpreter, enabled as shown below.
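The interpreter is turned on through an environment variable, which must be set before Triton is first imported. A minimal sketch:

```python
import os

# Must be set before the first `import triton` so kernels run on the
# CPU-based interpreter instead of a GPU backend.
os.environ["TRITON_INTERPRET"] = "1"

import triton
```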
Discord: https://discord.gg/cudamode #triton-puzzles
If you are into this kind of thing, this is the 7th in a series of these puzzles.