High Performance LLMs 2024

Build a full-scale, high-performance LLM from scratch in Jax! We cover training and inference, roofline analysis, compilation, sharding, profiling and more. You’ll leave the class comfortable in Jax and confident in your ability to design high-performance computing systems that run close to the hardware’s physical limits.

Link to the Discord: https://discord.gg/2AWcVatVAw

Topics Covered

Build a high-performance Jax LLM implementation for training
Build a high-performance Jax LLM implementation for inference
Analyze single-chip rooflines and compilation
Analyze distributed computing via sharding
Optimize LLM training – what happens under the hood, rooflines, sharding
Optimize LLM inference – what happens under the hood, rooflines, sharding
Deep dive into attention, especially fused attention schedules, running softmax and flash attention
Pallas – learn to optimize one level deeper
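
As a small illustration of the "running softmax" idea named in the attention topic, here is a minimal sketch. It is not taken from the course materials. It uses plain Python instead of Jax so that it runs without dependencies, and the function names and the `block_size` parameter are hypothetical, chosen only for this example. The sketch processes a row of scores block by block while keeping a running max and a rescaled running sum, which is the bookkeeping that fused attention kernels rely on to avoid materializing the full row first.

```python
import math


def softmax(xs):
    """Reference two-pass softmax: needs the whole row to find the max first."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def running_softmax_stats(xs, block_size=2):
    """Single pass over xs in blocks, tracking a running max and running sum.

    Returns (max, sum of exp(x - max)) without ever needing the full row
    up front. block_size is a hypothetical knob for this illustration.
    """
    running_max = -math.inf
    running_sum = 0.0
    for start in range(0, len(xs), block_size):
        block = xs[start:start + block_size]
        new_max = max(running_max, max(block))
        # Rescale the old sum so it is expressed relative to the new max,
        # then add this block's contributions.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(x - new_max) for x in block
        )
        running_max = new_max
    return running_max, running_sum


def running_softmax(xs, block_size=2):
    """Softmax computed from the streamed statistics."""
    m, s = running_softmax_stats(xs, block_size)
    return [math.exp(x - m) / s for x in xs]
```

Calling `running_softmax(scores, block_size=2)` on a list of floats should agree with `softmax(scores)` up to floating-point rounding, whatever block size is chosen.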

Sessions, Slides, Videos and Take-Home Exercises

Session	Time	Link to join (or recording)	Slides	Take-Home Exercises	Summary
1	3:30PM US Pacific, 2/21/2024	YouTube recording	slides	link	end-to-end Jax LLM
2	3:30PM US Pacific, 2/28/2024	YouTube recording	slides	link	single chip perf and rooflines
3	3:30PM US Pacific, 3/13/2024	YouTube recording	slides	link	multi chip perf and rooflines, 1
4	3:30PM US Pacific, 3/20/2024	YouTube recording	slides	link	multi chip perf and rooflines, 2
5	3:30PM US Pacific, 3/27/2024	YouTube recording	slides	link	attention
6	3:30PM US Pacific, 4/10/2024	YouTube recording	slides	link	optimized training
7	3:30PM US Pacific, 4/24/2024	YouTube recording	slides	link	training e2e, inference analysis
8	3:30PM US Pacific, 5/08/2024	YouTube recording	slides	link	training xprof, mfu, naive inference
9	3:30PM US Pacific, 5/22/2024	YouTube recording	slides	link	efficient inference, numerics
10	3:30PM US Pacific, 5/29/2024	YouTube recording	slides	link	Pallas with Sharad Vikram!

(Session 10 was the last session! Thank you to everyone who joined us!)

About me: I’m Rafi Witten, a tech lead on Cloud TPU/GPU Multipod. We develop MaxText and aim to push the frontier on Perf/TCO. In 2023, we executed the “Largest ML Job” ever demonstrated in public and pioneered “Accurate Quantized Training”, a technique for training with 8-bit integers.

Contact me via Discord: https://discord.gg/2AWcVatVAw
