Transformers & LLMs cheatsheet for Stanford's CME 295

Available in العربية - Čeština - English - Español - فارسی - Français - Italiano - 日本語 - 한국어 - ไทย - Türkçe - 中文

Goal

This repository aims to sum up, in one place, all of the important notions covered in Stanford's CME 295 Transformers & Large Language Models course. It includes:

  • Transformers: self-attention (see the short sketch after this list), architecture, variants, optimization techniques (sparse attention, low-rank attention, flash attention)
  • LLMs: prompting, finetuning (SFT, LoRA), preference tuning, optimization techniques (mixture of experts, distillation, quantization)
  • Applications: LLM-as-a-judge, RAG, agents, reasoning models (train-time and test-time scaling from DeepSeek-R1)
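
To make the first item concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. It is an illustrative assumption of how the mechanism can be written, not code from the course or the cheatsheet; the function name, shapes, and random weights are made up for the example.

```python
# Minimal, illustrative single-head self-attention (assumed example, not course code).
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (n, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v              # project inputs to queries/keys/values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # scaled dot-product similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # attention-weighted sum of values

# Toy usage: a sequence of 4 tokens with model dimension 8 (hypothetical sizes)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W_q, W_k, W_v = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)               # shape (4, 8)
```

Real implementations add multiple heads, masking, and learned projections; these are covered in the cheatsheet and the book below.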

Content

VIP Cheatsheet

(Illustration of the cheatsheet)

Class textbook

This VIP cheatsheet gives an overview of the "Super Study Guide: Transformers & Large Language Models" book, which contains ~600 illustrations across 250 pages and covers the concepts above in depth. You can find more details at https://superstudy.guide.

Class website

cme295.stanford.edu

Authors

Afshine Amidi (Ecole Centrale Paris, MIT) and Shervine Amidi (Ecole Centrale Paris, Stanford University)