Minimal Transformer

A minimal decoder-only transformer implemented in under 50 lines of PyTorch.

Purpose

Implementing the Transformer architecture can be challenging for beginners because information flows through it in non-trivial ways (attention, causal masks, etc.). To help with this, we offer a stripped-down, "as simple as possible" implementation of a decoder-only transformer for pedagogical purposes.
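To make the "information flow" point concrete, here is a minimal sketch of single-head causal self-attention. It is not this repository's code: it is written in plain Python (no PyTorch) so it runs without dependencies, and the names `softmax` and `causal_self_attention` are hypothetical helpers chosen for illustration. It assumes queries, keys, and values are given directly as lists of vectors, with no learned projections.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Single-head attention where position i attends only to positions <= i.

    q, k, v are lists of equal-length float vectors, one per token.
    """
    d = len(q[0])
    out = []
    for i in range(len(q)):
        # Scaled dot-product scores; future positions (j > i) are masked to -inf,
        # so they receive exactly zero weight after the softmax.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            if j <= i
            else float("-inf")
            for j in range(len(k))
        ]
        weights = softmax(scores)
        # Output for token i is the attention-weighted sum of value vectors.
        out.append(
            [sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))]
        )
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x, x, x)
print(y[0])  # the first token can only see itself, so this equals x[0]
```

The causal mask is what makes the model usable for next-token prediction: changing a later token cannot affect the outputs at earlier positions. A tensor-based implementation would typically express the same idea with a lower-triangular mask applied to the full score matrix rather than a Python loop.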