Formal Algorithms for Transformers

PyTorch implementation of the transformer algorithms described in "Formal Algorithms for Transformers" by Mary Phuong and Marcus Hutter: https://arxiv.org/abs/2207.09238

Each of the paper's fourteen algorithms is listed below, together with a minimal PyTorch sketch of the idea.

Algorithm 1: Token embedding
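
Algorithm 1 looks up each token id in a learned embedding matrix (W_e in the paper). A minimal sketch, with illustrative sizes (the paper leaves the vocabulary size and d_e abstract):

```python
import torch
import torch.nn as nn

vocab_size, d_e = 100, 64               # illustrative sizes

# nn.Embedding stores the matrix W_e; indexing a row is the paper's
# e = W_e[:, v] up to transposition.
token_emb = nn.Embedding(vocab_size, d_e)

token_ids = torch.tensor([5, 17, 42])   # a toy sequence of token ids
e = token_emb(token_ids)                # shape (3, d_e), one embedding per token
```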

Algorithm 2: Positional embedding
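
Algorithm 2 is a learned positional embedding: each position t gets its own vector, added to the token embedding before the first layer. A minimal sketch, again with illustrative sizes:

```python
import torch
import torch.nn as nn

max_len, d_e = 512, 64                  # illustrative sizes
pos_emb = nn.Embedding(max_len, d_e)    # learned positional matrix W_p

positions = torch.arange(3)             # positions 0, 1, 2 of a length-3 sequence
p = pos_emb(positions)                  # shape (3, d_e)
# The input to the first layer is the sum: x_t = e_t + p_t.
```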

Algorithm 3: Basic single-query attention
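
Algorithm 3 attends a single query token over a sequence of context tokens: project to one query and per-token keys and values, softmax the scaled dot products, and return the weighted sum of values. A minimal sketch with illustrative sizes:

```python
import math
import torch
import torch.nn as nn

d_x = d_z = 64                          # illustrative sizes
d_attn, d_out = 32, 32

W_q = nn.Linear(d_x, d_attn)            # query projection (weights + bias)
W_k = nn.Linear(d_z, d_attn)            # key projection
W_v = nn.Linear(d_z, d_out)             # value projection

x = torch.randn(d_x)                    # the current (query) token's embedding
Z = torch.randn(10, d_z)                # embeddings of 10 context tokens

q = W_q(x)                              # one query vector
K, V = W_k(Z), W_v(Z)                   # a key and a value per context token
alpha = torch.softmax(K @ q / math.sqrt(d_attn), dim=0)  # attention weights
v_tilde = alpha @ V                     # weighted sum of values, shape (d_out,)
```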

Algorithm 4: Ṽ ← Attention(X, Z | W_qkv, Mask)
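
Algorithm 4 generalizes this to a whole sequence of queries at once, with a boolean mask saying which context positions each token may attend to (self-attention is the special case Z = X; a lower-triangular mask gives causal attention). A minimal sketch:

```python
import math
import torch
import torch.nn as nn

d_x = d_z = 64                          # illustrative sizes
d_attn = d_out = 32
lx, lz = 5, 7                           # lengths of primary and context sequences

W_q = nn.Linear(d_x, d_attn)
W_k = nn.Linear(d_z, d_attn)
W_v = nn.Linear(d_z, d_out)

X = torch.randn(lx, d_x)                # primary sequence (supplies queries)
Z = torch.randn(lz, d_z)                # context sequence (supplies keys/values)
mask = torch.ones(lx, lz, dtype=torch.bool)  # Mask[t, t'] = may t attend to t'?

scores = W_q(X) @ W_k(Z).T / math.sqrt(d_attn)     # (lx, lz) score matrix
scores = scores.masked_fill(~mask, float("-inf"))  # masked entries get weight 0
V_tilde = torch.softmax(scores, dim=-1) @ W_v(Z)   # (lx, d_out)
```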

Algorithm 5: Ṽ ← MHAttention(X, Z | W, Mask)
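
Algorithm 5 runs H attention heads in parallel, concatenates their outputs, and applies an output projection W_o. PyTorch's nn.MultiheadAttention packages those steps; a minimal sketch of causally masked self-attention (sizes illustrative; note that in this API, True mask entries are *blocked*):

```python
import torch
import torch.nn as nn

d_model, n_heads, l = 64, 8, 5
mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

X = torch.randn(1, l, d_model)           # batch of one sequence
# Upper-triangular True entries forbid attending to future positions.
causal = torch.triu(torch.ones(l, l, dtype=torch.bool), diagonal=1)

V_tilde, attn_weights = mha(X, X, X, attn_mask=causal)  # masked self-attention
```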

Algorithm 6: ê ← layer_norm(e | γ, β)
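
Algorithm 6 normalizes an activation vector to zero mean and unit variance, then applies a learned scale γ and offset β. A minimal sketch, which matches torch.nn.LayerNorm's default behavior:

```python
import torch

def layer_norm(e, gamma, beta, eps=1e-5):
    # Normalize e to zero mean and unit variance along the last dim,
    # then scale by gamma and shift by beta (both learned in practice).
    mean = e.mean(dim=-1, keepdim=True)
    var = e.var(dim=-1, unbiased=False, keepdim=True)
    return gamma * (e - mean) / torch.sqrt(var + eps) + beta

e = torch.randn(64)
e_hat = layer_norm(e, torch.ones(64), torch.zeros(64))
# Agrees with the built-in module: torch.nn.LayerNorm(64)(e)
```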

Algorithm 7: Unembedding
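
Algorithm 7 maps a final-layer token embedding back to a probability distribution over the vocabulary via the unembedding matrix W_u and a softmax. A minimal sketch:

```python
import torch
import torch.nn as nn

d_e, vocab_size = 64, 100
W_u = nn.Linear(d_e, vocab_size, bias=False)  # unembedding matrix

e = torch.randn(d_e)                    # final-layer embedding of one token
p = torch.softmax(W_u(e), dim=-1)       # probability distribution over tokens
```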

Algorithm 8: P ← EDTransformer(z, x | θ)

Algorithm 9: P ← ETransformer(x | θ)

Algorithm 10: P ← DTransformer(x | θ)
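
Algorithms 8–10 assemble the pieces above into the three standard architectures: encoder–decoder (the original seq2seq transformer), encoder-only (BERT-style), and decoder-only (GPT-style). A minimal decoder-only sketch in the spirit of Algorithm 10, built from stock PyTorch modules rather than the paper's exact parameterization; all sizes are illustrative:

```python
import torch
import torch.nn as nn

class TinyDTransformer(nn.Module):
    # Minimal GPT-style model: embeddings -> masked self-attention blocks
    # -> unembedding.
    def __init__(self, vocab_size=100, max_len=512, d_model=64,
                 n_heads=8, n_layers=2):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)   # Algorithm 1
        self.pos = nn.Embedding(max_len, d_model)      # Algorithm 2
        block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)         # pre-LN blocks
        self.blocks = nn.TransformerEncoder(block, n_layers)
        self.unembed = nn.Linear(d_model, vocab_size, bias=False)  # Algorithm 7

    def forward(self, ids):                            # ids: (batch, length)
        l = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(l, device=ids.device))
        causal = torch.triu(torch.ones(l, l, dtype=torch.bool,
                                       device=ids.device), diagonal=1)
        h = self.blocks(x, mask=causal)                # masked self-attention
        return self.unembed(h)                         # logits; softmax gives P

logits = TinyDTransformer()(torch.randint(0, 100, (1, 10)))  # (1, 10, 100)
```

An encoder-only model (Algorithm 9) is the same stack without the causal mask; the encoder–decoder model (Algorithm 8) adds a decoder stack whose blocks also cross-attend to the encoder output.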

Algorithm 11: θ̂ ← EDTraining(z_{1:N_data}, x_{1:N_data}, θ)

Algorithm 12: θ̂ ← ETraining(x_{1:N_data}, θ)

Algorithm 13: θ̂ ← DTraining(x_{1:N_data}, θ)
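
Algorithms 11–13 are the corresponding maximum-likelihood training loops. A hypothetical sketch of decoder-only training in the spirit of Algorithm 13: minimize the cross-entropy of each next token given its prefix. `model` is assumed to return logits, as the sketch above does; the function name and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

def d_training(model, batches, n_epochs=10, lr=1e-4):
    # batches: iterable of (batch, length) LongTensors of token ids.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(n_epochs):
        for ids in batches:
            logits = model(ids[:, :-1])          # predict token t+1 from tokens <= t
            loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                           ids[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model                                 # θ̂: the trained parameters
```

Encoder-only training (Algorithm 12) swaps the next-token objective for masked-token prediction; encoder–decoder training (Algorithm 11) additionally conditions each prediction on the source sequence z.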

Algorithm 14: ŷ ← DInference(x, θ̂)
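
Algorithm 14 generates a continuation by repeatedly sampling from the model's next-token distribution. A hypothetical sketch using temperature sampling (the paper's temperature parameter corresponds to `temperature` here; the function name and defaults are illustrative):

```python
import torch

@torch.no_grad()
def d_inference(model, prompt_ids, gen_len=20, temperature=1.0):
    # prompt_ids: (1, prompt_length) LongTensor of token ids.
    ids = prompt_ids.clone()
    for _ in range(gen_len):
        logits = model(ids)[:, -1]                       # last position's logits
        probs = torch.softmax(logits / temperature, -1)  # next-token distribution
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=-1)          # append and continue
    return ids
```

For example, `d_inference(TinyDTransformer(), torch.randint(0, 100, (1, 5)))` extends a random 5-token prompt by 20 sampled tokens.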