Last update: 4/15/2021
A lecture-style exploration of transformers following Jay Alammar's post *The Illustrated Transformer*. Includes breakout questions and motivating examples.
- Motivation for Transformers
- Define Transformers
- Define Self-Attention
- Self-Attention with vectors
- Self-Attention with matrices (see the sketch after this outline)
- Define Multi-Head Attention
- Define Encoder-Decoder Attention Layer
- Final Linear & Softmax Layers
- Loss Function
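
As a quick reference for the self-attention-with-matrices step above, here is a minimal NumPy sketch of scaled dot-product self-attention. The names (`self_attention`, `W_q`, `W_k`, `W_v`) and the toy dimensions are illustrative assumptions, not code from the lecture or from Alammar's post; multi-head attention simply runs several such projection sets in parallel and concatenates the outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of embeddings.

    X: (seq_len, d_model) input embeddings.
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) token-to-token scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted sum of value vectors

# Toy usage: 4 tokens, model dim 8, head dim 4 (arbitrary example sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 4)
```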
- Neural Machine Translation by Jointly Learning to Align and Translate
- Effective Approaches to Attention-based Neural Machine Translation
- Creating Word Embeddings: Coding the Word2Vec Algorithm in Python using Deep Learning
- GloVe: Global Vectors for Word Representation (for word embedding weights)
- Coding a basic transformer for natural language processing (a starter sketch of the output layers follows below).
- Coding a not-so-basic transformer for a TBD application.
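
As a starting point for the basic-transformer coding exercise, here is a hedged NumPy sketch of the final linear and softmax layers and a cross-entropy loss from the outline. All names and sizes (`W_vocab`, a 10-token vocabulary, and so on) are made-up examples, not the lecture's actual code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilize before exponentiating
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def output_layer(decoder_out, W_vocab, b_vocab):
    """Project decoder outputs to vocabulary logits, then to probabilities."""
    logits = decoder_out @ W_vocab + b_vocab  # (seq_len, vocab_size)
    return softmax(logits, axis=-1)

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target token at each position."""
    positions = np.arange(probs.shape[0])
    return -np.log(probs[positions, targets] + 1e-12).mean()

# Toy usage: 5 output positions, model dim 8, vocabulary of 10 tokens.
rng = np.random.default_rng(1)
decoder_out = rng.normal(size=(5, 8))
W_vocab = rng.normal(size=(8, 10))
b_vocab = np.zeros(10)
probs = output_layer(decoder_out, W_vocab, b_vocab)
targets = rng.integers(0, 10, size=5)
print(cross_entropy(probs, targets))  # loss for random weights and targets
```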