
Applied Transformers (PyTorch)

A playground-like experimental project to explore various transformer architectures from scratch.

Resources:

Intuitions:

  1. Intuition behind the Attention Mechanism (see the sketch after this list) | Notebook
  2. Intuition behind individual Transformer Blocks | Notebook
  3. Intuition behind Chunked Cross-Attention from DeepMind's RETRO | Notebook
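
As a concrete anchor for the first notebook, here is a minimal PyTorch sketch of scaled dot-product attention (function name and tensor shapes are illustrative, not the notebook's actual code):

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_head)
    d_head = q.size(-1)
    # score every query against every key; scale so the softmax
    # stays well-conditioned as d_head grows
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention distribution over keys
    return weights @ v                   # weighted sum of value vectors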

Implementations from Scratch:

Create virtual environment:

conda create -n applied-transformers python=3.10
conda activate applied-transformers

Install Dependencies:

pip install -r requirements.txt

  1. Transformer Model from Scratch {Vaswani et al., 2017} | Dataset Sample | Python Code (see the hyperparameter sketch after this list)

# example training run
python transformer_architectures/vanilla/run.py --num_layers=5 \
  --d_model=256 --d_ff=1024 --num_heads=4 --dropout=0.2 \
  --train_path=<PATH_TO_TRAIN_DATASET>.csv --valid_path=<PATH_TO_VALIDATION_DATASET>.csv

  2. GPT Model from Scratch {Radford et al., 2018} | Coming Soon
  3. BERT Model from Scratch {Devlin et al., 2019} | Coming Soon
  4. RETRO Model from Scratch {Borgeaud et al., 2021} | Coming Soon
  5. BART Model from Scratch {Lewis et al., 2019} | Coming Soon
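
For orientation, the flags in the example run map onto the usual encoder-decoder hyperparameters. A rough sketch using PyTorch's built-in nn.Transformer (the repo's vanilla model is hand-rolled, so this correspondence is an assumption, not its actual code):

import torch.nn as nn

# assumed mapping of run.py flags onto an encoder-decoder transformer
model = nn.Transformer(
    d_model=256,           # --d_model: width of embeddings / residual stream
    nhead=4,               # --num_heads: attention heads per layer
    num_encoder_layers=5,  # --num_layers
    num_decoder_layers=5,  # --num_layers
    dim_feedforward=1024,  # --d_ff: hidden width of the position-wise FFN
    dropout=0.2,           # --dropout
)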

TODO:

  • Text Generation Schemes (greedy, beam search, sampling; a greedy sketch follows this list)
  • Text Generation Eval Metrics
  • Sequence Tokenization Algorithms
  • Optimized Einsum Implementation
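
Until the text-generation schemes land, here is a minimal greedy-decoding sketch; the model(src, ys) call is a hypothetical encoder-decoder interface returning per-position vocabulary logits:

import torch

@torch.no_grad()
def greedy_decode(model, src, bos_id, eos_id, max_len=64):
    # start every target sequence with the beginning-of-sequence token
    ys = torch.full((src.size(0), 1), bos_id, dtype=torch.long)
    for _ in range(max_len):
        logits = model(src, ys)  # (batch, tgt_len, vocab); assumed interface
        next_tok = logits[:, -1].argmax(-1, keepdim=True)  # most likely next token
        ys = torch.cat([ys, next_tok], dim=1)
        if (next_tok == eos_id).all():  # stop once every sequence emitted EOS
            break
    return ys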

References

  1. The Annotated Transformer: http://nlp.seas.harvard.edu/annotated-transformer/
  2. labml.ai annotated transformer implementations: https://nn.labml.ai/transformers/models.html
  3. Transformers from Scratch | CodeEmporium