
Applied Transformers (PyTorch)

A playground-like experimental project to explore various transformer architectures from scratch.

Resources:

Intuitions:

  1. Intuition behind the Attention Mechanism (see the sketch after this list) | Notebook
  2. Intuition behind individual Transformer Blocks | Notebook
  3. Intuition behind Chunked Cross-Attention from DeepMind's RETRO | Notebook
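
As a concrete anchor for the first notebook, here is a minimal PyTorch sketch of scaled dot-product attention (function name and tensor shapes are illustrative, not the notebook's actual code):

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_head)
    d_head = q.size(-1)
    # score every query against every key; scale so the softmax
    # stays well-conditioned as d_head grows
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention distribution over keys
    return weights @ v                   # weighted sum of value vectors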

Implementations from Scratch:

Create virtual environment:

conda create -n applied-transformers python=3.10
conda activate applied-transformers

Install Dependencies:

pip install -r requirements.txt

  1. Transformer Model from Scratch {Vaswani et al., 2017} | Dataset Sample | Python Code (see the hyperparameter sketch after this list)

# example training run
python transformer_architectures/vanilla/run.py --num_layers=5 \
  --d_model=256 --d_ff=1024 --num_heads=4 --dropout=0.2 \
  --train_path=<PATH_TO_TRAIN_DATASET>.csv --valid_path=<PATH_TO_VALIDATION_DATASET>.csv

  2. GPT Model from Scratch {Radford et al., 2018} | Coming Soon
  3. BERT Model from Scratch {Devlin et al., 2019} | Coming Soon
  4. RETRO Model from Scratch {Borgeaud et al., 2021} | Coming Soon
  5. BART Model from Scratch {Lewis et al., 2019} | Coming Soon
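
For orientation, the flags in the example run map onto the usual encoder-decoder hyperparameters. A rough sketch using PyTorch's built-in nn.Transformer (the repo's vanilla model is hand-rolled, so this correspondence is an assumption, not its actual code):

import torch.nn as nn

# assumed mapping of run.py flags onto an encoder-decoder transformer
model = nn.Transformer(
    d_model=256,           # --d_model: width of embeddings / residual stream
    nhead=4,               # --num_heads: attention heads per layer
    num_encoder_layers=5,  # --num_layers
    num_decoder_layers=5,  # --num_layers
    dim_feedforward=1024,  # --d_ff: hidden width of the position-wise FFN
    dropout=0.2,           # --dropout
)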

TODO:

  • Text Generation Schemes (greedy, beam search, sampling; a greedy sketch follows this list)
  • Text Generation Eval Metrics
  • Sequence Tokenization Algorithms
  • Optimized Einsum Implementation
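
Until the text-generation schemes land, here is a minimal greedy-decoding sketch; the model(src, ys) call is a hypothetical encoder-decoder interface returning per-position vocabulary logits:

import torch

@torch.no_grad()
def greedy_decode(model, src, bos_id, eos_id, max_len=64):
    # start every target sequence with the beginning-of-sequence token
    ys = torch.full((src.size(0), 1), bos_id, dtype=torch.long)
    for _ in range(max_len):
        logits = model(src, ys)  # (batch, tgt_len, vocab); assumed interface
        next_tok = logits[:, -1].argmax(-1, keepdim=True)  # most likely next token
        ys = torch.cat([ys, next_tok], dim=1)
        if (next_tok == eos_id).all():  # stop once every sequence emitted EOS
            break
    return ys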

References

  1. The Annotated Transformer: http://nlp.seas.harvard.edu/annotated-transformer/
  2. labml.ai annotated transformer implementations: https://nn.labml.ai/transformers/models.html
  3. Transformers from Scratch | CodeEmporium