/transformer_from_scratch

This repository is an implementation of the original "Attention Is All You Need" paper from Google. TensorFlow and its associated tools were used for the implementation.

Transformer Architecture for Neural Machine Translation from Scratch

The motivation for this short side project was to gain an in-depth understanding of how the various parts of the transformer architecture interact with each other. There is no better way to build that understanding than to dive deep and recreate the architecture.

This understanding should also make it easier to follow more recent transformer-based computer vision models such as ViT, DETR, and 3DETR.

Contents

1. Theory

2. Setup

3. Training

4. Inferencing

5. Visualizations and Results

References

[1] "Attention Is All You Need" - Original Paper

[2] "Attention in transformers, visually explained" - 3Blue1Brown

[3] "The Annotated Transformer" - Harvard NLP

[4] "Transformer: A Novel Neural Network Architecture for Language Understanding" - Google

[6] "The Illustrated Transformer" - Jay Alammar

[7] "Illustrated Guide to Transformers Neural Network" - The AI Hacker

[8] "How a transformer works at inference vs training time"

[9] "Transformer architecture deep dive"