This repository is an implementation of the original "Attention Is All You Need" paper from Google. The motivation for this short side project was to gain an in-depth understanding of how the various parts of the transformer architecture interact with each other. There is no better way to build that understanding than diving in deep and recreating the architecture.
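The core mechanism the paper introduces, scaled dot-product attention, can be sketched in a few lines. This is a minimal NumPy illustration of the formula softmax(QKᵀ/√d_k)·V, not the code in this repository:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V                              # weighted sum of value vectors

# toy example: a sequence of 3 tokens with embedding dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # each token gets a new representation of the same dimension
```

The full architecture runs several such heads in parallel (multi-head attention) and stacks them with feed-forward layers, residual connections, and layer normalization.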
This understanding in turn facilitates the study of more recent transformer-based computer vision models such as ViT, DETR, 3DETR, etc.
[1] "Attention is All You Need" - Original Paper
[2] "Attention in transformers, visually explained" - 3Blue1Brown
[3] "The Annotated Transformer" - Harvard NLP
[4] "Transformer: A Novel Neural Network Architecture for Language Understanding" - Google
[5] "The Illustrated Transformer" - Jay Alammar
[6] "Illustrated Guide to Transformers Neural Network" - The AI Hacker