When I was growing up Transformers were cars that turned into robots.
Now they're the backbone of state-of-the-art machine learning and AI applications.
The goal of this repo is for me to learn and to provide simple resources for others on*:
- The attention mechanism and the original Transformer architecture.
- Various Transformer-based models (e.g. GPT).
- The `transformers` library by Hugging Face (it covers many different types of models, but why not?).
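As a taste of the first topic, the core operation of the original Transformer is scaled dot-product attention: Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal, framework-agnostic sketch in NumPy (function names are my own; swap in PyTorch tensors for the real thing):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)  # (seq_q, seq_k)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ v, weights

# Toy example: 3 queries attending over 4 key/value pairs of dimension 8.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # (3, 8) (3, 4)
```

The √d_k scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with tiny gradients (see the Transformer paper linked below).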
The first two will be more research-focused, whereas the third will be very practically applicable.
*Outline subject to change.
Assumes basic knowledge of PyTorch (or any other ML framework) and deep learning in general.
See learnpytorch.io for a beginner-friendly intro.
Or my Learn PyTorch in a day video on YouTube to get up to speed and then come back here.
Some of the resources I've found useful (this list will grow over time):
- Transformer paper: https://arxiv.org/abs/1706.03762
- The annotated transformer: http://nlp.seas.harvard.edu/2018/04/01/attention.html
- Transformers from scratch: https://peterbloem.nl/blog/transformers
- Attention functions in PyTorch code: https://github.com/sooftware/attentions/blob/master/attentions.py
- xFormers by Facebook Research, an in-depth implementation of many Transformer architecture components: https://github.com/facebookresearch/xformers
- Lilian Weng's post on attention (self-attention section): https://lilianweng.github.io/posts/2018-06-24-attention/#self-attention
- Jay Mody's post on attention intuition: https://jaykmody.com/blog/attention-intuition/
- Modifications to the original Transformer architecture (warning: there are lots): https://arxiv.org/abs/2102.11972
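To connect attention to the GPT-style models mentioned above: decoder-only Transformers apply a causal mask so that each position can only attend to itself and earlier positions. A minimal NumPy sketch of masked self-attention (names are my own, not from any library):

```python
import numpy as np

def causal_self_attention(x):
    """Scaled dot-product self-attention with a causal (lower-triangular) mask.

    For simplicity, queries, keys, and values are all `x` (no learned
    projections), which is enough to show the masking mechanics.
    """
    d_k = x.shape[-1]
    scores = x @ x.T / np.sqrt(d_k)  # (seq, seq)
    # Set scores for future positions to -inf so softmax assigns them zero.
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability
    e = np.exp(scores)  # exp(-inf) == 0.0, masking out future tokens
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights @ x, weights

x = np.random.default_rng(1).normal(size=(5, 8))  # 5 tokens, dim 8
out, attn = causal_self_attention(x)
print(np.triu(attn, k=1).max())  # 0.0: no attention to future tokens
```

Real implementations add learned Q/K/V projections, multiple heads, and batching on top of this, but the mask is the whole trick that makes autoregressive generation work.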