My reimplementation of the Scene Memory Transformer (SMT) module for reinforcement learning (RL). References:
- Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks
- Self-Attention: A Better Building Block for Sentiment Analysis Neural Network Classifiers
- Attention Is All You Need
- Attention? Attention! (blog)
- Lilian Weng's implementation
- The Annotated Transformer (blog)
- The Illustrated Transformer (blog)