ML Notes
Documentation
- Transformer: Attention Is All You Need
- Reformer: The Efficient Transformer (locality-sensitive hashing attention: queries/keys are bucketed via random rotations on a sphere; sketch after this list)
- Linformer: Self-Attention with Linear Complexity (O(n^2) => O(n) by projecting keys/values down to a fixed length; sketch after this list)
- Vision Transformer: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- DeiT: Training data-efficient image transformers & distillation through attention (adds a distillation token trained against a strong CNN teacher; loss sketch after this list)
- CaiT: Going deeper with Image Transformers
- DETR: End-to-End Object Detection with Transformers
- RevNet: The Reversible Residual Network: Backpropagation Without Storing Activations (inputs are recomputed from outputs; sketch after this list)
- PEER: Plan, Edit, Explain, Repeat: a collaborative language model from Meta AI
- Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models (update-rule sketch after this list)
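
A minimal sketch of the angular LSH bucketing behind the Reformer entry: similar vectors on the sphere tend to hash to the same bucket, so full attention can be restricted to within-bucket pairs. The function name and shapes are illustrative assumptions; the real model also sorts and chunks buckets and pairs this with reversible layers.

```python
import torch

def lsh_buckets(x, n_buckets, seed=0):
    # Angular LSH (Reformer-style): project vectors onto random rotations and
    # take the argmax over the concatenated [+proj, -proj] directions.
    g = torch.Generator().manual_seed(seed)
    r = torch.randn(x.shape[-1], n_buckets // 2, generator=g)
    proj = x @ r                                            # (..., n_buckets // 2)
    return torch.cat([proj, -proj], dim=-1).argmax(dim=-1)  # bucket ids
```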
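For the Linformer entry, the O(n^2) => O(n) claim comes from compressing the sequence axis of keys and values to a fixed length before attention. A minimal single-head sketch, assuming learned projection matrices E and F of shape (k_proj, n); names and shapes are illustrative:

```python
import torch

def linformer_attention(q, k, v, E, F):
    # q, k, v: (batch, n, d); E, F: (k_proj, n) learned projections.
    # The score matrix becomes (n x k_proj) instead of (n x n), so the cost
    # is linear in the sequence length n.
    k_low = E @ k                                  # (batch, k_proj, d)
    v_low = F @ v                                  # (batch, k_proj, d)
    scores = q @ k_low.transpose(-2, -1) / q.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v_low          # (batch, n, d)
```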
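For the DeiT entry, "distillation through attention" means a second class token whose output head is supervised by the teacher. A sketch of the hard-distillation variant of the loss; the argument names are illustrative assumptions:

```python
import torch.nn.functional as F

def deit_hard_distill_loss(cls_logits, dist_logits, labels, teacher_logits):
    # Class token learns from ground-truth labels; the extra distillation
    # token learns from the teacher's hard predictions (argmax).
    ce_true = F.cross_entropy(cls_logits, labels)
    ce_teacher = F.cross_entropy(dist_logits, teacher_logits.argmax(dim=-1))
    return 0.5 * (ce_true + ce_teacher)
```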
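The RevNet entry hinges on an additive coupling that makes each block exactly invertible, so activations can be recomputed from outputs during the backward pass instead of being stored. A minimal sketch of the forward/inverse pair; the module names are illustrative:

```python
import torch.nn as nn

class ReversibleBlock(nn.Module):
    # Operates on two feature halves (x1, x2); since inputs are exactly
    # recoverable from outputs, intermediate activations need not be kept.
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):       # recompute inputs from outputs
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```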
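For the Adan entry, a per-tensor sketch of my reading of the update rule: Nesterov momentum is estimated from the gradient difference g_k - g_{k-1} rather than an extrapolated parameter point. Moments follow the paper's EMA convention m <- (1 - beta) m + beta x; the default betas, epsilon placement, and decoupled weight decay here are assumptions, so check the official implementation before relying on them.

```python
import torch

def adan_step(p, g, state, lr=1e-3, betas=(0.02, 0.08, 0.01), eps=1e-8, wd=0.0):
    b1, b2, b3 = betas
    zeros = torch.zeros_like(g)
    g_diff = g - state.get("g_prev", g)                 # zero at the first step
    state["m"] = (1 - b1) * state.get("m", zeros) + b1 * g
    state["v"] = (1 - b2) * state.get("v", zeros) + b2 * g_diff
    u = g + (1 - b2) * g_diff                           # corrected gradient estimate
    state["n"] = (1 - b3) * state.get("n", zeros) + b3 * u * u
    state["g_prev"] = g.clone()
    step = (state["m"] + (1 - b2) * state["v"]) / (state["n"].sqrt() + eps)
    return (p - lr * step) / (1 + lr * wd)              # decoupled weight decay
```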