Implements the rotary position embeddings from "RoFormer: Enhanced Transformer with Rotary Position Embedding" for Flax. The rotary-embedding-torch library was used as a reference implementation.
Features:
- 1D (for sequence models) and 2D axial (for ViT) rotary embeddings.
- Learnable frequencies, including separate learnable frequencies per attention head.
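
For orientation, here is a minimal sketch of what a fixed-frequency 1D rotary embedding computes, written in plain `jax.numpy`. It is not this package's API: the function names `rotary_frequencies` and `apply_rotary_1d`, the interleaved channel pairing, and the default base `theta=10000.0` are assumptions made for illustration. Learnable frequencies and the 2D axial variant are not shown.

```python
import jax.numpy as jnp


def rotary_frequencies(dim, theta=10000.0):
    """Hypothetical helper: one rotation frequency per pair of channels."""
    # theta^(-2i / dim) for i = 0 .. dim/2 - 1
    return 1.0 / (theta ** (jnp.arange(0, dim, 2) / dim))


def apply_rotary_1d(x, freqs):
    """Hypothetical helper: rotate channel pairs of x by position-dependent angles.

    x has shape (..., seq_len, dim) with even dim; freqs has shape (dim // 2,).
    """
    seq_len = x.shape[-2]
    # Angle for position p and channel pair i is p * freqs[i].
    angles = jnp.arange(seq_len)[:, None] * freqs[None, :]  # (seq_len, dim // 2)
    cos, sin = jnp.cos(angles), jnp.sin(angles)
    # Treat (even, odd) channels as the two coordinates of a 2D point.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out1 = x1 * cos - x2 * sin
    out2 = x1 * sin + x2 * cos
    # Interleave the rotated coordinates back into the original layout.
    return jnp.stack([out1, out2], axis=-1).reshape(x.shape)
```

In attention, the same rotation would be applied to queries and keys before the dot product, so that the score between two positions depends only on their offset.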
Coming soon