fkodom/dilated-attention-pytorch
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)
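To make the repo's subject concrete: LongNet's dilated attention splits the sequence into fixed-length segments and, within each segment, computes attention only among every `dilation`-th position, so cost grows roughly linearly with sequence length. The sketch below is a minimal single-head NumPy illustration of that idea, not this repository's API; the function name, the `offset` parameter, and the choice to leave unselected positions as zeros are assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dilated_attention(q, k, v, segment_len, dilation, offset=0):
    """Single-head dilated attention sketch (illustrative only).

    q, k, v: (seq_len, d) arrays; seq_len must be divisible by segment_len.
    Within each segment, only positions at stride `dilation` (starting at
    `offset`) attend to each other; other rows are left as zeros here.
    In LongNet, multiple dilation rates/offsets across heads cover all
    positions, and outputs are combined.
    """
    seq_len, d = q.shape
    assert seq_len % segment_len == 0
    out = np.zeros_like(v)
    for start in range(0, seq_len, segment_len):
        # Select the dilated positions inside this segment.
        idx = np.arange(start + offset, start + segment_len, dilation)
        qs, ks, vs = q[idx], k[idx], v[idx]
        # Dense attention over the small selected subset only.
        scores = qs @ ks.T / np.sqrt(d)
        out[idx] = softmax(scores) @ vs
    return out
```

With `segment_len=4, dilation=2`, each segment does attention over just 2 of its 4 positions, which is where the claimed efficiency at long sequence lengths comes from.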
Python · MIT License
Issues

- Q: Attention Calculation (#5, opened by mohamedelbahnasawi, 3 comments)
- Backward pass (#6, opened by Coluding, 3 comments)
- Training on yet-another-retnet script (#4, opened by Akbarable, 2 comments)
- Running Time and Other Questions (#2, opened by MHarris021, 4 comments)