fkodom/dilated-attention-pytorch
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)
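To make the repo's subject concrete: LongNet's dilated attention splits the sequence into fixed-length segments and, within each segment, computes attention only among every `dilation`-th position, so cost grows roughly linearly with sequence length. The sketch below is a minimal single-head NumPy illustration of that idea, not this repository's API; the function name, the `offset` parameter, and the choice to leave unselected positions as zeros are assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dilated_attention(q, k, v, segment_len, dilation, offset=0):
    """Single-head dilated attention sketch (illustrative only).

    q, k, v: (seq_len, d) arrays; seq_len must be divisible by segment_len.
    Within each segment, only positions at stride `dilation` (starting at
    `offset`) attend to each other; other rows are left as zeros here.
    In LongNet, multiple dilation rates/offsets across heads cover all
    positions, and outputs are combined.
    """
    seq_len, d = q.shape
    assert seq_len % segment_len == 0
    out = np.zeros_like(v)
    for start in range(0, seq_len, segment_len):
        # Select the dilated positions inside this segment.
        idx = np.arange(start + offset, start + segment_len, dilation)
        qs, ks, vs = q[idx], k[idx], v[idx]
        # Dense attention over the small selected subset only.
        scores = qs @ ks.T / np.sqrt(d)
        out[idx] = softmax(scores) @ vs
    return out
```

With `segment_len=4, dilation=2`, each segment does attention over just 2 of its 4 positions, which is where the claimed efficiency at long sequence lengths comes from.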
Python · MIT License
Issues

- Q: Attention Calculation (#5, opened by mohamedelbahnasawi, 3 comments)
- Backward pass (#6, opened by Coluding, 3 comments)
- Training on yet-another-retnet script (#4, opened by Akbarable, 2 comments)
- Running Time and Other Questions (#2, opened by MHarris021, 4 comments)