andrewcchu/dilated-self-attention
Implementation of the dilated self-attention described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
Python · MIT License
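Dilated self-attention sparsifies attention by splitting the input into segments and attending only over every r-th token within each segment, reducing the quadratic cost of dense attention. The following is a minimal NumPy sketch of that idea, not this repository's actual code; the function name, parameter names (`segment_len`, `dilation`), and the choice to zero out skipped positions are all assumptions for illustration. LongNet additionally mixes multiple dilation rates across heads, which is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_self_attention(q, k, v, segment_len, dilation):
    """Sketch of dilated attention: split the sequence into segments,
    keep every `dilation`-th token in each segment, run dense attention
    over the kept tokens only, and scatter the results back.
    Skipped positions are left as zeros (an illustrative choice)."""
    n, d = q.shape
    out = np.zeros_like(v)
    for start in range(0, n, segment_len):
        # Indices of the sparse (dilated) token subset for this segment.
        idx = np.arange(start, min(start + segment_len, n))[::dilation]
        qs, ks, vs = q[idx], k[idx], v[idx]
        attn = softmax(qs @ ks.T / np.sqrt(d))
        out[idx] = attn @ vs
    return out
```

With `dilation=1` and `segment_len` equal to the sequence length, this reduces to ordinary dense self-attention, which is a useful sanity check.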