alexisrozhkov/dilated-self-attention
Implementation of dilated self-attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens".
Python · MIT License
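The core idea in LongNet's dilated attention is to split the sequence into segments and let each segment attend only to every r-th position within itself, so cost stays linear in sequence length. The sketch below is a minimal NumPy illustration of that sparsification pattern, not the repository's actual implementation; the function name, the shared (identity) Q/K/V projections, and the single-head, single-dilation setup are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_self_attention(x, segment_len, dilation):
    """Hypothetical sketch of one dilated-attention branch.

    x: (n, d) token embeddings. Each segment of length `segment_len`
    keeps only every `dilation`-th position; full attention is computed
    among those kept positions. Dropped positions are left as zeros here
    (LongNet combines several dilation rates so every token is covered).
    """
    n, d = x.shape
    out = np.zeros_like(x)
    for start in range(0, n, segment_len):
        idx = np.arange(start, min(start + segment_len, n))[::dilation]
        q = k = v = x[idx]                    # projections omitted for brevity
        scores = q @ k.T / np.sqrt(d)         # scaled dot-product within the sparse segment
        out[idx] = softmax(scores) @ v
    return out

x = np.random.default_rng(0).normal(size=(8, 4))
y = dilated_self_attention(x, segment_len=4, dilation=2)
```

With `segment_len=4` and `dilation=2` on 8 tokens, only positions 0, 2, 4, 6 participate; each attends within its own segment, so attention never crosses segment boundaries.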
Issues
- Large Text Compression Benchmark (#8, opened by jabowery)
- The conda environment.yml file I used (#7, opened by jabowery)
- Q: CPU Multicore? (#6, opened by jabowery)
- Integrating into a regular transformer (#5, opened by Akbarable)