alexisrozhkov/dilated-self-attention
Implementation of dilated self-attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens".
Python · MIT License
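The core idea in LongNet's dilated attention is to split the sequence into segments and let each segment attend only to every r-th position within itself, so cost stays linear in sequence length. The sketch below is a minimal NumPy illustration of that sparsification pattern, not the repository's actual implementation; the function name, the shared (identity) Q/K/V projections, and the single-head, single-dilation setup are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_self_attention(x, segment_len, dilation):
    """Hypothetical sketch of one dilated-attention branch.

    x: (n, d) token embeddings. Each segment of length `segment_len`
    keeps only every `dilation`-th position; full attention is computed
    among those kept positions. Dropped positions are left as zeros here
    (LongNet combines several dilation rates so every token is covered).
    """
    n, d = x.shape
    out = np.zeros_like(x)
    for start in range(0, n, segment_len):
        idx = np.arange(start, min(start + segment_len, n))[::dilation]
        q = k = v = x[idx]                    # projections omitted for brevity
        scores = q @ k.T / np.sqrt(d)         # scaled dot-product within the sparse segment
        out[idx] = softmax(scores) @ v
    return out

x = np.random.default_rng(0).normal(size=(8, 4))
y = dilated_self_attention(x, segment_len=4, dilation=2)
```

With `segment_len=4` and `dilation=2` on 8 tokens, only positions 0, 2, 4, 6 participate; each attends within its own segment, so attention never crosses segment boundaries.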
Issues
- Large Text Compression Benchmark (#8, opened by jabowery)
- The conda environment.yml file I used (#7, opened by jabowery)
- Q: CPU Multicore? (#6, opened by jabowery)
- Integrating into a regular transformer (#5, opened by Akbarable)