thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Python · BSD-3-Clause
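As a minimal usage sketch (not taken from this page), assuming the `sageattn` function mentioned in issue #19 accepts q, k, v tensors in the usual (batch, heads, seq_len, head_dim) layout and an `is_causal` flag analogous to `torch.nn.functional.scaled_dot_product_attention`; the exact keyword arguments are assumptions, so check the repository README:

```python
# Sketch only: swapping PyTorch SDPA for the quantized sageattn kernel.
# Assumptions: sageattn takes (q, k, v) shaped (batch, heads, seq_len, head_dim)
# in fp16/bf16 on a CUDA GPU and returns a tensor of the same shape.
import torch
from sageattention import sageattn

batch, heads, seq_len, head_dim = 2, 16, 4096, 128
q = torch.randn(batch, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Baseline: PyTorch's built-in scaled dot-product attention.
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=False)

# Drop-in quantized attention kernel.
out = sageattn(q, k, v, is_causal=False)
print(out.shape, (out - ref).abs().max().item())
```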
Issues
Q matrix quantization
#27 opened - 6
Why divide by ln 2 when quantizing the Q values?
#24 opened - 3
Real acceleration benefits
#22 opened - 4
Can SageAttention be used on AMD GPUs?
#20 opened - 5
NaN values appear when using sageattn
#19 opened - 1
Notation error in Equation (2)
#18 opened - 2
Will other headdim values be supported?
#17 opened - 0
Other SageAttention Kernels
#16 opened - 0
Encountered some compatibility issues
#14 opened - 1
Can you provide an example for LLaMA?
#13 opened - 1
Question about INT8 vs. FP8
#12 opened - 2
SageAttention on ComfyUI
#11 opened - 2
Accuracy Comparison at the Kernel Level
#10 opened - 3
Is Stable Diffusion supported?
#9 opened - 0
How can I make it work on Windows?
#4 opened - 6
Question about performance on A100
#3 opened - 2
BF16 q,k,v
#2 opened - 1
Example usage doesn't work
#1 opened