thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Cuda · Apache-2.0
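
The description positions SageAttention as a plug-and-play quantized replacement for a standard scaled-dot-product attention call. Below is a minimal usage sketch, assuming the package exposes a `sageattn(q, k, v, ...)` function that accepts FP16/BF16 query, key, and value tensors in (batch, heads, seq_len, head_dim) layout and a causal flag; the exact signature and keyword arguments should be verified against the repository's README.

```python
# Sketch: swapping SageAttention in for PyTorch's fused scaled-dot-product attention.
# Assumption: `sageattn` takes FP16/BF16 q, k, v shaped (batch, heads, seq_len, head_dim)
# plus an is_causal flag; check the README for the actual keyword arguments.
import torch
from sageattention import sageattn

batch, heads, seq_len, head_dim = 2, 16, 4096, 64
q = torch.randn(batch, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(batch, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(batch, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")

# Baseline: PyTorch's built-in attention (FlashAttention backend where available).
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=False)

# Drop-in replacement: quantized attention kernel from SageAttention.
out = sageattn(q, k, v, is_causal=False)

# Quantization is claimed not to hurt end-to-end metrics; outputs should stay close.
print((out - ref).abs().max())
```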
Stargazers
- Akaisorani (Tsinghua University)
- asdfghazh
- bennyguo (Tsinghua University)
- curtis-sun (Tsinghua University)
- dmarx (Stability.ai, Eleuther.ai)
- DongHaowen
- dsying2022
- GYY000
- hyw498169842
- jeeveenn
- josecohenca
- jt-zhang (@thu-ml, Tsinghua University)
- kentaroy47 (Keio University)
- LennyHuang15 (Tsinghua University)
- NewBieForNow
- NJUZS
- numb3r3 (@jina-ai)
- okotaku (Orange)
- pjq-mine
- pufanyi (Nanyang Technological University)
- rcy17
- Rucchao (China)
- Separius (University of Bern)
- SunnyXia3579
- tete9999
- TGLTommy
- TytRC
- u-brixton
- winterice0
- xiaguan (Work for my dream)
- xziayro (Zerolens)
- ZAmbitiousQ
- zhengkw18 (Tsinghua University)
- zihan369
- zsjie99
- zyang-16