thu-ml/SageAttention
Quantized attention that achieves speedups of 2.1x and 2.7x over FlashAttention2 and xformers, respectively, without degrading end-to-end metrics across various models.
Python | BSD-3-Clause license
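
As a minimal sketch of the intended usage pattern: the kernel is meant to act as a drop-in replacement for `torch.nn.functional.scaled_dot_product_attention`. The `sageattn` entry point shown here is the package's main attention function; the exact tensor layout and keyword arguments are assumptions based on the project's description rather than a verified API reference.

```python
import torch
from sageattention import sageattn  # assumed entry point of the package

# Toy inputs in (batch, heads, seq_len, head_dim) layout (an assumption;
# check the project README for the layouts the kernel accepts).
q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")

# Quantized attention call, used where scaled_dot_product_attention would be.
out = sageattn(q, k, v, is_causal=False)
print(out.shape)  # expected: torch.Size([1, 8, 1024, 64])
```

Because the call signature mirrors standard attention, swapping it into an existing model is typically a one-line change at each attention call site.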