mit-han-lab/Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Cuda
Stargazers
- Aaronhuang-778: HKU - CVMI Lab (https://xjqi.github.io/cvmi.html)
- buaabai: BUAA
- cat538
- cblmemo: UC Berkeley
- Conless: Shanghai Jiao Tong University
- cylinbao: University of Washington
- DD-DuDa: Data Science and Analytic Thrust, Information Hub, HKUST(GZ)
- dmarx: Stability.ai, Eleuther.ai
- fangtaosong: University of Chinese Academy of Sciences, NLP LAB
- ganler: University of Illinois Urbana-Champaign
- happierpig: UC Berkeley
- huskydoge: Shanghai Jiao Tong University
- JerryYin777: University of Minnesota Twin Cities
- Joeyzhouqihui
- junhua-l: Shenzhen
- kaizizzzzzz: Ithaca, New York
- kiminh
- lambda7xx: Shanghai Jiao Tong University
- lijie2160: Beijing Jiaotong University
- lmxyy: Massachusetts Institute of Technology
- lvdongxu: Shanghai Jiao Tong University
- pprp: Data Science and Analytic Thrust, Information Hub, HKUST(GZ)
- rentainhe: IDEA
- Sakits: MIT, EECS
- senlyu163
- SiriusNEO: Tsinghua University
- snakeztc: Zhejiang University
- Ther-nullptr: Tsinghua University @eesast
- Tom-CaoZH: Xidian University
- Ubospica: Shanghai, China
- Xiuyu-Li: UC Berkeley
- YangWang92
- Ying1123: Stanford University
- Yuan-ManX: Shanghai, China
- YukeWang96: University of California, Santa Barbara
- zwhong714: Shanghai Jiao Tong University