[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Primary language: CUDA
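Quest's core idea is query-aware sparsity: the KV cache is split into pages, each page keeps elementwise min/max key metadata, and at decode time only the pages whose attention scores could be large for the current query are loaded. Below is a minimal NumPy sketch of this page-selection step; the function names (`page_score_upper_bound`, `select_pages`) and all shapes are illustrative assumptions, not the repository's actual CUDA API.

```python
import numpy as np

def page_score_upper_bound(q, k_min, k_max):
    # Upper bound on q·k for any key k with k_min <= k <= k_max (elementwise):
    # each dimension contributes at most max(q_i * k_min_i, q_i * k_max_i).
    return np.maximum(q * k_min, q * k_max).sum(axis=-1)

def select_pages(q, keys, page_size, top_k):
    # Hypothetical helper: rank KV pages by their score upper bound
    # and return the indices of the top_k most critical pages.
    n_pages = len(keys) // page_size
    pages = keys[: n_pages * page_size].reshape(n_pages, page_size, -1)
    k_min = pages.min(axis=1)   # per-page elementwise key minimum
    k_max = pages.max(axis=1)   # per-page elementwise key maximum
    scores = page_score_upper_bound(q, k_min, k_max)
    return np.argsort(scores)[::-1][:top_k]

rng = np.random.default_rng(0)
keys = rng.standard_normal((64, 8))   # 64 cached keys, head dim 8
q = rng.standard_normal(8)            # current query vector
sel = select_pages(q, keys, page_size=8, top_k=2)
print(sorted(sel.tolist()))
```

Because the per-page bound never underestimates any true score `q·k` inside the page, skipping low-bound pages cannot drop a key that would have scored above the selected pages' bounds.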