[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Primary Language: CUDA