AnswerDotAI/cold-compress

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

griff4692 opened this issue

Implement the method from this paper.

Similar to the KVCacheFastGen class in that it involves a profiling step: MInference profiles each attention head offline to assign it a sparse pattern (A-shape, Vertical-Slash, or Block-Sparse), then builds dynamic sparse indices for that pattern during prefill.
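
As a starting point, here is a minimal sketch of the online Vertical-Slash index estimation that MInference runs after its offline head-profiling pass. It uses plain PyTorch; the function name `estimate_vertical_slash_mask` and the `last_q` / `top_v` / `top_s` parameters are illustrative placeholders, not APIs from cold-compress or the MInference reference implementation.

```python
# Hypothetical sketch of MInference-style Vertical-Slash mask estimation.
# All names and defaults here are assumptions for illustration only.
import torch


def estimate_vertical_slash_mask(q, k, last_q=64, top_v=128, top_s=32):
    """Build a [seq_len, seq_len] boolean sparse-attention mask for one head.

    q, k: [seq_len, head_dim] query/key tensors for a single attention head.
    The last `last_q` queries score which key columns ("vertical" lines) and
    which diagonals ("slash" lines) carry the most attention mass; only those
    positions are kept in the causal mask.
    """
    seq_len, head_dim = q.shape
    device = q.device
    last_q = min(last_q, seq_len)
    scale = head_dim ** -0.5

    # Attention of the last few queries against all keys: a cheap proxy for the full map.
    scores = (q[-last_q:] @ k.T) * scale                       # [last_q, seq_len]
    kpos = torch.arange(seq_len, device=device)
    qpos_tail = torch.arange(seq_len - last_q, seq_len, device=device)
    causal = kpos <= qpos_tail.unsqueeze(1)
    probs = scores.masked_fill(~causal, float("-inf")).softmax(dim=-1)

    # Vertical lines: key columns with the highest summed attention mass.
    vert_idx = probs.sum(dim=0).topk(min(top_v, seq_len)).indices

    # Slash lines: diagonal offsets (query_pos - key_pos) with the highest mass.
    offsets = qpos_tail.unsqueeze(1) - kpos                    # [last_q, seq_len]
    diag_mass = torch.zeros(seq_len, device=device)
    diag_mass.scatter_add_(0, offsets.clamp(min=0).flatten(), probs.float().flatten())
    slash_offsets = diag_mass.topk(min(top_s, seq_len)).indices

    # Assemble the sparse causal mask: keep only the chosen columns and diagonals.
    qpos = torch.arange(seq_len, device=device).unsqueeze(1)
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool, device=device)
    mask[:, vert_idx] = True
    mask |= (qpos - kpos).unsqueeze(-1).eq(slash_offsets).any(-1)
    return mask & (kpos <= qpos)                               # enforce causality
```

A full implementation would run this per head, dispatch on the pattern assigned to that head by the offline profiling step, and feed the resulting indices to a block-sparse attention kernel during prefill rather than materializing a dense masked softmax.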