AnswerDotAI/cold-compress

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

griff4692 opened this issue

Implement the method from this paper.

Similar to the KVCacheFastGen class in that it involves a profiling step: MInference profiles each attention head offline to assign it a sparse pattern (A-shape, Vertical-Slash, or Block-Sparse), then builds dynamic sparse indices for that pattern during prefill.
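
As a starting point, here is a minimal sketch of the online Vertical-Slash index estimation that MInference runs after its offline head-profiling pass. It uses plain PyTorch; the function name `estimate_vertical_slash_mask` and the `last_q` / `top_v` / `top_s` parameters are illustrative placeholders, not APIs from cold-compress or the MInference reference implementation.

```python
# Hypothetical sketch of MInference-style Vertical-Slash mask estimation.
# All names and defaults here are assumptions for illustration only.
import torch


def estimate_vertical_slash_mask(q, k, last_q=64, top_v=128, top_s=32):
    """Build a [seq_len, seq_len] boolean sparse-attention mask for one head.

    q, k: [seq_len, head_dim] query/key tensors for a single attention head.
    The last `last_q` queries score which key columns ("vertical" lines) and
    which diagonals ("slash" lines) carry the most attention mass; only those
    positions are kept in the causal mask.
    """
    seq_len, head_dim = q.shape
    device = q.device
    last_q = min(last_q, seq_len)
    scale = head_dim ** -0.5

    # Attention of the last few queries against all keys: a cheap proxy for the full map.
    scores = (q[-last_q:] @ k.T) * scale                       # [last_q, seq_len]
    kpos = torch.arange(seq_len, device=device)
    qpos_tail = torch.arange(seq_len - last_q, seq_len, device=device)
    causal = kpos <= qpos_tail.unsqueeze(1)
    probs = scores.masked_fill(~causal, float("-inf")).softmax(dim=-1)

    # Vertical lines: key columns with the highest summed attention mass.
    vert_idx = probs.sum(dim=0).topk(min(top_v, seq_len)).indices

    # Slash lines: diagonal offsets (query_pos - key_pos) with the highest mass.
    offsets = qpos_tail.unsqueeze(1) - kpos                    # [last_q, seq_len]
    diag_mass = torch.zeros(seq_len, device=device)
    diag_mass.scatter_add_(0, offsets.clamp(min=0).flatten(), probs.float().flatten())
    slash_offsets = diag_mass.topk(min(top_s, seq_len)).indices

    # Assemble the sparse causal mask: keep only the chosen columns and diagonals.
    qpos = torch.arange(seq_len, device=device).unsqueeze(1)
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool, device=device)
    mask[:, vert_idx] = True
    mask |= (qpos - kpos).unsqueeze(-1).eq(slash_offsets).any(-1)
    return mask & (kpos <= qpos)                               # enforce causality
```

A full implementation would run this per head, dispatch on the pattern assigned to that head by the offline profiling step, and feed the resulting indices to a block-sparse attention kernel during prefill rather than materializing a dense masked softmax.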