Infini-AI-Lab/MagicDec
Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
JavaScript, Apache-2.0
Issues
- 0, Question regarding the termination condition. #3 opened by saeyoon17
- 3, Hanging on multiple GPU clusters. #2 opened by YJHMITWEB
- 2, KV Loading Time. #1 opened by wutong4012