Infini-AI-Lab/MagicDec
Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
JavaScript · Apache-2.0
Issues
- #2 Hanging on multiple GPU clusters
- #1 KV Loading Time