This is the deeply, deeply awful research code underpinning "Accelerating LLM Inference with Staged Speculative Decoding". Everything except the models is in the repo; the models are at the drive link below.
Models can be found here
Staged speculative decoding for small-batch LLM inference
Language: Python. License: MIT.