/Speculative-Decoding

Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).

Primary language: Python
License: MIT
