See here.
For the Alpaca dataset, we use exactly the same source as SpecInfer.
For the WMT dataset, we follow SpecInfer's process: we randomly sample 1000 examples from the test set and wrap each source sentence in the following template:
Translate the input English sentence into German.
Input: {source sentence}
Output:
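The sampling and wrapping step above can be sketched as follows. This is a minimal illustration, not the repo's actual preprocessing script: `test_set` stands in for the real WMT test sentences, and the function name and seed are hypothetical; only the sample size (1000) and the template text come from the description above.

```python
import random

# Template from the description above; {source} is filled per sentence.
TEMPLATE = (
    "Translate the input English sentence into German.\n"
    "Input: {source}\n"
    "Output:"
)

def build_prompts(test_set, n=1000, seed=0):
    """Randomly sample n source sentences and wrap each in the template."""
    rng = random.Random(seed)
    sampled = rng.sample(test_set, min(n, len(test_set)))
    return [TEMPLATE.format(source=s) for s in sampled]

# Toy stand-in data; in practice this would be the WMT test split.
prompts = build_prompts(["Hello world.", "Good morning."], n=2)
print(prompts[0])
```

Each resulting prompt ends with `Output:`, so the model's generation directly continues with the German translation.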
We release our fine-tuned draft models on Hugging Face; see Vicuna-68M and Vicuna-160M. They are fine-tuned from LLaMA-68M and LLaMA-160M, respectively, on ShareGPT data. The training setup follows FastChat.