speculative-decoding
There are 33 repositories under the speculative-decoding topic.
intel/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
aphrodite-engine/aphrodite-engine
Large-scale LLM inference engine
Infini-AI-Lab/Sequoia
A scalable and robust tree-based speculative decoding algorithm.
facebookresearch/LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
Infini-AI-Lab/TriForce
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
FasterDecoding/REST
REST: Retrieval-Based Speculative Decoding, NAACL 2024
Infini-AI-Lab/UMbreLLa
LLM Inference on consumer devices
bigai-nlco/TokenSwift
[ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
kssteven418/BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
romsto/Speculative-Decoding
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
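Most repositories under this topic build on the draft-then-verify loop from Leviathan et al.: a small draft model proposes several tokens, and the target model accepts each one with probability min(1, p/q), resampling from the residual distribution on rejection, which provably preserves the target model's output distribution. A minimal sketch of that loop — the toy model functions, vocabulary size, and `gamma` below are illustrative assumptions, not code from any repository listed here:

```python
import random

random.seed(0)

VOCAB = 8  # toy vocabulary size (illustrative)

def target_model(ctx):
    # Toy stand-in for the large target LM: returns a probability
    # distribution over the next token given the context.
    last = ctx[-1] if ctx else 0
    w = [(i + last) % VOCAB + 1 for i in range(VOCAB)]
    s = sum(w)
    return [x / s for x in w]

def draft_model(ctx):
    # Toy stand-in for the small draft LM: a cruder approximation.
    last = ctx[-1] if ctx else 0
    w = [1 + ((i + last) % 3) for i in range(VOCAB)]
    s = sum(w)
    return [x / s for x in w]

def sample(dist):
    return random.choices(range(VOCAB), weights=dist)[0]

def speculative_step(ctx, gamma=4):
    """One round of draft-then-verify speculative decoding."""
    # 1. Draft gamma tokens autoregressively with the small model.
    drafted, q_dists = [], []
    c = list(ctx)
    for _ in range(gamma):
        q = draft_model(c)
        t = sample(q)
        drafted.append(t)
        q_dists.append(q)
        c.append(t)
    # 2. Verify each draft token with the target model
    #    (a single batched forward pass in a real implementation).
    accepted = []
    for i, t in enumerate(drafted):
        p = target_model(ctx + accepted)
        q = q_dists[i]
        if random.random() < min(1.0, p[t] / q[t]):
            accepted.append(t)  # accept the draft token
        else:
            # Resample from the residual max(0, p - q), renormalized.
            resid = [max(0.0, p[j] - q[j]) for j in range(VOCAB)]
            z = sum(resid)
            if z > 0:
                accepted.append(sample([r / z for r in resid]))
            return accepted  # stop at the first rejection
    # 3. All drafts accepted: sample one bonus token from the target.
    accepted.append(sample(target_model(ctx + accepted)))
    return accepted

tokens = [0]
while len(tokens) < 20:
    tokens.extend(speculative_step(tokens))
print(tokens[:20])
```

Each round emits between 1 and gamma + 1 tokens while requiring roughly one target-model pass, which is where the speedup comes from when the draft model's acceptance rate is high.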
hemingkx/SWIFT
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
hemingkx/SpecDec
Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
AutonomicPerfectionist/PipeInfer
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
mscheong01/speculative_decoding.c
minimal C implementation of speculative decoding based on llama2.c
BaohaoLiao/RSD
Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
jadohu/LANTERN
Official Implementation of LANTERN (ICLR'25) and LANTERN++(ICLRW-SCOPE'25)
Geralt-Targaryen/Awesome-Speculative-Decoding
Reading notes on Speculative Decoding papers
ccs96307/fast-llm-inference
Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.
hsj576/GRIFFIN
Official Implementation of "GRIFFIN: Effective Token Alignment for Faster Speculative Decoding"
smpanaro/token-recycling
Unofficial implementation of Token Recycling self-speculative decoding method.
wtlow003/speculative-sampling
Implementation of Speculative Sampling in "Accelerating Large Language Model Decoding with Speculative Sampling"
jayeshthk/SpS-SpecDec
SpS-SpecDec: a fast Python library that accelerates autoregressive LM inference with speculative decoding. Inspired by DeepMind's speculative sampling, it drafts multiple tokens with a small model and verifies them with a larger one, yielding 2-2.5x speedups with no quality loss.
pinqian77/Dynasurge
Dynasurge: Dynamic Tree Speculation for Prompt-Specific Decoding
PopoDev/BiLD
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder
u-hyszk/japanese-speculative-decoding
Verifying the effectiveness of speculative decoding for Japanese-language text.
haukzero/Speculative-Demo
A simple implementation of speculative decoding.
jayeshthk/speculative-decoding-inference
Speculative decoding challenge by Anysphere (Cursor AI).
natask/infra_gpu_hack
A novel algorithm that integrates a text diffusion LLM as a draft model to boost the performance of traditional autoregressive LLMs.
realjules/align_llm
The LLM Defense Framework enhances large language model security through post-processing defenses and statistical guarantees based on a one-class SVM. It combines advanced sampling methods with adaptive policy updates and comprehensive evaluation metrics, giving researchers and practitioners tools for building more secure AI systems.
kinshukdua/SpecDec
Some experiments aimed at increasing LLM throughput and efficiency via Speculative Decoding.
majid-daliri/DISD
Coupling without Communication and Drafter-Invariant Speculative Decoding
wtlow003/ngram-decoding
(Re)-implementation of "Prompt Lookup Decoding" by Apoorv Saxena, with extended ideas from LLMA Decoding.
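Prompt lookup decoding needs no draft model at all: the trailing n-gram of the generated context is matched against an earlier occurrence in the same sequence (typically the prompt), and the tokens that followed that match are proposed as drafts, which the target model then verifies as usual. A minimal sketch of the matching step — the function name and parameters are illustrative assumptions, not the repo's API:

```python
def prompt_lookup_draft(tokens, ngram_size=3, num_drafts=5):
    """Propose draft tokens by matching the trailing n-gram of the
    context against an earlier occurrence in the same token sequence."""
    if len(tokens) < ngram_size:
        return []
    pattern = tokens[-ngram_size:]
    # Scan backwards for the most recent earlier occurrence of the pattern,
    # excluding the trailing occurrence itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == pattern:
            # Propose the tokens that followed the match as drafts.
            cont = tokens[start + ngram_size:start + ngram_size + num_drafts]
            if cont:
                return cont
    return []  # no match: fall back to ordinary decoding
```

This works well on tasks with heavy prompt overlap (summarization, code editing, RAG), where repeated spans make the n-gram continuation a strong guess; on novel text the drafts are simply rejected and decoding proceeds normally.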