Pinned Repositories
lm-evaluation-harness
A framework for few-shot evaluation of language models.
MagicDec
Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
MagicDec-part1
Speculative decoding for high-throughput long-context inference
MagicDec-part2
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Contexts with Speculative Decoding
MagicPIG
MagicPIG: LSH Sampling for Efficient LLM Generation
Sequoia
A scalable and robust tree-based speculative decoding algorithm
Sequoia-Page
Sirius
Sirius: an efficient correction mechanism that significantly boosts contextual sparsity models on reasoning tasks while preserving their efficiency gains.
TriForce
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
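The common thread across these projects (Sequoia, TriForce, MagicDec) is speculative decoding: a cheap draft model proposes several tokens, and the expensive target model verifies them in a single forward pass. Below is a minimal, illustrative sketch of the greedy variant only; it assumes HuggingFace-style models that expose `.logits`, and the `gamma` parameter and acceptance rule are generic textbook choices, not the implementations in these repositories.

```python
import torch


@torch.no_grad()
def speculative_decode_step(target_model, draft_model, prefix_ids, gamma=4):
    """Propose `gamma` tokens with the cheap draft model, then verify them
    with one forward pass of the expensive target model (greedy variant)."""
    draft_ids = prefix_ids
    # 1) Draft phase: autoregressively propose gamma candidate tokens.
    for _ in range(gamma):
        logits = draft_model(draft_ids).logits[:, -1, :]      # assumes HF-style output
        next_id = logits.argmax(dim=-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, next_id], dim=-1)

    # 2) Verification phase: one target pass scores all candidates at once.
    target_logits = target_model(draft_ids).logits
    n_prefix = prefix_ids.shape[1]
    accepted = prefix_ids
    for i in range(gamma):
        # Target's greedy prediction for position n_prefix + i.
        target_tok = target_logits[:, n_prefix + i - 1, :].argmax(dim=-1, keepdim=True)
        draft_tok = draft_ids[:, n_prefix + i : n_prefix + i + 1]
        if torch.equal(target_tok, draft_tok):
            accepted = torch.cat([accepted, draft_tok], dim=-1)   # draft token verified
        else:
            accepted = torch.cat([accepted, target_tok], dim=-1)  # fall back to target
            break
    return accepted
```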