Pinned Repositories
DejaVu
[ICML'23] DejaVu: Contextual Sparsity for Efficient LLMs at Inference Time.
FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Foundation Model Inference's Repositories
FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
FMInference/DejaVu
[ICML'23] DejaVu: Contextual Sparsity for Efficient LLMs at Inference Time.