ccs96307/fast-llm-inference

Accelerating LLM inference with speculative decoding, quantization, and kernel fusion, with a focus on implementing techniques from state-of-the-art research papers.

Python

Watchers

ccs96307
Taipei
drkostas
University of Tennessee, Knoxville