/fast-llm-inference

Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.

Primary LanguagePython

Watchers