Experiments aimed at increasing LLM inference throughput and efficiency via speculative decoding.
Primary language: Python
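For readers unfamiliar with the technique named in the description, here is a minimal sketch of greedy speculative decoding. It is not taken from this repository. The functions `toy_target`, `toy_draft`, `greedy_decode`, and `speculative_decode`, and the parameter `k`, are hypothetical names chosen for illustration, and the "models" are toy deterministic functions rather than real LLMs. The idea shown: a cheap draft model proposes several tokens, the expensive target model checks them, and the longest agreeing prefix is kept, so the output matches plain greedy decoding while needing fewer target verification rounds.

```python
from typing import Callable, List, Tuple

Token = int
# Assumption: a "model" is any function mapping a context to its greedy next token.
NextTokenFn = Callable[[List[Token]], Token]


def toy_target(context: List[Token]) -> Token:
    """Stand-in for the large, slow model (hypothetical toy rule)."""
    return (context[-1] * 3 + 1) % 7


def toy_draft(context: List[Token]) -> Token:
    """Stand-in for the small, fast model: agrees with the target
    except when the last token is 0, where it guesses wrong."""
    guess = toy_target(context)
    return guess if context[-1] != 0 else (guess + 1) % 7


def greedy_decode(target: NextTokenFn, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    target: NextTokenFn,
    draft: NextTokenFn,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns the tokens and the number of
    verification rounds (in a real system, each round would be a single
    batched forward pass of the target model over all drafted positions)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    rounds = 0
    while len(tokens) < goal:
        # Step 1: the draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # Step 2: the target model checks each proposed position.
        rounds += 1
        for i, guess in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if guess != expected:
                # Keep the agreeing prefix, then the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every drafted token was accepted.
            tokens.extend(proposal)
    return tokens, rounds


baseline = greedy_decode(toy_target, [1], 12)
speculated, rounds = speculative_decode(toy_target, toy_draft, [1], 12, k=4)
print(speculated == baseline, rounds)
```

Because verification always falls back to the target's own token on a mismatch, the speculative output is identical to the baseline. Any speedup in practice depends on how often the draft agrees with the target and on how cheap the draft is, which this toy does not measure.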