Awesome-DB4LLM

Inference system

- [SOSP 2023] Efficient Memory Management for Large Language Model Serving with PagedAttention (vLLM) [paper] [project]
- [ICML 2023] FlexGen: High-throughput Generative Inference of Large Language Models with a Single GPU [paper] [project]