Awesome-DB4LLM

Inference system

  • [ SOSP 2023 ] Efficient Memory Management for Large Language Model Serving with PagedAttention (vLLM) [paper] [project]
  • [ ICML 2023 ] FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU [paper] [project]