A curated list of LLM-systems-related academic papers, articles, tutorials, slides, and projects. Star this repository to keep abreast of the latest developments in this booming research field.
- Orca: A Distributed Serving System for Transformer-Based Generative Models | OSDI '22 (a toy sketch of its iteration-level batching appears after this list)
- FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance | Stanford
- Fast Distributed Inference Serving for Large Language Models | Peking University
- Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline | NUS
- Efficiently Scaling Transformer Inference | MLSys '23
- Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
- Reducing Activation Recomputation in Large Transformer Models
- DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | SC '22
- FlexGen: High-throughput Generative Inference of Large Language Models with a Single GPU | UCB
- S3: Increasing GPU Utilization during Generative Inference for Higher Throughput
- Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
- AttMemo: Accelerating Self-Attention with Memoization on Big Memory Systems
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention | SOSP '23 (see the block-table sketch after this list)
- Tabi: An Efficient Multi-Level Inference System for Large Language Models | EuroSys '23
- TurboTransformers: An Efficient GPU Serving System For Transformer Models
- Inference with Reference: Lossless Acceleration of Large Language Models
- H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
- SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference
- Full Stack Optimization of Transformer Inference: a Survey
- Optimized Network Architectures for Large Language Model Training with Billions of Parameters | UCB
- MPCFormer: Fast, Performant, and Private Transformer Inference with MPC | ICLR '23
- INFaaS: Automated Model-less Inference Serving | ATC '21
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | OSDI '22
- Pathways: Asynchronous Distributed Dataflow for ML | MLSys '22
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving | OSDI '23
- DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | ICML '22
- ZeRO-Offload: Democratizing Billion-Scale Model Training | ATC '21
- ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning | SC '21
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | SC '20 (see the ZeRO memory-math sketch after this list)
- Band: Coordinated Multi-DNN Inference on Heterogeneous Mobile Processors | MobiSys '22
- Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing | ATC '22
- Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access | EuroSys '23
- Cocktail: A Multidimensional Optimization for Model Serving in Cloud | NSDI '22
- Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models
- SHEPHERD: Serving DNNs in the Wild | NSDI '23
- Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning | MLSys '23 (see the 2:4 pruning sketch after this list)
- AutoScratch: ML-Optimized Cache Management for Inference-Oriented GPUs
- ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
- Channel Permutations for N:M Sparsity | NeurIPS '21
- Welder: Scheduling Deep Learning Memory Access via Tile-graph | OSDI '23
- Optimizing Dynamic Neural Networks with Brainstorm | OSDI '23
- ModelKeeper: Accelerating DNN Training via Automated Training Warmup | NSDI '23
- LLM Energy Leaderboard | UMich
- Aviary Explorer | Anyscale
- Open LLM Leaderboard | HuggingFace
- HELM | Stanford
- LMSYS | UCB
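
Several of the serving papers above, Orca in particular, revolve around iteration-level ("continuous") batching: the scheduler admits and retires requests between individual decoding steps rather than between whole sequences. Below is a minimal Python sketch of that scheduling loop, assuming hypothetical `Request` and `model_step` stand-ins rather than Orca's actual interfaces.

```python
from collections import deque

class Request:
    def __init__(self, prompt_tokens, max_new_tokens):
        self.tokens = list(prompt_tokens)  # prompt + generated tokens so far
        self.remaining = max_new_tokens    # decode budget left

def model_step(batch):
    """Run ONE decoding iteration for every request in the batch.

    A real engine would invoke the transformer here; this stub returns a
    dummy token id per request to keep the sketch self-contained.
    """
    return [len(req.tokens) % 50_000 for req in batch]

def serve(waiting: deque, max_batch: int = 8):
    running = []
    while waiting or running:
        # Admit new requests between iterations, not between sequences:
        # the key difference from request-level batching.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        for req, tok in zip(running, model_step(running)):
            req.tokens.append(tok)
            req.remaining -= 1
        # Finished requests leave immediately, freeing their batch slots
        # instead of stalling behind the longest sequence in the batch.
        running = [req for req in running if req.remaining > 0]
```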
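PagedAttention, the technique behind the vLLM entry above, manages the KV cache like virtual memory: each sequence maps its logical token positions through a block table onto fixed-size physical blocks. A minimal allocator sketch, with illustrative names (`BLOCK_SIZE`, `BlockAllocator`) rather than vLLM's real API:

```python
BLOCK_SIZE = 16  # tokens per KV-cache block

class BlockAllocator:
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}               # seq_id -> list of physical block ids

    def append_token(self, seq_id: int, pos: int):
        """Map logical position `pos` of a sequence to (physical block, offset).

        A new physical block is taken from the pool only when the sequence
        crosses a block boundary, so memory is committed on demand instead
        of being reserved for the maximum possible length.
        """
        table = self.block_tables.setdefault(seq_id, [])
        if pos % BLOCK_SIZE == 0:            # first slot of a fresh block
            table.append(self.free.pop())    # raises IndexError when out of memory
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

    def free_sequence(self, seq_id: int):
        # Finished sequences return every block to the pool at once.
        self.free.extend(self.block_tables.pop(seq_id, []))
```

Because blocks are committed on demand and reclaimed on completion, internal fragmentation and worst-case over-reservation of the KV cache largely disappear.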
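The ZeRO line of work above partitions training state across data-parallel ranks. A back-of-envelope sketch of the memory arithmetic, assuming the mixed-precision Adam setup described in the ZeRO paper (fp16 parameters and gradients plus fp32 optimizer states, i.e. 2 + 2 + 12 = 16 bytes per parameter when fully replicated):

```python
def per_gpu_bytes(num_params: float, ndev: int, stage: int) -> float:
    p, g, o = 2.0, 2.0, 12.0  # bytes/param: fp16 params, fp16 grads, fp32 Adam state
    if stage >= 1:
        o /= ndev             # ZeRO-1: shard optimizer states
    if stage >= 2:
        g /= ndev             # ZeRO-2: additionally shard gradients
    if stage >= 3:
        p /= ndev             # ZeRO-3: additionally shard parameters
    return num_params * (p + g + o)

# 7.5B parameters on 64 GPUs: ~112 GiB per GPU when fully replicated
# shrinks to ~1.7 GiB with all three ZeRO stages enabled.
for stage in range(4):
    print(f"stage {stage}: {per_gpu_bytes(7.5e9, 64, stage) / 2**30:.1f} GiB/GPU")
```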
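The N:M sparsity entries above target fixed weight patterns such as 2:4, where every contiguous group of four weights keeps at most two nonzeros so that sparse tensor cores can skip the zeroed multiplications. A small NumPy sketch of the standard magnitude-pruning rule that produces this pattern (the function name is illustrative):

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude weights in every group of 4."""
    flat = w.reshape(-1, 4)                          # groups of M=4 weights
    keep = np.argsort(np.abs(flat), axis=1)[:, 2:]   # indices of the top N=2
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return (flat * mask).reshape(w.shape)

w = np.random.randn(8, 8).astype(np.float32)         # total size divisible by 4
w_sparse = prune_2_4(w)
assert (np.count_nonzero(w_sparse.reshape(-1, 4), axis=1) == 2).all()
```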