Pinned Repositories
aibrix
Cost-efficient and pluggable infrastructure components for GenAI inference
flash-attention
Fast and memory-efficient exact attention
guidellm
Evaluate and enhance your LLM deployments for real-world inference needs
llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
production-stack
vLLM’s reference system for Kubernetes-native, cluster-wide deployment with community-driven performance optimization
recipes
Common recipes to run vLLM
semantic-router
Intelligent mixture-of-models router for efficient LLM inference
speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list)
vllm-ascend
Community-maintained hardware plugin for vLLM on Ascend
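For context on the flagship vllm engine pinned above, here is a minimal offline-inference sketch. It uses vLLM's documented Python entry points (LLM, SamplingParams, generate); the model name facebook/opt-125m is only an example choice, not a recommendation.

```python
from vllm import LLM, SamplingParams

# Load a small example model; vLLM handles continuous batching and
# paged KV-cache memory management internally.
llm = LLM(model="facebook/opt-125m")

# Sampling settings applied to every prompt in the batch.
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() takes a batch of prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["The capital of France is", "vLLM is"], sampling)
for out in outputs:
    print(f"{out.prompt!r} -> {out.outputs[0].text!r}")
```

The same engine also powers an OpenAI-compatible HTTP server (started with the vllm serve command), which is the deployment mode the production-stack and guidellm projects below target.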
vLLM's Repositories
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
vllm-project/aibrix
Cost-efficient and pluggable infrastructure components for GenAI inference
vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM (see the quantization sketch after this list)
vllm-project/production-stack
vLLM’s reference system for Kubernetes-native, cluster-wide deployment with community-driven performance optimization
vllm-project/semantic-router
Intelligent mixture-of-models router for efficient LLM inference
vllm-project/vllm-ascend
Community-maintained hardware plugin for vLLM on Ascend
vllm-project/guidellm
Evaluate and enhance your LLM deployments for real-world inference needs
vllm-project/recipes
Common recipes to run vLLM
vllm-project/flash-attention
Fast and memory-efficient exact attention
vllm-project/speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
vllm-project/dashboard
vLLM performance dashboard
vllm-project/vllm-spyre
Community-maintained hardware plugin for vLLM on Spyre
vllm-project/vllm-openvino
vllm-project/ci-infra
Hosts the code for vLLM's CI and performance benchmark infrastructure.
vllm-project/vllm-project.github.io
vllm-project/vllm-nccl
Manages the vllm-nccl dependency
vllm-project/vllm-gaudi
Community-maintained hardware plugin for vLLM on Intel Gaudi
vllm-project/vllm-project.github.io-static
vllm-project/vllm-xpu-kernels
vLLM XPU kernels for Intel GPUs
vllm-project/FlashMLA
vllm-project/media-kit
vLLM Logo Assets
vllm-project/rfcs
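To illustrate the llm-compressor entry referenced above, a one-shot post-training quantization sketch. It follows the GPTQ W4A16 flow from the project's documentation; the model name, calibration dataset, output directory, and calibration settings are example values chosen for this sketch, and the API is assumed to match the current llmcompressor package.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# One-shot quantization: calibrate on a small dataset, then write a
# W4A16 checkpoint that vLLM can load directly.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example model
    dataset="open_platypus",                     # example calibration set
    recipe=GPTQModifier(scheme="W4A16", targets="Linear", ignore=["lm_head"]),
    output_dir="TinyLlama-1.1B-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

The resulting checkpoint directory can then be passed straight to vLLM, e.g. LLM(model="TinyLlama-1.1B-W4A16"), which is the integration the repo description refers to.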