Pinned Repositories
aibrix
Cost-efficient and pluggable infrastructure components for GenAI inference
dashboard
vLLM performance dashboard
flash-attention
Fast and memory-efficient exact attention
llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs, enabling optimized deployment with vLLM
production-stack
vLLM's reference system for K8s-native, cluster-wide deployment with community-driven performance optimization
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list)
vllm-ascend
Community-maintained hardware plugin for vLLM on Ascend
vllm-nccl
Manages the NCCL dependency for vLLM
vllm-project.github.io-static
vllm-spyre
Community-maintained hardware plugin for vLLM on Spyre
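For the flagship vllm engine pinned above, a minimal offline-inference sketch using its public Python API; the model name facebook/opt-125m is only an illustrative choice:

```python
from vllm import LLM, SamplingParams

# Load a small model; any Hugging Face model supported by vLLM works here.
llm = LLM(model="facebook/opt-125m")

# Sampling settings for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts.
outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```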
vLLM's Repositories
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs (a serving sketch follows at the end of this list)
vllm-project/aibrix
Cost-efficient and pluggable infrastructure components for GenAI inference
vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs, enabling optimized deployment with vLLM
vllm-project/production-stack
vLLM's reference system for K8s-native, cluster-wide deployment with community-driven performance optimization
vllm-project/vllm-ascend
Community-maintained hardware plugin for vLLM on Ascend
vllm-project/flash-attention
Fast and memory-efficient exact attention
vllm-project/dashboard
vLLM performance dashboard
vllm-project/vllm-nccl
Manages the NCCL dependency for vLLM
vllm-project/vllm-spyre
Community-maintained hardware plugin for vLLM on Spyre
vllm-project/buildkite-ci
vllm-project/vllm-project.github.io-static
vllm-project/vllm-project.github.io
vllm-project/FlashMLA
vllm-project/media-kit
vLLM Logo Assets
vllm-project/vllm-openvino
vllm-project/vllm_allocator_adaptor
An adaptor that allows using a Python-based allocator with PyTorch's pluggable-allocator interface
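As referenced from the vllm entry above, a minimal sketch of querying a locally running vLLM server through its OpenAI-compatible API. This assumes a server started with `vllm serve facebook/opt-125m`; port 8000 is vLLM's default, and the model name is only an illustrative choice:

```python
from openai import OpenAI

# Point the official OpenAI client at the local vLLM server.
# vLLM does not require a real API key, so any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
    max_tokens=32,
)
print(completion.choices[0].text)
```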