vllm
There are 158 repositories under the vllm topic.
meta-llama/llama-cookbook
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems with the Llama model family and how to use the models on various provider services.
xorbitsai/inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
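Several projects in this list (Xinference, vLLM, llmaz, llmariner) expose the same OpenAI-compatible chat completions API, which is what makes the "single line of code" swap possible: the request payload stays identical and only the endpoint URL changes. A minimal sketch of that idea, using only the standard library; the local port and model name are assumptions for illustration, not values taken from any of these projects:

```python
import json

# Hosted endpoint vs. an assumed local OpenAI-compatible server.
# The port and model name below are hypothetical -- adjust to your deployment.
OPENAI_URL = "https://api.openai.com/v1/chat/completions"
LOCAL_URL = "http://localhost:9997/v1/chat/completions"

def build_chat_request(base_url: str, model: str, prompt: str) -> tuple[str, bytes]:
    """Build an OpenAI-style chat completion request; only the URL differs."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return base_url, json.dumps(payload).encode("utf-8")

# The "single line" swap: same payload shape, different endpoint and model id.
url, body = build_chat_request(LOCAL_URL, "my-local-model", "Hello!")
```

Because the wire format is shared, any OpenAI SDK or HTTP client pointed at the local URL works unchanged.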
OpenRLHF/OpenRLHF
An easy-to-use, scalable, and high-performance RLHF framework based on Ray (PPO, GRPO, REINFORCE++, vLLM, dynamic sampling, and async agentic RL)
katanaml/sparrow
Structured data extraction and instruction calling with ML, LLMs, and vision LLMs
xlite-dev/Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
bricks-cloud/BricksLLM
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.
substratusai/kubeai
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
apconw/sanic-web
A lightweight, full-pipeline, easily extensible large-model application project (Large Model Data Assistant) supporting models such as DeepSeek and Qwen3. A one-stop development project built on Dify, LangChain/LangGraph, Ollama & vLLM, Sanic, and Text2SQL 📊, with a modern UI built on Vue3, TypeScript, and Vite 5. It supports LLM-driven graphical data Q&A via ECharts 📈 and tabular Q&A over CSV files 📂, and integrates easily with third-party open-source RAG retrieval systems 🌐 to support broad general-knowledge Q&A.
prometheus-eval/prometheus-eval
Evaluate your LLM's responses with Prometheus and GPT-4 💯
harleyszhang/llm_note
LLM notes covering model inference, transformer model structure, and LLM framework code analysis.
ModelCloud/GPTQModel
LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
jakobdylanc/llmcord
Make Discord your LLM frontend - supports any OpenAI-compatible API (Ollama, xAI, Gemini, OpenRouter, and more)
varunvasudeva1/llm-server-docs
Documentation on setting up a local LLM server on Debian from scratch, using Ollama/llama.cpp/vLLM, Open WebUI, Kokoro FastAPI, and ComfyUI.
ModelTC/llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
microsoft/vidur
A large-scale simulation framework for LLM inference
varunshenoy/super-json-mode
Low latency JSON generation using LLMs ⚡️
runpod-workers/worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
chtmp223/topicGPT
TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24)
jasonacox/TinyLLM
Set up and run a local LLM and chatbot using consumer-grade hardware.
InftyAI/llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
shell-nlp/gpt_server
gpt_server is an open-source framework for production-grade deployment of LLMs, embedding models, rerankers, ASR, TTS, text-to-image, image editing, and text-to-video.
HuiResearch/Fast-Spark-TTS
High-quality Chinese speech synthesis and voice cloning, based on models such as SparkTTS and OrpheusTTS.
lucasjinreal/Namo-R1
A real-time CPU VLM with 500M parameters. Surpasses Moondream2 and SmolVLM. Train from scratch with ease.
JackYFL/awesome-VLLMs
This repository collects papers on VLLM applications; new papers are added irregularly.
NetEase-Media/grps
Deep learning deployment framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming modes. Dual-language compatible with Python and C++, offering scalability, extensibility, and high performance. It helps users quickly deploy models and provide services through HTTP/RPC interfaces.
gotzmann/booster
Booster - an open accelerator for LLMs. Better inference and debugging for AI hackers.
yoziru/nextjs-vllm-ui
Fully-featured, beautiful web interface for vLLM - built with NextJS.
nbasyl/DoRA
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
IDEA-Research/RexSeek
Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark
ALucek/ppt2desc
Convert PowerPoint files into semantically rich text using vision language models
Trainy-ai/llm-atc
Fine-tuning and serving LLMs on any cloud
OpenCSGs/llm-inference
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, monitoring, and more.
asprenger/ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
llmariner/llmariner
Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.