vllm

There are 158 repositories under the vllm topic.

  • meta-llama/llama-cookbook

    Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to use these models on various provider services.

    Language: Jupyter Notebook
  • xorbitsai/inference

    Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

    Language: Python
  • OpenRLHF/OpenRLHF

    An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

    Language: Python
  • katanaml/sparrow

    Structured data extraction and instruction calling with ML, LLMs, and Vision LLMs

    Language: Python
  • xlite-dev/Awesome-LLM-Inference

    📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

    Language: Python
  • containers/ramalama

    RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

    Language: Python
  • vllm-project/vllm-ascend

    Community-maintained hardware plugin for vLLM on Ascend

    Language: Python
  • bricks-cloud/BricksLLM

    🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

    Language: Go
  • substratusai/kubeai

    AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.

    Language: Go
  • apconw/sanic-web

    A lightweight, full-pipeline large-model application project (Large Model Data Assistant) that is easy to extend. Supports large models such as DeepSeek/Qwen3. A one-stop LLM application development project built on Dify, LangChain/LangGraph, Ollama & vLLM, Sanic, and Text2SQL 📊, with a modern UI built using Vue3, TypeScript, and Vite 5. It supports LLM-based graphical data Q&A via ECharts 📈 and can handle tabular Q&A over CSV files 📂. It also integrates easily with third-party open-source RAG retrieval systems 🌐 to support broad general-knowledge Q&A.

    Language: JavaScript
  • prometheus-eval/prometheus-eval

    Evaluate your LLM's responses with Prometheus and GPT-4 💯

    Language: Python
  • harleyszhang/llm_note

    LLM notes, including model inference, transformer model structure, and LLM framework code-analysis notes.

    Language: Python
  • ModelCloud/GPTQModel

    LLM quantization (compression) toolkit with hardware-acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.

    Language: Python
  • jakobdylanc/llmcord

    Make Discord your LLM frontend - supports any OpenAI-compatible API (Ollama, xAI, Gemini, OpenRouter, and more)

    Language: Python
  • varunvasudeva1/llm-server-docs

    Documentation on setting up a local LLM server on Debian from scratch, using Ollama/llama.cpp/vLLM, Open WebUI, Kokoro FastAPI, and ComfyUI.

  • ModelTC/llmc

    [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

    Language: Python
  • microsoft/vidur

    A large-scale simulation framework for LLM inference

    Language: Python
  • varunshenoy/super-json-mode

    Low latency JSON generation using LLMs ⚡️

    Language: Jupyter Notebook
  • runpod-workers/worker-vllm

    The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

    Language: Python
  • chtmp223/topicGPT

    TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24)

    Language: Python
  • jasonacox/TinyLLM

    Set up and run a local LLM and chatbot using consumer-grade hardware.

    Language: JavaScript
  • InftyAI/llmaz

    ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

    Language: Go
  • shell-nlp/gpt_server

    gpt_server is an open-source framework for production-grade deployment of LLMs, embedding models, rerankers, ASR, TTS, text-to-image, image editing, and text-to-video.

    Language: Python
  • HuiResearch/Fast-Spark-TTS

    Provides high-quality Chinese speech synthesis and voice cloning based on models such as SparkTTS and OrpheusTTS.

    Language: Python
  • lucasjinreal/Namo-R1

    A real-time CPU VLM with 500M parameters. Surpasses Moondream2 and SmolVLM. Train from scratch with ease.

    Language: Python
  • JackYFL/awesome-VLLMs

    This repository collects papers on VLLM applications. New papers are added irregularly.

  • NetEase-Media/grps

    Deep learning deployment framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance, and helps users quickly deploy models and serve them through HTTP/RPC interfaces.

    Language: C++
  • gotzmann/booster

    Booster - an open accelerator for LLMs. Better inference and debugging for AI hackers

    Language: C++
  • yoziru/nextjs-vllm-ui

    Fully-featured, beautiful web interface for vLLM - built with NextJS.

    Language: TypeScript
  • nbasyl/DoRA

    Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"

  • IDEA-Research/RexSeek

    Refer to any person or object given a natural-language description. Code base for RexSeek and the HumanRef benchmark

    Language: Python
  • ALucek/ppt2desc

    Convert PowerPoint files into semantically rich text using vision language models

    Language: Python
  • Trainy-ai/llm-atc

    Fine-tuning and serving LLMs on any cloud

    Language: Python
  • OpenCSGs/llm-inference

    llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, compute-resource management, monitoring, and more.

    Language: Python
  • asprenger/ray_vllm_inference

    A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.

    Language: Python
  • llmariner/llmariner

    Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.

    Language: Go
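
Several of the projects above (Xinference, BricksLLM, llmcord, llmariner, and vLLM itself) serve or consume OpenAI-compatible APIs, so a single request shape works against any of them. Below is a minimal sketch of building such a chat-completions request with only the standard library; the base URL and model name are assumptions for illustration (vLLM's `vllm serve` listens on http://localhost:8000/v1 by default), not values taken from any of the repositories.

```python
import json
import urllib.request

# Assumed local OpenAI-compatible endpoint (e.g. a vLLM or Xinference server).
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical model name; sending the request requires a running server:
# resp = urllib.request.urlopen(req)
req = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
```

Because the wire format is shared, swapping providers usually means changing only `BASE_URL` (and an API key header for hosted gateways such as BricksLLM).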