llm-serving
There are 85 repositories under the llm-serving topic.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
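vLLM's `vllm serve` command exposes an OpenAI-compatible HTTP API (on port 8000 by default). As a minimal sketch, here is how a chat-completion request body for that endpoint could be built; the model name and URL are illustrative assumptions, not part of the listing above:

```python
import json

# Hypothetical local endpoint; `vllm serve <model>` exposes an
# OpenAI-compatible server on localhost:8000 by default.
BASE_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> str:
    """Build the JSON body for an OpenAI-style chat completion call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

body = build_chat_request("meta-llama/Llama-3.1-8B-Instruct",
                          "What is paged attention?")
```

POSTing `body` to the endpoint (e.g. with `urllib.request` or `requests`) would return a standard OpenAI-style completion response.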
ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
liguodongiot/llm-action
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and production deployment of LLM applications).
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
bentoml/OpenLLM
Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.
skypilot-org/skypilot
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 15+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
bentoml/BentoML
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
superduper-io/superduper
Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
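Multi-LoRA servers like LoRAX route each request to one of many LoRA adapters that share a single base model, typically by naming the adapter in the request parameters. A hedged sketch of building such a request body; the exact field names (`inputs`, `parameters`, `adapter_id`) should be checked against the LoRAX documentation:

```python
import json

def build_lorax_request(prompt: str, adapter_id: str,
                        max_new_tokens: int = 64) -> str:
    """Build a generate-request body that selects one LoRA fine-tune.

    `adapter_id` picks one of potentially thousands of adapters served
    from a shared base model (field name assumed from LoRAX's docs).
    """
    return json.dumps({
        "inputs": prompt,
        "parameters": {
            "adapter_id": adapter_id,
            "max_new_tokens": max_new_tokens,
        },
    })

req_body = build_lorax_request("Summarize this ticket:", "support-bot-v2")
```

The design point is that switching fine-tunes is a per-request parameter, not a per-deployment choice, which is what lets one server scale to thousands of adapters.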
microsoft/aici
AICI: Prompts as (Wasm) Programs
MoonshotAI/MoBA
MoBA: Mixture of Block Attention for Long-Context LLMs
ray-project/ray-llm
RayLLM - LLMs on Ray
zhihu/ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
mosecorg/mosec
A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.
efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
rohan-paul/LLM-FineTuning-Large-Language-Models
LLM (Large Language Model) FineTuning
hpcaitech/SwiftInfer
Efficient AI Inference & Serving
helixml/helix
🧬 Helix is a private GenAI stack for building AI applications with declarative pipelines, knowledge (RAG), API bindings, and first-class testing.
ray-project/ray-educational-materials
A suite of hands-on training materials showing how to scale CV, NLP, and time-series forecasting workloads with Ray.
vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
galeselee/Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. This is a list of papers on LLM acceleration, currently focused mainly on inference; related work will be added over time. Contributions welcome!
substratusai/runbooks
Fine-tune LLMs on Kubernetes using Runbooks.
torchpipe/torchpipe
Serving inside PyTorch
HPMLL/BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
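A workload trace like BurstGPT pairs request arrival times with token counts, which serving systems replay to study behavior under bursty load. A minimal sketch using a hypothetical record format (not BurstGPT's actual schema) that estimates peak request rate from a trace:

```python
from dataclasses import dataclass

@dataclass
class TraceRecord:
    arrival_s: float       # arrival time, seconds from trace start
    prompt_tokens: int     # input length
    output_tokens: int     # generated length

def bucket_by_second(trace):
    """Group requests into 1-second buckets to gauge burstiness."""
    buckets = {}
    for rec in trace:
        buckets.setdefault(int(rec.arrival_s), []).append(rec)
    return buckets

# Tiny illustrative trace: two requests land in the first second.
trace = [
    TraceRecord(0.1, 512, 128),
    TraceRecord(0.7, 64, 32),
    TraceRecord(1.2, 256, 64),
]
peak_rps = max(len(reqs) for reqs in bucket_by_second(trace).values())
```

Metrics like `peak_rps` versus mean rate are what motivate trace-driven (rather than constant-rate) benchmarking of LLM serving systems.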
chenhunghan/ialacol
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
interestingLSY/swiftLLM
A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
slai-labs/get-beam
Run GPU inference and training jobs on serverless infrastructure that scales with you.
powerserve-project/PowerServe
A high-speed, easy-to-use LLM serving framework for local deployment.
mani-kantap/llm-inference-solutions
A collection of available inference solutions for LLMs.
asprenger/ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
azminewasi/Awesome-LLMs-ICLR-24
A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) 2024.
sugarcane-ai/sugarcane-ai
An npm-like package ecosystem for prompts 🤖
bigai-nlco/TokenSwift
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation
AntonioGr7/pratical-llms
A collection of hands-on notebooks for LLM practitioners.
friendliai/friendli-client
Friendli: the fastest serving engine for generative AI