model-serving

There are 147 repositories under the model-serving topic.

vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python 38.9k 320 6.7k 5.8k
bentoml/BentoML
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Language: Python 7.4k 75 1.1k 811
ahkarami/Deep-Learning-in-Production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
4.3k 148 4687
kserve/kserve
Standardized Serverless ML Inference Platform on Kubernetes
Language: Python 3.9k 69 1.9k 1.1k
FedML-AI/FedML
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
Language: Python 3.8k 94 327744
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Language: Python 2.9k 23 193230
HuaizhengZhang/AI-System-School
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
2.8k 127 31321
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Language: Python 2.4k 31 259150
tensorchord/envd
🏕️ Reproducible development environment
Language: Go 2.1k 22 532159
microsoft/aici
AICI: Prompts as (Wasm) Programs
Language: Rust 2k 24 7682
beclab/Olares
Olares: An Open-Source Sovereign Cloud OS for Local AI
Language: Shell 1.8k 19 3762
mlrun/mlrun
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
Language: Python 1.5k 28 276257
logicalclocks/hopsworks
Hopsworks - Data-Intensive AI platform with a Feature Store
Language: Java 1.2k 36 18148
basetenlabs/truss
The simplest way to serve AI/ML models in production
Language: Python 951 18 12580
zhihu/ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
Language: C++ 857 51 12102
mosecorg/mosec
A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
Language: Python 822 13 10161
bentoml/Yatai
Model Deployment at Scale on Kubernetes 🦄️
Language: TypeScript 792 18 11670
efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
Language: Cuda 739 8 2829
openvinotoolkit/model_server
A scalable inference server for models optimized with OpenVINO™
Language: C++ 708 31 173216
jozu-ai/kitops
An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.
Language: Go 707 14 19168
alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Language: C++ 632 13 9654
underneathall/pinferencia
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
Language: Python 557 41 6787
eightBEC/fastapi-ml-skeleton
FastAPI Skeleton App to serve machine learning models production-ready.
Language: Python 434 6 384
ServerlessLLM/ServerlessLLM
Serverless LLM Serving for Everyone.
Language: Python 422 12 8137
intel/xFasterTransformer
Language: C++ 402 17 9065
Lightning-Universe/stable-diffusion-deploy
Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load-balancing, orchestrating, pre-provisioning, dynamic batching, GPU-inference, micro-services working together via the Lightning Apps framework.
Language: Python 393 19 5439
bentoml/BentoDiffusion
BentoDiffusion: A collection of diffusion models served with BentoML
Language: Python 350 12 927
AI-Hypercomputer/JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
Language: Python 276 20 2736
aniketmaurya/chitra
A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.
Language: Python 225 5 3537
lightbend/kafka-with-akka-streams-kafka-streams-tutorial
Code samples for the Lightbend tutorial on writing microservices with Akka Streams, Kafka Streams, and Kafka
Language: Scala 211 31 571
vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
Language: Python 16530
clearml/clearml-serving
ClearML - Model-Serving Orchestration and Repository Solution
Language: Python 143 11 5740
interestingLSY/swiftLLM
A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Language: Python 139 3 410
spotify/zoltar
Common library for serving TensorFlow, XGBoost and scikit-learn models in production.
Language: Java 139 23 6733
FederatedAI/FATE-Serving
A scalable, high-performance serving system for federated learning models
Language: Java 138 30 7478
bentoml/gallery
BentoML Example Projects 🎨
137 6 949