model-serving
There are 129 repositories under the model-serving topic.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
bentoml/BentoML
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
ahkarami/Deep-Learning-in-Production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
FedML-AI/FedML
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
kserve/kserve
Standardized Serverless ML Inference Platform on Kubernetes
tensorchord/envd
🏕️ Reproducible development environment
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
microsoft/aici
AICI: Prompts as (Wasm) Programs
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
mlrun/mlrun
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
logicalclocks/hopsworks
Hopsworks - Data-Intensive AI platform with a Feature Store
basetenlabs/truss
The simplest way to serve AI/ML models in production
bentoml/Yatai
Model Deployment at Scale on Kubernetes 🦄️
mosecorg/mosec
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
openvinotoolkit/model_server
A scalable inference server for models optimized with OpenVINO™
underneathall/pinferencia
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Lightning-Universe/stable-diffusion-deploy
Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load-balancing, orchestrating, pre-provisioning, dynamic batching, GPU-inference, micro-services working together via the Lightning Apps framework.
eightBEC/fastapi-ml-skeleton
Production-ready FastAPI skeleton app for serving machine learning models.
bentoml/OneDiffusion
OneDiffusion: Run any Stable Diffusion models and fine-tuned weights with ease
aniketmaurya/chitra
A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.
lightbend/kafka-with-akka-streams-kafka-streams-tutorial
Code samples for the Lightbend tutorial on writing microservices with Akka Streams, Kafka Streams, and Kafka
jozu-ai/kitops
Tools for easing the handoff between AI/ML and App/SRE teams.
google/JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
spotify/zoltar
Common library for serving TensorFlow, XGBoost and scikit-learn models in production.
FederatedAI/FATE-Serving
A scalable, high-performance serving system for federated learning models
bentoml/gallery
BentoML Example Projects 🎨
allegroai/clearml-serving
ClearML - Model-Serving Orchestration and Repository Solution
alvarobartt/serving-pytorch-models
Serving PyTorch models with TorchServe 🔥
FlinkML/flink-jpmml
flink-jpmml is a new library for dynamic real-time machine learning predictions, built on top of PMML standard models and the Apache Flink streaming engine
notAI-tech/fastDeploy
Deploy DL/ML inference pipelines with minimal extra code.
NimbleBoxAI/nbox
The official Python package for NimbleBox. Exposes all APIs as CLIs and contains modules to make ML 🌸
Project-MONAI/monai-deploy-app-sdk
MONAI Deploy App SDK offers a framework and associated tools to design, develop and verify AI-driven applications in the healthcare imaging domain.
EmbeddedLLM/vllm-rocm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
aporia-ai/inferencedb
🚀 Stream inferences of real-time ML models in production to any data lake (Experimental)