serving
There are 122 repositories under the serving topic.
ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
tensorflow/serving
A flexible, high-performance serving system for machine learning models
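TensorFlow Serving exposes a REST predict endpoint of the form `/v1/models/<name>:predict` that accepts a JSON body with an `instances` list. A minimal sketch of building such a request with the standard library (the host `localhost:8501` and the model name `half_plus_two` are illustrative assumptions):

```python
import json
import urllib.request


def build_predict_request(host: str, model: str, instances: list) -> urllib.request.Request:
    """Build a REST predict request for a TensorFlow Serving endpoint."""
    url = f"http://{host}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumes a server is running at localhost:8501 with a model named
# "half_plus_two"; the request is built but not sent here.
req = build_predict_request("localhost:8501", "half_plus_two", [1.0, 2.0, 5.0])
# urllib.request.urlopen(req) would return a JSON body with a "predictions" key.
```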
volcano-sh/volcano
A Cloud Native Batch System (Project under CNCF)
SeldonIO/seldon-core
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
ahkarami/Deep-Learning-in-Production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
pytorch/serve
Serve, optimize and scale PyTorch models in production
Lightning-AI/LitServe
The easiest way to deploy agents, MCP servers, models, RAG, pipelines and more. No MLOps. No YAML.
PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
skyzh/tiny-llm
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
georgia-tech-db/evadb
Database system for AI-powered apps
tobegit3hub/tensorflow_template_application
TensorFlow template application for deep learning
ray-project/llm-applications
A comprehensive guide to building RAG-based LLM applications for production.
dingodb/dingo
A multi-modal vector database that supports upserts and vector queries through unified, MySQL-compatible SQL on structured and unstructured data, delivering high concurrency and ultra-low latency.
Delta-ML/delta
DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/
PaddlePaddle/Serving
A flexible, high-performance serving framework for machine learning models (PaddlePaddle's model serving and deployment framework)
openvinotoolkit/model_server
A scalable inference server for models optimized with OpenVINO™
tobegit3hub/simple_tensorflow_serving
Generic and easy-to-use serving service for machine learning models
underneathall/pinferencia
Pinferencia (Python + inference): a model deployment library in Python, and the simplest model inference server ever.
meta-soul/MetaSpore
A unified end-to-end machine intelligence platform
vectorch-ai/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
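Many production LLM servers, ScaleLLM among them, expose an OpenAI-compatible chat completions endpoint. A sketch of constructing such a request with the standard library (the base URL and model name below are illustrative assumptions, not ScaleLLM defaults):

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /v1/chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumes an OpenAI-compatible server at localhost:8080 serving a model
# registered under this (hypothetical) name; the request is not sent here.
req = build_chat_request("http://localhost:8080", "my-llama-model", "Hello")
```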
polyaxon/haupt
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
zzsza/Boostcamp-AI-Tech-Product-Serving
Boostcamp AI Tech - Product Serving course materials
bodywork-ml/bodywork-core
ML pipeline orchestration and model deployments on Kubernetes.
Hydrospheredata/hydro-serving
MLOps Platform
deepjavalibrary/djl-serving
A universal scalable machine learning model deployment solution
outcaste-io/outserv
Blockchain Search with GraphQL APIs
cap-ntu/ML-Model-CI
MLModelCI is a complete MLOps platform for managing, converting, profiling, and deploying MLaaS (Machine Learning-as-a-Service), bridging the gap between current ML training and serving systems.
NetEase-Media/grps
Deep learning deployment framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. Dual-language compatible with Python and C++, offering scalability, extensibility, and high performance; helps users quickly deploy models and serve them through HTTP/RPC interfaces.
torchpipe/torchpipe
Serving inside PyTorch
clearml/clearml-serving
ClearML - Model-Serving Orchestration and Repository Solution
krystianity/keras-serving
Bring Keras models to production with TensorFlow Serving and Node.js + Docker :pizza:
emacski/tensorflow-serving-arm
TensorFlow Serving ARM - A project for cross-compiling TensorFlow Serving targeting popular ARM cores
notAI-tech/fastDeploy
Deploy DL/ML inference pipelines with minimal extra code.
AI-Hypercomputer/gpu-recipes
Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.
balavenkatesh3322/model_deployment
A collection of model deployment libraries and techniques.