model-serving

There are 141 repositories under the model-serving topic.

  • vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Language: Python · ★ 30.6k
  • bentoml/BentoML

    The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

    Language: Python · ★ 7.2k
  • ahkarami/Deep-Learning-in-Production

    In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

  • FedML-AI/FedML

    FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

    Language: Python · ★ 4.2k
  • kserve/kserve

    Standardized Serverless ML Inference Platform on Kubernetes

    Language: Python · ★ 3.6k
  • HuaizhengZhang/AI-System-School

    🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

  • ModelTC/lightllm

    LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

    Language: Python · ★ 2.6k
  • predibase/lorax

    Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

    Language: Python · ★ 2.2k
  • tensorchord/envd

    🏕️ Reproducible development environment

    Language: Go · ★ 2k
  • microsoft/aici

    AICI: Prompts as (Wasm) Programs

    Language: Rust · ★ 1.9k
  • mlrun/mlrun

    MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

    Language: Python · ★ 1.4k
  • logicalclocks/hopsworks

    Hopsworks - Data-Intensive AI platform with a Feature Store

    Language: Java · ★ 1.2k
  • basetenlabs/truss

    The simplest way to serve AI/ML models in production

    Language: Python
  • mosecorg/mosec

    A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

    Language: Python
  • bentoml/Yatai

    Model Deployment at Scale on Kubernetes 🦄️

    Language: TypeScript
  • openvinotoolkit/model_server

    A scalable inference server for models optimized with OpenVINO™

    Language: C++
  • efeslab/Nanoflow

    A throughput-oriented high-performance serving framework for LLMs

    Language: Cuda
  • underneathall/pinferencia

    Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

    Language: Python
  • alibaba/rtp-llm

    RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

    Language: C++
  • jozu-ai/kitops

    Securely share and store AI/ML projects as OCI artifacts in your container registry.

    Language: Go
  • eightBEC/fastapi-ml-skeleton

    FastAPI skeleton app for serving machine learning models, production-ready.

    Language: Python
  • Lightning-Universe/stable-diffusion-deploy

    Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load balancing, orchestration, pre-provisioning, dynamic batching, GPU inference, and microservices working together via the Lightning Apps framework.

    Language: Python
  • ServerlessLLM/ServerlessLLM

    Serverless LLM Serving for Everyone.

    Language: Python
  • bentoml/BentoDiffusion

    BentoDiffusion: A collection of diffusion models served with BentoML

    Language: Python
  • AI-Hypercomputer/JetStream

    JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

    Language: Python
  • aniketmaurya/chitra

    A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.

    Language: Python
  • lightbend/kafka-with-akka-streams-kafka-streams-tutorial

    Code samples for the Lightbend tutorial on writing microservices with Akka Streams, Kafka Streams, and Kafka

    Language: Scala
  • FederatedAI/FATE-Serving

    A scalable, high-performance serving system for federated learning models

    Language: Java
  • spotify/zoltar

    Common library for serving TensorFlow, XGBoost and scikit-learn models in production.

    Language: Java
  • allegroai/clearml-serving

    ClearML - Model-Serving Orchestration and Repository Solution

    Language: Python
  • bentoml/gallery

    BentoML Example Projects 🎨

    Language: Python
  • alvarobartt/serving-pytorch-models

    Serving PyTorch models with TorchServe 🔥

    Language: Jupyter Notebook
  • notAI-tech/fastDeploy

    Deploy DL/ML inference pipelines with minimal extra code.

    Language: Python
  • FlinkML/flink-jpmml

    flink-jpmml is a library for dynamic, real-time machine-learning predictions, built on PMML standard models and the Apache Flink streaming engine

    Language: Scala
  • Project-MONAI/monai-deploy-app-sdk

    MONAI Deploy App SDK offers a framework and associated tools to design, develop and verify AI-driven applications in the healthcare imaging domain.

    Language: Jupyter Notebook
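Several entries above (mosec, vLLM, the Stable Diffusion Lightning App) advertise dynamic batching: concurrent requests are collected into one batch, the model runs once per batch, and results fan back out to callers. A stdlib-only sketch of that idea, assuming nothing about any listed project's API; the class and parameter names are illustrative.

```python
import queue
import threading

class DynamicBatcher:
    """Toy dynamic batcher: gather concurrent requests, run the model
    once per batch, then deliver each result to its waiting caller."""

    def __init__(self, batch_fn, max_batch_size=8, timeout=0.01):
        self.batch_fn = batch_fn          # model taking a list of inputs
        self.max_batch_size = max_batch_size
        self.timeout = timeout            # max wait to fill a batch
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, item):
        """Blocking call used by each request-handling thread."""
        done = threading.Event()
        box = {}
        self.requests.put((item, box, done))
        done.wait()
        return box["result"]

    def _loop(self):
        while True:
            batch = [self.requests.get()]          # block for the first item
            while len(batch) < self.max_batch_size:
                try:                               # then fill until timeout
                    batch.append(self.requests.get(timeout=self.timeout))
                except queue.Empty:
                    break
            outputs = self.batch_fn([item for item, _, _ in batch])
            for (_, box, done), out in zip(batch, outputs):
                box["result"] = out
                done.set()
```

Real servers add per-request deadlines, padding/shape constraints, and (for LLMs) continuous batching that admits new requests mid-generation, but the queue-and-fan-out core is the same.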