inference-server
There are 48 repositories under the inference-server topic.
containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
roboflow/inference
Turn any computer or edge device into a command center for your computer vision projects.
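A quick way to exercise it is the Python client from Roboflow's inference_sdk; the client class, port, and model id below are recalled from memory rather than verified, so check Roboflow's docs:

    # Hedged sketch: call a locally running Roboflow inference server.
    # InferenceHTTPClient, port 9001, and the model id are assumptions.
    from inference_sdk import InferenceHTTPClient

    client = InferenceHTTPClient(
        api_url="http://localhost:9001",  # assumed local server address
        api_key="YOUR_API_KEY",           # placeholder credential
    )
    result = client.infer("street.jpg", model_id="my-project/1")
    print(result)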
Michael-A-Kuykendall/shimmy
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
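Since shimmy advertises OpenAI-API compatibility, any standard chat-completions client should work against it. A minimal sketch in Python; the port and model name are assumptions, not verified shimmy defaults:

    # Minimal chat-completions request against an OpenAI-compatible server.
    # The port (11435) and model name are assumptions; check shimmy's README.
    import requests

    resp = requests.post(
        "http://localhost:11435/v1/chat/completions",
        json={
            "model": "llama-3.2-1b-instruct",  # hypothetical model name
            "messages": [{"role": "user", "content": "Hello!"}],
        },
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"])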
basetenlabs/truss
The simplest way to serve AI/ML models in production
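Truss packages a model as a directory containing a model.py; a minimal sketch of that convention (a Model class with load() for one-time setup and predict() per request), using a stand-in instead of a real model:

    # Sketch of a Truss model.py; the class/method convention is Truss's,
    # but the body is a stand-in rather than a real model.
    class Model:
        def __init__(self, **kwargs):
            self._model = None

        def load(self):
            # One-time setup: load weights, move to GPU, etc.
            self._model = lambda x: 2 * x

        def predict(self, model_input):
            # Called once per inference request.
            return {"output": self._model(model_input["value"])}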
pipeless-ai/pipeless
An open-source computer vision framework to build and deploy apps in minutes
underneathall/pinferencia
Python + Inference: a model deployment library in Python. The simplest model inference server ever.
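Its quickstart pattern, as recalled from its docs (the Server class and register() signature may have changed between releases), looks roughly like this, served with uvicorn:

    # Hedged sketch of the Pinferencia quickstart; Server and register()
    # are recalled from its docs, not verified against the current release.
    from pinferencia import Server

    class MyModel:
        def predict(self, data):
            return sum(data)

    service = Server()
    service.register(model_name="mymodel", model=MyModel(), entrypoint="predict")
    # Run with: uvicorn app:service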
NVIDIA/gpu-rest-engine
A REST API for Caffe using Docker and Go
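The server takes raw image bytes over HTTP; a sketch of a classify request, where the /api/classify path and port 8000 are recalled from the project's README and should be verified:

    # Hedged sketch: POST raw image bytes to the classification endpoint.
    # Path and port are assumptions recalled from the project's README.
    import requests

    with open("cat.jpg", "rb") as f:
        resp = requests.post("http://127.0.0.1:8000/api/classify", data=f.read())
    print(resp.json())  # expected: labels with confidence scores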
BMW-InnovationLab/BMW-YOLOv4-Inference-API-GPU
This is a repository for a no-code object detection inference API using YOLOv3 and YOLOv4 with the Darknet framework.
containers/podman-desktop-extension-ai-lab
Work with LLMs in a local environment using containers
BMW-InnovationLab/BMW-YOLOv4-Inference-API-CPU
This is a repository for a no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
BMW-InnovationLab/BMW-TensorFlow-Inference-API-CPU
This is a repository for an object detection inference API using the TensorFlow framework.
kibae/onnxruntime-server
ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
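A REST call against such a server might look like the following; the route and JSON shape here are purely illustrative assumptions, not the project's documented API:

    # Hypothetical REST inference call; the route and payload shape are
    # illustrative assumptions -- consult the project's API docs.
    import requests

    resp = requests.post(
        "http://localhost:8080/api/sessions/mymodel/1",  # hypothetical route
        json={"input": [[1.0, 2.0, 3.0]]},               # named input tensor
    )
    print(resp.json())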
autodeployai/ai-serving
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
vertexclique/orkhon
Orkhon: ML Inference Framework and Server Runtime
kf5i/k3ai
K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai is perfect for anything from edge devices to laptops.
notAI-tech/fastDeploy
Deploy DL/ML inference pipelines with minimal extra code.
RubixML/Server
A standalone inference server for trained Rubix ML estimators.
friendliai/friendli-client
[⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI
curtisgray/wingman
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
k9ele7en/Triton-TensorRT-Inference-CRAFT-pytorch
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch). Includes a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, multi-format Triton server). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.
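The first link in that conversion chain is a standard ONNX export; a sketch with a stand-in model (torch.onnx.export is the real API, but the model, shapes, and opset here are illustrative):

    # Sketch of the PyTorch -> ONNX step; the model and shapes are dummies.
    import torch

    model = torch.nn.Conv2d(3, 16, kernel_size=3).eval()  # stand-in for CRAFT
    dummy = torch.randn(1, 3, 224, 224)

    torch.onnx.export(
        model, dummy, "model.onnx",
        input_names=["input"], output_names=["output"],
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
        opset_version=13,
    )
    # model.onnx can then be compiled to a TensorRT engine and dropped
    # into a Triton model repository.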
haicheviet/fullstack-machine-learning-inference
Fullstack machine learning inference template
tensorchord/inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
leimao/Simple-Inference-Server
Inference Server Implementation from Scratch for Machine Learning Models
TommyLemon/CVAuto
👁 Zero-code, zero-annotation CV AI automated testing tool 🚀 Removes the need for large amounts of manual bounding-box drawing and labeling: quickly and automatically test computer vision AI image recognition algorithms with zero code, including pedestrian detection, animal and plant classification, face recognition, OCR license plate recognition, rotation correction, dance pose estimation, and image matting/segmentation. Also supports one-click download of test reports and export of training and test datasets.
roboflow/inference-dashboard-example
Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.
csy1204/TripBigs_Web
Session Based Real-time Hotel Recommendation Web Application
NGLSG/UniAPI
The Universal LLM Gateway - Integrate ANY AI Model with One Consistent API
pandruszkow/whisper-inference-server
A networked inference server for Whisper speech recognition
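A client for such a server would upload audio and read back a transcript; this sketch assumes a hypothetical /transcribe route and JSON response, which may not match the project's actual protocol:

    # Hypothetical Whisper-server client; route and response shape assumed.
    import requests

    with open("speech.wav", "rb") as f:
        resp = requests.post(
            "http://localhost:8000/transcribe",  # hypothetical route
            files={"audio": ("speech.wav", f, "audio/wav")},
        )
    print(resp.json().get("text"))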
geniusrise/vision
Vision and vision-multi-modal components for geniusrise framework
redis-applied-ai/loan-prediction-microservice
An example of using Redis + RedisAI in a microservice that predicts consumer loan probabilities, with Redis as the feature and model store and RedisAI as the inference server.
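The underlying RedisAI pattern is: write features as tensors, execute a stored model server-side, and read predictions back. A sketch using RedisAI's tensor/model commands, where the key names and feature values are illustrative:

    # Sketch of the RedisAI inference pattern; keys and values are illustrative.
    import redis

    r = redis.Redis()
    # Store one applicant's feature vector as a 1x4 float32 tensor.
    r.execute_command("AI.TENSORSET", "loan:features", "FLOAT", 1, 4,
                      "VALUES", 0.1, 0.7, 0.3, 0.9)
    # Execute a model previously stored with AI.MODELSTORE.
    r.execute_command("AI.MODELEXECUTE", "loan:model",
                      "INPUTS", 1, "loan:features",
                      "OUTPUTS", 1, "loan:prediction")
    print(r.execute_command("AI.TENSORGET", "loan:prediction", "VALUES"))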
tensorchord/modelz-docs
Modelz is a developer-first platform for prototyping and deploying machine learning models.
dlzou/computron
Serving distributed deep learning models with model-parallel swapping.
geniusrise/text
Text components powering LLMs & SLMs for geniusrise framework
goamegah/how-to-serve-models
Different ways of implementing an API to serve an image classification model
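One of the simplest such ways is a small FastAPI app with a single upload endpoint; the classifier below is a stub standing in for a real model:

    # Minimal FastAPI serving sketch; classify() is a stub, not a real model.
    from fastapi import FastAPI, File, UploadFile

    app = FastAPI()

    def classify(image_bytes: bytes) -> str:
        return "cat"  # stand-in for real model inference

    @app.post("/predict")
    async def predict(file: UploadFile = File(...)):
        return {"label": classify(await file.read())}

    # Run with: uvicorn app:app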
SABER-labs/torch_batcher
Serve PyTorch inference requests with Redis-backed batching for higher throughput.
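The general technique looks like this generic sketch (not torch_batcher's actual API): drain queued requests from a Redis list, run one batched forward pass, and write results back keyed by request id:

    # Generic micro-batching sketch; queue/key names and the model are made up.
    import json
    import redis
    import torch

    r = redis.Redis()
    model = torch.nn.Linear(4, 2).eval()  # stand-in model

    def serve_one_batch(max_batch=32):
        batch = []
        while len(batch) < max_batch:
            item = r.lpop("requests")      # JSON-encoded queued inputs
            if item is None:
                break
            batch.append(json.loads(item))
        if not batch:
            return
        inputs = torch.tensor([b["input"] for b in batch], dtype=torch.float32)
        with torch.no_grad():
            outputs = model(inputs)        # one batched forward pass
        for req, out in zip(batch, outputs):
            r.set(f"result:{req['id']}", json.dumps(out.tolist()))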
StefanoLusardi/tiny_inference_engine
Client/server system to perform distributed inference on high-load systems.