inference-server
There are 48 repositories under the inference-server topic.
containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
roboflow/inference
Turn any computer or edge device into a command center for your computer vision projects.
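A quick way to exercise it is the Python client from Roboflow's inference_sdk; the client class, port, and model id below are recalled from memory rather than verified, so check Roboflow's docs:

    # Hedged sketch: call a locally running Roboflow inference server.
    # InferenceHTTPClient, port 9001, and the model id are assumptions.
    from inference_sdk import InferenceHTTPClient

    client = InferenceHTTPClient(
        api_url="http://localhost:9001",  # assumed local server address
        api_key="YOUR_API_KEY",           # placeholder credential
    )
    result = client.infer("street.jpg", model_id="my-project/1")
    print(result)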
Michael-A-Kuykendall/shimmy
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
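Since shimmy advertises OpenAI-API compatibility, any standard chat-completions client should work against it. A minimal sketch in Python; the port and model name are assumptions, not verified shimmy defaults:

    # Minimal chat-completions request against an OpenAI-compatible server.
    # The port (11435) and model name are assumptions; check shimmy's README.
    import requests

    resp = requests.post(
        "http://localhost:11435/v1/chat/completions",
        json={
            "model": "llama-3.2-1b-instruct",  # hypothetical model name
            "messages": [{"role": "user", "content": "Hello!"}],
        },
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"])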
basetenlabs/truss
The simplest way to serve AI/ML models in production
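Truss packages a model as a directory containing a model.py; a minimal sketch of that convention (a Model class with load() for one-time setup and predict() per request), using a stand-in instead of a real model:

    # Sketch of a Truss model.py; the class/method convention is Truss's,
    # but the body is a stand-in rather than a real model.
    class Model:
        def __init__(self, **kwargs):
            self._model = None

        def load(self):
            # One-time setup: load weights, move to GPU, etc.
            self._model = lambda x: 2 * x

        def predict(self, model_input):
            # Called once per inference request.
            return {"output": self._model(model_input["value"])}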
pipeless-ai/pipeless
An open-source computer vision framework to build and deploy apps in minutes
underneathall/pinferencia
Python + Inference: a model deployment library in Python. The simplest model inference server ever.
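Its quickstart pattern, as recalled from its docs (the Server class and register() signature may have changed between releases), looks roughly like this, served with uvicorn:

    # Hedged sketch of the Pinferencia quickstart; Server and register()
    # are recalled from its docs, not verified against the current release.
    from pinferencia import Server

    class MyModel:
        def predict(self, data):
            return sum(data)

    service = Server()
    service.register(model_name="mymodel", model=MyModel(), entrypoint="predict")
    # Run with: uvicorn app:service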
NVIDIA/gpu-rest-engine
A REST API for Caffe using Docker and Go
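The server takes raw image bytes over HTTP; a sketch of a classify request, where the /api/classify path and port 8000 are recalled from the project's README and should be verified:

    # Hedged sketch: POST raw image bytes to the classification endpoint.
    # Path and port are assumptions recalled from the project's README.
    import requests

    with open("cat.jpg", "rb") as f:
        resp = requests.post("http://127.0.0.1:8000/api/classify", data=f.read())
    print(resp.json())  # expected: labels with confidence scores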
BMW-InnovationLab/BMW-YOLOv4-Inference-API-GPU
This is a repository for a no-code object detection inference API using YOLOv3 and YOLOv4 with the Darknet framework.
containers/podman-desktop-extension-ai-lab
Work with LLMs in a local environment using containers
BMW-InnovationLab/BMW-YOLOv4-Inference-API-CPU
This is a repository for a no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
BMW-InnovationLab/BMW-TensorFlow-Inference-API-CPU
This is a repository for an object detection inference API using the TensorFlow framework.
kibae/onnxruntime-server
ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
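A REST call against such a server might look like the following; the route and JSON shape here are purely illustrative assumptions, not the project's documented API:

    # Hypothetical REST inference call; the route and payload shape are
    # illustrative assumptions -- consult the project's API docs.
    import requests

    resp = requests.post(
        "http://localhost:8080/api/sessions/mymodel/1",  # hypothetical route
        json={"input": [[1.0, 2.0, 3.0]]},               # named input tensor
    )
    print(resp.json())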
autodeployai/ai-serving
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
vertexclique/orkhon
Orkhon: ML Inference Framework and Server Runtime
kf5i/k3ai
K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai is perfect for anything from edge devices to laptops.
notAI-tech/fastDeploy
Deploy DL/ML inference pipelines with minimal extra code.
RubixML/Server
A standalone inference server for trained Rubix ML estimators.
friendliai/friendli-client
[⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI
curtisgray/wingman
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
k9ele7en/Triton-TensorRT-Inference-CRAFT-pytorch
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch). Includes a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, multi-format Triton server). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.
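The first link in that conversion chain is a standard ONNX export; a sketch with a stand-in model (torch.onnx.export is the real API, but the model, shapes, and opset here are illustrative):

    # Sketch of the PyTorch -> ONNX step; the model and shapes are dummies.
    import torch

    model = torch.nn.Conv2d(3, 16, kernel_size=3).eval()  # stand-in for CRAFT
    dummy = torch.randn(1, 3, 224, 224)

    torch.onnx.export(
        model, dummy, "model.onnx",
        input_names=["input"], output_names=["output"],
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
        opset_version=13,
    )
    # model.onnx can then be compiled to a TensorRT engine and dropped
    # into a Triton model repository.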
haicheviet/fullstack-machine-learning-inference
Fullstack machine learning inference template
tensorchord/inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
leimao/Simple-Inference-Server
Inference Server Implementation from Scratch for Machine Learning Models
TommyLemon/CVAuto
👁 Zero-code, zero-annotation CV AI automated testing tool 🚀 Removes the need for large amounts of manual bounding-box drawing and labeling: quickly and automatically test computer vision AI image recognition algorithms with zero code, including pedestrian detection, animal and plant classification, face recognition, OCR license plate recognition, rotation correction, dance pose estimation, and image matting/segmentation. Also supports one-click download of test reports and export of training and test datasets.
roboflow/inference-dashboard-example
Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.
csy1204/TripBigs_Web
Session Based Real-time Hotel Recommendation Web Application
NGLSG/UniAPI
The Universal LLM Gateway - Integrate ANY AI Model with One Consistent API
pandruszkow/whisper-inference-server
A networked inference server for Whisper speech recognition
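A client for such a server would upload audio and read back a transcript; this sketch assumes a hypothetical /transcribe route and JSON response, which may not match the project's actual protocol:

    # Hypothetical Whisper-server client; route and response shape assumed.
    import requests

    with open("speech.wav", "rb") as f:
        resp = requests.post(
            "http://localhost:8000/transcribe",  # hypothetical route
            files={"audio": ("speech.wav", f, "audio/wav")},
        )
    print(resp.json().get("text"))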
geniusrise/vision
Vision and vision-multi-modal components for geniusrise framework
redis-applied-ai/loan-prediction-microservice
An example of using Redis + RedisAI in a microservice that predicts consumer loan probabilities, with Redis as the feature and model store and RedisAI as the inference server.
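The underlying RedisAI pattern is: write features as tensors, execute a stored model server-side, and read predictions back. A sketch using RedisAI's tensor/model commands, where the key names and feature values are illustrative:

    # Sketch of the RedisAI inference pattern; keys and values are illustrative.
    import redis

    r = redis.Redis()
    # Store one applicant's feature vector as a 1x4 float32 tensor.
    r.execute_command("AI.TENSORSET", "loan:features", "FLOAT", 1, 4,
                      "VALUES", 0.1, 0.7, 0.3, 0.9)
    # Execute a model previously stored with AI.MODELSTORE.
    r.execute_command("AI.MODELEXECUTE", "loan:model",
                      "INPUTS", 1, "loan:features",
                      "OUTPUTS", 1, "loan:prediction")
    print(r.execute_command("AI.TENSORGET", "loan:prediction", "VALUES"))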
tensorchord/modelz-docs
Modelz is a developer-first platform for prototyping and deploying machine learning models.
dlzou/computron
Serving distributed deep learning models with model-parallel swapping.
geniusrise/text
Text components powering LLMs & SLMs for geniusrise framework
goamegah/how-to-serve-models
Different ways of implementing an API to serve an image classification model
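One of the simplest such ways is a small FastAPI app with a single upload endpoint; the classifier below is a stub standing in for a real model:

    # Minimal FastAPI serving sketch; classify() is a stub, not a real model.
    from fastapi import FastAPI, File, UploadFile

    app = FastAPI()

    def classify(image_bytes: bytes) -> str:
        return "cat"  # stand-in for real model inference

    @app.post("/predict")
    async def predict(file: UploadFile = File(...)):
        return {"label": classify(await file.read())}

    # Run with: uvicorn app:app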
SABER-labs/torch_batcher
Serve PyTorch inference requests with Redis-backed batching for higher throughput.
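The general technique looks like this generic sketch (not torch_batcher's actual API): drain queued requests from a Redis list, run one batched forward pass, and write results back keyed by request id:

    # Generic micro-batching sketch; queue/key names and the model are made up.
    import json
    import redis
    import torch

    r = redis.Redis()
    model = torch.nn.Linear(4, 2).eval()  # stand-in model

    def serve_one_batch(max_batch=32):
        batch = []
        while len(batch) < max_batch:
            item = r.lpop("requests")      # JSON-encoded queued inputs
            if item is None:
                break
            batch.append(json.loads(item))
        if not batch:
            return
        inputs = torch.tensor([b["input"] for b in batch], dtype=torch.float32)
        with torch.no_grad():
            outputs = model(inputs)        # one batched forward pass
        for req, out in zip(batch, outputs):
            r.set(f"result:{req['id']}", json.dumps(out.tolist()))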
StefanoLusardi/tiny_inference_engine
Client/server system to perform distributed inference on high-load systems.