inference-server
There are 40 repositories under the inference-server topic.
roboflow/inference
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
basetenlabs/truss
The simplest way to serve AI/ML models in production
pipeless-ai/pipeless
An open-source computer vision framework to build and deploy apps in minutes
underneathall/pinferencia
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
NVIDIA/gpu-rest-engine
A REST API for Caffe using Docker and Go
BMW-InnovationLab/BMW-YOLOv4-Inference-API-GPU
A repository for a no-code object detection inference API using YOLOv3 and YOLOv4 with the Darknet framework.
BMW-InnovationLab/BMW-YOLOv4-Inference-API-CPU
A repository for a no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
BMW-InnovationLab/BMW-TensorFlow-Inference-API-CPU
A repository for an object detection inference API using the TensorFlow framework.
containers/podman-desktop-extension-ai-lab
Work with LLMs in a local environment using containers.
vertexclique/orkhon
Orkhon: ML Inference Framework and Server Runtime
autodeployai/ai-serving
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
kibae/onnxruntime-server
ONNX Runtime Server: a server providing TCP and HTTP/HTTPS REST APIs for ONNX inference.
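Servers like this wrap a loaded model behind a REST endpoint. Below is a minimal stdlib-only sketch of that request/response pattern; the predict function is a hypothetical stand-in for a real ONNX Runtime session, and the JSON shapes are illustrative, not this project's actual API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(inputs):
    # Hypothetical stand-in for a real session.run() call on a loaded model.
    return [x * 2.0 for x in inputs]

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and run it through the model.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = {"outputs": predict(payload["inputs"])}
        # Return the outputs as a JSON response.
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), InferenceHandler).serve_forever()
```

A production server would load the model once at startup and reuse the session across requests, rather than reloading per call.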
kf5i/k3ai
K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that lets anyone experiment quickly with Kubeflow pipelines. K3ai is suitable for anything from edge devices to laptops.
notAI-tech/fastDeploy
Deploy DL/ML inference pipelines with minimal extra code.
RubixML/Server
A standalone inference server for trained Rubix ML estimators.
curtisgray/wingman
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
friendliai/friendli-client
Friendli: the fastest serving engine for generative AI
k9ele7en/Triton-TensorRT-Inference-CRAFT-pytorch
An advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a PyTorch -> ONNX -> TensorRT converter and inference pipelines (TensorRT; Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.
haicheviet/fullstack-machine-learning-inference
Fullstack machine learning inference template
tensorchord/inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
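Serving benchmarks like this one typically report per-request latency percentiles alongside overall throughput. A minimal stdlib-only sketch of that measurement loop (the benchmark function and its result fields are hypothetical, not this project's interface):

```python
import time
import statistics

def benchmark(fn, requests, warmup=3):
    # Warm up first so one-time costs (model load, JIT) don't skew results.
    for r in requests[:warmup]:
        fn(r)
    latencies = []
    start = time.perf_counter()
    for r in requests:
        t0 = time.perf_counter()
        fn(r)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "p50_ms": statistics.median(latencies) * 1000,   # median latency
        "throughput_rps": len(requests) / elapsed,        # requests per second
    }
```

Real benchmarks also measure tail latency (p95/p99) and drive the server with concurrent clients, since batching servers behave very differently under load.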
leimao/Simple-Inference-Server
Inference Server Implementation from Scratch for Machine Learning Models
csy1204/TripBigs_Web
Session Based Real-time Hotel Recommendation Web Application
pandruszkow/whisper-inference-server
A networked inference server for Whisper so you don't have to keep waiting for the audio model to reload for the x-hundredth time.
roboflow/inference-dashboard-example
An example of using Roboflow's inference server to analyze video streams. The project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.
RedisVentures/loan-prediction-microservice
An example of using Redis + RedisAI for a microservice that predicts consumer loan probabilities using Redis as a feature and model store and RedisAI as an inference server.
tensorchord/modelz-docs
Modelz is a developer-first platform for prototyping and deploying machine learning models.
geniusrise/vision
Vision and vision-multi-modal components for geniusrise framework
dlzou/computron
Serving distributed deep learning models with model parallel swapping.
geniusrise/text
Text components powering LLMs & SLMs for geniusrise framework
StefanoLusardi/tiny_inference_engine
Client/server system to perform distributed inference on high-load systems.
SABER-labs/torch_batcher
Serve PyTorch inference requests using batching with Redis for higher throughput.
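The core idea behind request batching is to drain a queue of pending requests and run the model once per batch, amortizing per-request overhead. A stdlib-only sketch of that pattern (function names are hypothetical; this project uses Redis as the queue rather than an in-process one):

```python
import queue

def drain_batch(q, max_batch=8, timeout=0.01):
    # Block until the first request arrives, then greedily collect
    # more requests up to max_batch, waiting at most `timeout` each.
    batch = [q.get()]
    while len(batch) < max_batch:
        try:
            batch.append(q.get(timeout=timeout))
        except queue.Empty:
            break
    return batch

def serve_batched(q, model_fn):
    # One model call per batch instead of one per request.
    batch = drain_batch(q)
    return model_fn(batch)
```

Tuning max_batch and the timeout trades latency for throughput: a longer wait yields bigger batches but delays the first request in each batch.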
geniusrise/audio
Audio components for geniusrise framework
koseemre/mlplatform
A basic ML platform including a model registry and an inference server.
nikhiltadikonda/Kushagra-AI
An AI-powered mobile crop advisory app for farmers and gardeners that provides information about crops from an image taken by the user. It supports 10 crops and 37 kinds of crop diseases. The AI model is a ResNet fine-tuned on crop images collected by web-scraping Google Images and the PlantVillage dataset.
xdevfaheem/TGS
Effortlessly Deploy and Serve Large Language Models in the Cloud as an API Endpoint for Inference