This is an example of deploying an image classification model using libasyik, a web service framework built on C++ and Boost libraries. Usually, Flask or Ray is used to deploy a machine learning model as a web service endpoint. This repository shows how to deploy a machine learning model using C++ and expose the endpoint for use.
This example also shows how to integrate Triton Inference Server using its C++ client API. We could couple the web service with onnxruntime and process the incoming data inside the same process, but I want to try the decoupled method, which splits the model inference process from the web service that accepts incoming data through a REST API.
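As a rough sketch of this decoupled flow, the snippet below shows how a libasyik HTTP handler could forward a preprocessed tensor to Triton through the C++ client library. It is only an illustration, not the code in this repository: the endpoint path, model name (`classifier`), tensor names (`input`/`output`), and input shape are assumptions, and the libasyik and Triton client calls follow their public examples.

// Minimal sketch (not this repository's actual implementation):
// a libasyik endpoint that forwards a preprocessed tensor to Triton.
// Model name, tensor names, and shapes are illustrative assumptions.
#include <memory>
#include <vector>

#include "libasyik/service.hpp"
#include "libasyik/http.hpp"
#include "http_client.h"  // Triton C++ HTTP client (triton::client)

namespace tc = triton::client;

int main() {
  auto as = asyik::make_service();
  auto server = asyik::make_http_server(as, "127.0.0.1", 8080);

  // Connect to the Triton HTTP endpoint (default port 8000).
  std::unique_ptr<tc::InferenceServerHttpClient> client;
  tc::InferenceServerHttpClient::Create(&client, "localhost:8000", false);

  server->on_http_request("/classification/image", "POST",
    [&client](auto req, auto args) {
      // Decode and preprocess the base64 image from req->body here
      // (omitted); assume it yields a 1x3x224x224 float tensor.
      std::vector<float> tensor(1 * 3 * 224 * 224, 0.0f);

      // Describe the input tensor for Triton.
      tc::InferInput* input;
      tc::InferInput::Create(&input, "input", {1, 3, 224, 224}, "FP32");
      input->AppendRaw(reinterpret_cast<uint8_t*>(tensor.data()),
                       tensor.size() * sizeof(float));

      tc::InferRequestedOutput* output;
      tc::InferRequestedOutput::Create(&output, "output");

      // Run inference on the (assumed) "classifier" model.
      tc::InferResult* result;
      tc::InferOptions options("classifier");
      client->Infer(&result, options, {input}, {output});

      // Read the raw scores back; postprocessing into class labels is omitted.
      const uint8_t* raw;
      size_t byte_size;
      result->RawData("output", &raw, &byte_size);

      req->response.headers.set("Content-Type", "application/json");
      req->response.body = "{\"status\": \"ok\"}";
      req->response.result(200);

      delete result;
      delete output;
      delete input;
    });

  as->run();
}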
# Build from Dockerfile
docker build -t cpp-ml-server:1.2.0-tris . --build-arg ENGINE_TYPE=triton
docker build -t cpp-ml-server:1.2.0-ort . --build-arg ENGINE_TYPE=onnxrt
docker build -t cpp-ml-server:1.2.0-all . --build-arg ENGINE_TYPE=all
# Pull from Docker Registry
docker pull haritsahm/cpp-ml-server:1.2.0-all
docker pull haritsahm/cpp-ml-server:1.2.0-tris
docker pull haritsahm/cpp-ml-server:1.2.0-ort
- Sync the submodules that contain the model configurations
git submodule update --init --recursive
- Pull Git LFS files in the submodule
# In case the model files weren't downloaded
cd triton-ml-server && git lfs pull && cd ..
- Start the services with Docker Compose
# Choose either the onnxrt engine or the triton engine via the override commands
docker-compose up -d
import base64
import json
import cv2
import numpy as np
import requests
from PIL import Image
# Tree frog
url = "https://github.com/EliSchwartz/imagenet-sample-images/blob/master/n01644373_tree_frog.JPEG?raw=true"
image = np.array(Image.open(requests.get(url, stream=True).raw))
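# Encode the image as PNG bytes, then base64-encode them so the image can be embedded in a JSON body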
image_string = base64.b64encode(cv2.imencode('.png', image)[1]).decode('utf-8')
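# POST the payload to the classification endpoint exposed by the C++ server and print the parsed response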
response = requests.post("http://127.0.0.1:8080/classification/image", headers={"Content-Type":"application/json"}, data=json.dumps({"image":image_string}))
print(json.loads(response.text))
- Add detailed data validation steps
- Optimize variables and parameters using pointers
- Support batched inputs
- Support coupled inference process using onnxruntime