OpenVINO™ Model Server (OVMS) is a scalable, high-performance solution for serving machine learning models optimized for Intel® architectures. The server provides an inference service via gRPC or REST API - making it easy to deploy new algorithms and AI experiments using the same architecture as TensorFlow Serving for any models trained in a framework that is supported by OpenVINO.
The server implements gRPC and REST API endpoints, with data serialization and deserialization using the TensorFlow Serving API, and OpenVINO™ as the inference execution provider. Model repositories may reside on a locally accessible file system (e.g. NFS), Google Cloud Storage (GCS), Amazon S3, MinIO or Azure Blob Storage.
OVMS is now implemented in C++ and provides much higher scalability than its Python predecessor. You can take full advantage of Xeon CPU capabilities or AI accelerators and expose them over the network interface. Read the release notes to find out what's new in the C++ version.
Review the Architecture concept document for more details.
A few key features:
- Support for multiple frameworks. Serve models trained in popular formats such as Caffe*, TensorFlow*, MXNet* and ONNX*.
- Deploy new model versions without changing client code.
- Support for AI accelerators including Intel Movidius Myriad VPUs, GPU and HDDL.
- The server can be enabled both on Bare Metal Hosts or in Docker containers.
- Kubernetes deployments. The server can be deployed in a Kubernetes cluster allowing the inference service to scale horizontally and ensure high availability.
- Model reshaping. The server supports reshaping models in runtime.
- Model ensemble (preview). Connect multiple models to deploy complex processing solutions and reduce overhead of sending data back and forth.
Note: OVMS has been tested on CentOS and Ubuntu. Publicly released Docker images are based on CentOS.
Build the Docker image using the command below, called from the root directory of the repository:

    make docker_build DLDT_PACKAGE_URL=<URL>

Note: The URL to the OpenVINO Toolkit package can be received after registration on the OpenVINO™ Toolkit website.
It will generate the images, tagged as:

- `openvino/model_server:latest` - with CPU, NCS and HDDL support
- `openvino/model_server:latest-gpu` - with CPU, NCS, HDDL and iGPU support

as well as a release package (.tar.gz, with the ovms binary and necessary libraries) in the `./dist` directory.
The release package is compatible with Linux machines on which the glibc version is greater than or equal to the version in the build image (2.17).
For debugging, an image with a `-build` suffix is also generated (i.e. `openvino/model_server-build:latest`).
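The generated image can then be started directly. A minimal sketch, assuming a model repository at `/models/resnet` on the host with numbered version subdirectories (the model name, paths and port numbers below are assumptions, not requirements):

```shell
# Serve one model over gRPC on port 9000 and REST on port 8501.
# /models/resnet is a hypothetical host directory laid out as
# /models/resnet/1/, /models/resnet/2/, ... (one subdirectory per version).
docker run -d --rm \
  -v /models/resnet:/models/resnet \
  -p 9000:9000 -p 8501:8501 \
  openvino/model_server:latest \
  --model_path /models/resnet --model_name resnet \
  --port 9000 --rest_port 8501
```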
Note: Images include OpenVINO 2021.1 release.
A demonstration of how to use OpenVINO Model Server can be found in the quick start guide.
More detailed guides to using Model Server in various scenarios can be found here:
OpenVINO™ Model Server gRPC API is documented in the protocol buffer files in tensorflow_serving_api.

Note: The implementations for the `Predict`, `GetModelMetadata` and `GetModelStatus` function calls are currently available. These are the most generic function calls and should address most of the usage scenarios.
The Predict proto defines two message specifications, `PredictRequest` and `PredictResponse`, used while calling the Prediction endpoint.

- `PredictRequest` specifies information about the model spec (name and version) and a map of input data serialized via TensorProto to a string format.
- `PredictResponse` includes a map of outputs serialized by TensorProto and information about the model spec that was used.
The Get Model Metadata proto defines three message definitions used while calling the Metadata endpoint: `SignatureDefMap`, `GetModelMetadataRequest` and `GetModelMetadataResponse`. A `GetModelMetadata` function call accepts model spec information as input and returns the Signature Definition content in a format similar to TensorFlow Serving.
The Get Model Status proto defines three message definitions used while calling the Status endpoint: `GetModelStatusRequest`, `ModelVersionStatus` and `GetModelStatusResponse`, which are used to report all exposed versions including their state in their lifecycle.
Refer to the example client code to learn how to use this API and submit the requests using the gRPC interface.
Using the gRPC interface is recommended for optimal performance due to its faster implementation of input data deserialization. It allows you to achieve lower latency, especially with larger input messages like images.
OpenVINO™ Model Server RESTful API follows the TensorFlow Serving REST API documentation.
Both row and column format of the requests are implemented.
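As a sketch of the two request layouts (the model name `resnet` and input name `data` are assumptions), the same two-sample batch can be encoded in either format:

```python
import json

# Row format: a list of instances, one JSON object (or value) per sample.
row_payload = {"instances": [{"data": [1.0, 2.0]}, {"data": [3.0, 4.0]}]}

# Column format: a map of named inputs, each carrying the whole batch.
column_payload = {"inputs": {"data": [[1.0, 2.0], [3.0, 4.0]]}}

# Either body is POSTed to the prediction endpoint, e.g. (assuming the
# server was started with --rest_port 8501):
#   http://localhost:8501/v1/models/resnet:predict
row_body = json.dumps(row_payload)
column_body = json.dumps(column_payload)
```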
Note: Just like with gRPC, only the implementations for the `Predict`, `GetModelMetadata` and `GetModelStatus` function calls are currently available.
Only the numerical data types are supported.
Review the example clients below to find out more about how to connect and run inference requests.

The REST API is recommended when the primary goal is to reduce the number of client-side Python dependencies and simplify the application code.
- Using the `Predict` function over gRPC and RESTful API with numpy data input
- Using the `GetModelMetadata` function over gRPC and RESTful API
- Using the `GetModelStatus` function over gRPC and RESTful API
- Example script submitting jpeg images for image classification
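Over the REST API, the metadata and status calls are plain GET requests; a standard-library-only sketch, assuming the server was started with `--rest_port 8501` and serves a model named `resnet`:

```python
import json
from urllib.request import urlopen

BASE = "http://localhost:8501/v1/models/resnet"  # assumed address and model name

status_url = BASE                  # GetModelStatus equivalent
metadata_url = BASE + "/metadata"  # GetModelMetadata equivalent

def fetch(url):
    """GET the endpoint and decode the JSON reply (needs a running server)."""
    with urlopen(url) as reply:
        return json.load(reply)

# print(fetch(status_url))    # reports all versions and their states
# print(fetch(metadata_url))  # returns the signature_def description
```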
Learn more about tests in the developer guide
- Currently, `Predict`, `GetModelMetadata` and `GetModelStatus` calls are implemented using the TensorFlow Serving API. `Classify`, `Regress` and `MultiInference` are not included.
- `Output_filter` is not effective in the `Predict` call. All outputs defined in the model are returned to the clients.
- All contributed code must be compatible with the Apache 2 license.
- All changes must pass linter, unit and functional tests.
- All new features need to be covered by tests.
Follow the contributor guide and the developer guide.
Submit a GitHub issue to ask a question, request a feature or report a bug.
* Other names and brands may be claimed as the property of others.