The Open Inference Protocol (OIP) specification defines a standard protocol for performing machine learning model inference across serving runtimes for different ML frameworks. The protocol facilitates the implementation of a standardized, high-performance data plane and promotes interoperability among model serving runtimes. The specification enables a cohesive inference experience, empowering the development of versatile client and benchmarking tools that work with all supported serving runtimes.
- The inference REST specification
- The inference gRPC specification
- KServe v2 inference protocol
- NVIDIA Triton inference server protocol
- Seldon MLServer
- Seldon Core v2 inference protocol
- OpenVINO RESTful API and gRPC API
- AMD Inference Server
- TorchServe Inference API
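
To illustrate the kind of client the shared data plane makes possible, here is a minimal sketch that builds a REST inference request in the shape the protocol describes (a `POST` to `/v2/models/{model_name}/infer` with an `inputs` array of named tensors). The helper name `build_infer_request`, the base URL, the model name, and the tensor name are assumptions made up for this example; consult the REST specification for the authoritative request schema.

```python
import json
import math


def build_infer_request(model_name, input_name, shape, datatype, data,
                        base_url="http://localhost:8080"):
    """Return (url, body) for an inference request.

    base_url is a hypothetical address; replace it with the address
    of whichever serving runtime you are targeting.
    """
    # The flattened data must contain exactly as many elements as the shape implies.
    expected = math.prod(shape)
    if len(data) != expected:
        raise ValueError(
            f"shape {list(shape)} implies {expected} elements, got {len(data)}"
        )

    url = f"{base_url}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),
            }
        ]
    }
    return url, body


# Hypothetical model and tensor names, for illustration only.
url, body = build_infer_request(
    "my-model", "input-0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4]
)
print(url)
print(json.dumps(body, indent=2))
```

Because the request shape does not depend on the runtime, the same `url` and `body` could in principle be sent with any HTTP client to any runtime that implements the protocol; only the base URL and the model and tensor names would change.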
Changes to the specification are versioned according to Semantic Versioning 2.0 and described in CHANGELOG.md. Layout changes are not versioned. Implementations of the specification should state which version they implement.
By contributing to the Open Inference Protocol Specification repository, you agree that your contributions will be licensed under its Apache 2.0 License.