KFServing provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high-abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX.
It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features like GPU Autoscaling, Scale to Zero, and Canary Rollouts to your ML deployments. It enables a simple, pluggable, and complete story for Production ML Serving, including prediction, pre-processing, post-processing, and explainability. KFServing is being used across various organizations.
KFServing Features and Examples
To learn more about KFServing, how to deploy it as part of Kubeflow, how to use the various supported features, and how to participate in the KFServing community, please follow the KFServing docs on the Kubeflow website. Additionally, we have compiled a list of KFServing presentations and demos that dive into the details.
Kubernetes 1.16+ is the minimum recommended version for KFServing.
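The minimum versions listed in this section can be checked mechanically before installing. The following is a minimal sketch and not part of KFServing: version_ge is a hypothetical helper name, and it assumes a sort that supports the -V (version sort) option, as GNU coreutils does.

```shell
# version_ge ACTUAL MINIMUM
# Hypothetical helper: succeeds when ACTUAL is at least MINIMUM.
# A leading "v" (as in v1.17.11) is ignored. Assumes `sort -V` is available.
version_ge() {
  # The smaller of the two versions sorts first; ACTUAL passes if that is MINIMUM.
  [ "$(printf '%s\n%s\n' "${1#v}" "${2#v}" | sort -V | head -n 1)" = "${2#v}" ]
}

# Example: compare a cluster version against the 1.16 minimum.
if version_ge "v1.17.11" "1.16"; then
  echo "Kubernetes version is new enough"
fi
```

The same helper can be reused for the Istio, Knative Serving, and Cert Manager minimums by passing the installed version and the minimum from the list.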
Knative Serving and Istio should be available on the Kubernetes cluster; Knative depends on the Istio Ingress Gateway to route requests to Knative services. To use the exact versions tested by the Kubeflow and KFServing teams, please refer to the prerequisites in the developer guide.
- Istio: v1.3.1+
If you want to get up and running with Knative quickly, or you do not need a service mesh, we recommend installing Istio without service mesh (sidecar injection).
- Knative Serving: v0.14.3+
Currently only Knative Serving is required. cluster-local-gateway is required to serve cluster-internal traffic for transformer and explainer use cases. Please follow the instructions here to install the cluster local gateway.
- Cert Manager: v0.12.0+
Cert manager is needed to provision KFServing webhook certs for a production-grade installation. Alternatively, you can run our self-signed certs generation script.
Note that since Knative v0.19.0 the cluster local gateway has been removed and shared with the ingress gateway. If you are on Knative v0.19.0 or later, you should change localGateway to knative-local-gateway and localGatewayService to knative-local-gateway.istio-system.svc.cluster.local in the inference service config.
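The two field changes can be sketched as a small text filter. This is an illustration only: rewrite_local_gateway is a hypothetical helper name, the input JSON is a placeholder rather than the real contents of the config, and the filter assumes each field's key and value sit on the same line.

```shell
# rewrite_local_gateway: hypothetical helper. Reads an ingress JSON fragment on
# stdin and prints it with localGateway and localGatewayService set to the
# knative-local-gateway values named in the note above.
rewrite_local_gateway() {
  sed -E \
    -e 's#("localGateway"[[:space:]]*:[[:space:]]*")[^"]*"#\1knative-local-gateway"#' \
    -e 's#("localGatewayService"[[:space:]]*:[[:space:]]*")[^"]*"#\1knative-local-gateway.istio-system.svc.cluster.local"#'
}

# Placeholder input; the field values in a real config may differ.
printf '%s\n' '{"localGateway": "old-gateway", "localGatewayService": "old-gateway.example.svc"}' \
  | rewrite_local_gateway
```

Review the rewritten JSON before putting it back into the inference service config; editing the config by hand is just as valid for a two-field change.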
Installation options
KFServing can be installed standalone if your Kubernetes cluster meets the above prerequisites. The KFServing controller is deployed in the kfserving-system namespace.
TAG=v0.5.0
Install KFServing CRD
Due to the large last-applied annotation issue with kubectl apply, we recommend using kubectl replace for upgrading the CRD.
kubectl replace -f ./install/$TAG/kfserving_crd.yaml || kubectl create -f ./install/$TAG/kfserving_crd.yaml
Install KFServing Controller
kubectl apply -f ./install/$TAG/kfserving.yaml
To install standalone KFServing on OpenShift Container Platform, please follow the instructions here.
KFServing is installed by default as part of a Kubeflow installation using Kubeflow manifests, and the KFServing controller is deployed in the kubeflow namespace.
Since the minimum Kubernetes version required by Kubeflow is 1.14, which does not support object selectors, ENABLE_WEBHOOK_NAMESPACE_SELECTOR is enabled in the Kubeflow installation by default.
If you are using the Kubeflow dashboard or profile controller to create user namespaces, labels are automatically added to enable KFServing to deploy models. If you are creating namespaces manually using the Kubernetes APIs directly, you will need to add the label serving.kubeflow.org/inferenceservice: enabled to allow deploying a KFServing InferenceService in the given namespaces. Make sure you do not deploy an InferenceService in the kubeflow namespace, which is labelled as control-plane.
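For a manually created namespace, the label can be set in the Namespace manifest itself. A minimal sketch, where namespace_manifest is a hypothetical helper and my-models is a placeholder namespace name:

```shell
# namespace_manifest NAME
# Hypothetical helper: prints a Namespace manifest carrying the label that
# allows KFServing InferenceService deployments in that namespace.
namespace_manifest() {
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: $1
  labels:
    serving.kubeflow.org/inferenceservice: enabled
EOF
}

# Print the manifest for a placeholder namespace.
namespace_manifest my-models
```

Piping the output to kubectl apply -f - would create the namespace. For a namespace that already exists, kubectl label namespace my-models serving.kubeflow.org/inferenceservice=enabled should have the same effect.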
As of the KFServing 0.4 release, the object selector is turned on by default, so the KFServing pod mutator is only invoked for KFServing InferenceService pods. For prior releases, you can turn it on manually by running the following command.
kubectl patch mutatingwebhookconfiguration inferenceservice.serving.kubeflow.org --patch '{"webhooks":[{"name": "inferenceservice.kfserving-webhook-server.pod-mutator","objectSelector":{"matchExpressions":[{"key":"serving.kubeflow.org/inferenceservice", "operator": "Exists"}]}}]}'
Make sure you have kubectl installed.
- If you do not have an existing Kubernetes cluster, you can quickly create a local Kubernetes cluster with kind.
Note that the minimum requirement for running KFServing is 4 CPUs and 8Gi memory, so you need to change the Docker resource setting to use 4 CPUs and 8Gi memory.
kind create cluster
Alternatively, you can use Minikube:
minikube start --cpus 4 --memory 8192 --kubernetes-version=v1.17.11
- Install the lean version of Istio, Knative Serving, and KFServing all in one (this takes 30s).
./hack/quick_install.sh
If the default ingress gateway setup does not fit your needs, you can choose to set up a custom ingress gateway.
- Configure Custom Ingress Gateway
- In addition, you need to update the KFServing configmap to use the custom ingress gateway.
- Configure Custom Domain
- Configure HTTPS Connection
Execute the following command to determine if your Kubernetes cluster is running in an environment that supports external load balancers:
$ kubectl get svc istio-ingressgateway -n istio-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway LoadBalancer 172.21.109.129 130.211.10.121 ... 17h
If the EXTERNAL-IP value is set, your environment has an external load balancer that you can use for the ingress gateway.
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
If the EXTERNAL-IP value is none (or perpetually pending), your environment does not provide an external load balancer for the ingress gateway. In this case, you can access the gateway using the service’s node port.
# GKE
export INGRESS_HOST=worker-node-address
# Minikube
export INGRESS_HOST=$(minikube ip)
# Other environment(On Prem)
export INGRESS_HOST=$(kubectl get po -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
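Which of the two cases above applies depends only on the EXTERNAL-IP column. As a sketch (ingress_mode is a hypothetical helper; it assumes kubectl prints <none> or <pending> in that column when no address is assigned):

```shell
# ingress_mode EXTERNAL_IP
# Hypothetical helper: prints "loadbalancer" when an external address is set,
# and "nodeport" when the value is empty, <none>, or <pending>.
ingress_mode() {
  case "$1" in
    ""|"<none>"|"<pending>") echo "nodeport" ;;
    *) echo "loadbalancer" ;;
  esac
}

# The address from the sample output above, then a pending load balancer.
ingress_mode "130.211.10.121"
ingress_mode "<pending>"
```

With "loadbalancer", use the exports that read .status.loadBalancer.ingress[0].ip and the http2 port; with "nodeport", use the node address and the http2 nodePort instead.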
Alternatively, you can use port forwarding for testing purposes:
INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80
# start another terminal
export INGRESS_HOST=localhost
export INGRESS_PORT=8080
Steps for testing the installation
kubectl get po -n kfserving-system
NAME READY STATUS RESTARTS AGE
kfserving-controller-manager-0 2/2 Running 2 13m
Please refer to our troubleshooting section for recommendations and tips for issues with installation.
API_VERSION=v1alpha2
kubectl create namespace kfserving-test
kubectl apply -f docs/samples/${API_VERSION}/sklearn/sklearn.yaml -n kfserving-test
kubectl get inferenceservices sklearn-iris -n kfserving-test
NAME URL READY DEFAULT TRAFFIC CANARY TRAFFIC AGE
sklearn-iris http://sklearn-iris.kfserving-test.example.com/v1/models/sklearn-iris True 100 109s
Curl from ingress gateway
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kfserving-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./docs/samples/${API_VERSION}/sklearn/iris-input.json
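The cut -d "/" -f 3 step above keeps only the host part of the InferenceService URL, which is the value the Host header needs. A small illustration using the example URL from the status output above:

```shell
# Splitting the URL on "/" gives: field 1 "http:", field 2 "" (the gap between
# the two slashes), and field 3 the hostname.
url="http://sklearn-iris.kfserving-test.example.com/v1/models/sklearn-iris"
host=$(printf '%s\n' "$url" | cut -d "/" -f 3)
echo "$host"
```

The path after the hostname is dropped here because the curl command supplies its own path (/v1/models/sklearn-iris:predict).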
Curl from local cluster gateway
curl -v http://sklearn-iris.kfserving-test/v1/models/sklearn-iris:predict -d @./docs/samples/${API_VERSION}/sklearn/iris-input.json
# use kubectl create instead of apply because the job template is using generateName which doesn't work with kubectl apply
kubectl create -f docs/samples/${API_VERSION}/sklearn/perf.yaml -n kfserving-test
# wait for the job to finish, then check the log
kubectl logs load-test8b58n-rgfxr -n kfserving-test
Requests [total, rate, throughput] 30000, 500.02, 499.99
Duration [total, attack, wait] 1m0s, 59.998s, 3.336ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.743ms, 2.748ms, 2.494ms, 3.363ms, 4.091ms, 7.749ms, 46.354ms
Bytes In [total, mean] 690000, 23.00
Bytes Out [total, mean] 2460000, 82.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
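When the load test runs from a script, the report can be checked automatically. A minimal sketch, assuming the report format shown above; success_ratio is a hypothetical helper name, and the report variable holds a shortened copy of that output:

```shell
# success_ratio: hypothetical helper. Reads a load-test report on stdin and
# prints the last field of the line starting with "Success" (e.g. "100.00%").
success_ratio() {
  awk '/^Success/ { print $NF }'
}

# Shortened copy of the report shown above.
report='Requests      [total, rate, throughput]  30000, 500.02, 499.99
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:30000'

ratio=$(printf '%s\n' "$report" | success_ratio)
if [ "$ratio" = "100.00%" ]; then
  echo "all requests succeeded"
else
  echo "some requests failed: $ratio"
fi
```

In practice the report would come from the job's pod log, for example by piping the kubectl logs command above into success_ratio.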
- Install the SDK
pip install kfserving
- Check the KFServing SDK documents from here.
- Follow the example(s) here to use the KFServing SDK to create, roll out, promote, and delete an InferenceService instance.
KFServing Presentations and Demos
Debug KFServing InferenceService
KFServing benchmark test comparing Knative and Kubernetes Deployment with HPA