kubernetes-ml-ops

An introduction to machine learning model deployment operations (MLOps) using Python, Docker, Kubernetes and Seldon-Core.


Deploying Machine Learning Models on Kubernetes

A common pattern for deploying Machine Learning (ML) models into production environments - e.g. ML models trained using the SciKit Learn or Keras packages (for Python), that are ready to provide predictions on new data - is to expose these ML models as RESTful API microservices, hosted from within Docker containers. These can then be deployed to a cloud environment that handles everything required for maintaining continuous availability - e.g. fault-tolerance, auto-scaling, load balancing and rolling service updates.

The configuration details for a continuously available cloud deployment are specific to the targeted cloud provider(s) - e.g. the deployment process and topology for Amazon Web Services is not the same as that for Microsoft Azure, which in turn is not the same as that for Google Cloud Platform. This constitutes knowledge that needs to be acquired separately for every cloud provider. Furthermore, it is difficult (some would say near impossible) to test entire deployment strategies locally, which makes issues such as networking hard to debug.

Kubernetes is a container orchestration platform that seeks to address these issues. Briefly, it provides a mechanism for defining entire microservice-based application deployment topologies and their service-level requirements for maintaining continuous availability. It is agnostic to the targeted cloud provider, can be run on-premises and even locally on your laptop - all that's required is a cluster of virtual machines running Kubernetes - i.e. a Kubernetes cluster.

This README is designed to be read in conjunction with the code in this repository, which contains the Python modules, Docker configuration files and Kubernetes instructions for demonstrating how a simple Python ML model can be turned into a production-grade RESTful model-scoring (or prediction) API service, using Docker and Kubernetes - both locally and with Google Cloud Platform (GCP). It is not a comprehensive guide to Kubernetes, Docker or ML - think of it more as an 'ML on Kubernetes 101' for demonstrating capability and allowing newcomers to Kubernetes (e.g. data scientists who are more focused on building models than deploying them) to get up-and-running quickly and become familiar with the basic concepts and patterns.

We will demonstrate ML model deployment using two different approaches: a first-principles approach using Docker and Kubernetes; and then a deployment using Seldon-Core, a Kubernetes-native framework for streamlining the deployment of ML services. The former will help you appreciate the latter, which constitutes a powerful framework for deploying and performance-monitoring many complex ML model pipelines.

This work was initially committed in 2018 and has since formed the basis of Bodywork - a MLOps framework for running model-training workloads and deploying model-scoring services on Kubernetes. This framework, open-sourced in December 2020, is an attempt to automate a lot of the steps that this project has demonstrated to many machine learning engineers over the years.

Containerising a Simple ML Model Scoring Service using Flask and Docker

We start by demonstrating how to containerise the simple Python ML model scoring REST API contained in the api.py module, using the Dockerfile that sits alongside it in the py-flask-ml-score-api directory, whose core contents are as follows,

py-flask-ml-score-api/
 | Dockerfile
 | Pipfile
 | Pipfile.lock
 | api.py

If you're already feeling lost then these files are discussed in the points below, otherwise feel free to skip to the next section.

Defining the Flask Service in the api.py Module

This is a Python module that uses the Flask framework for defining a web service (app), with a function (score) that executes in response to an HTTP request to a specific URL (or 'route'), thanks to being wrapped by the app.route decorator. For reference, the relevant code is reproduced below,

from flask import Flask, jsonify, make_response, request

app = Flask(__name__)


@app.route('/score', methods=['POST'])
def score():
    features = request.json['X']
    return make_response(jsonify({'score': features}))


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

If running locally - e.g. by starting the web service using python api.py - we would be able to reach our function (or 'endpoint') at http://localhost:5000/score. This function takes data sent to it as JSON (automatically de-serialised into a Python dict and made available via request.json, where request is the object imported from Flask), and returns a response (automatically serialised as JSON by jsonify).
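The curl requests used throughout this guide can also be issued from Python. A minimal sketch using only the standard library, assuming the service is running at http://localhost:5000; build_score_request and send_score_request are hypothetical helper names:

```python
import json
import urllib.request


def build_score_request(features, url="http://localhost:5000/score"):
    """Build (but do not send) a JSON POST request for the /score endpoint."""
    body = json.dumps({"X": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_score_request(features, url="http://localhost:5000/score", timeout=5.0):
    """Send the request and decode the JSON response into a Python dict."""
    request = build_score_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, send_score_request([1, 2]) should return the same payload that the curl request shows, as a Python dict.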

In our example function, we expect an array of features, X, that we pass to a ML model, which in our example returns those same features back to the caller - i.e. our chosen ML model is the identity function, which we have chosen for purely demonstrative purposes. We could just as easily have loaded a pickled SciKit-Learn or Keras model and passed the data to the appropriate predict method, returning a score for the feature-data as JSON - see here for an example of this in action.
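A minimal sketch of that model-loading pattern, kept free of Flask so that it runs on its own. IdentityModel, load_model, score_payload and model.pkl are all hypothetical names; a real model would be trained and pickled separately:

```python
import pickle


class IdentityModel:
    """Hypothetical stand-in for a trained model that exposes a predict method."""

    def predict(self, X):
        # The identity function: return the features unchanged.
        return X


def load_model(path):
    """Load a pickled model from disk (only unpickle files you trust)."""
    with open(path, "rb") as model_file:
        return pickle.load(model_file)


def score_payload(payload, model):
    """Mirror the score() view: read 'X', call predict, wrap the result as 'score'."""
    features = payload["X"]
    prediction = model.predict(features)
    # scikit-learn returns a NumPy array, which jsonify cannot serialise directly.
    if hasattr(prediction, "tolist"):
        prediction = prediction.tolist()
    return {"score": prediction}


# In api.py the model would be loaded once at start-up, e.g.
# model = load_model("model.pkl")  # hypothetical file name
model = IdentityModel()
print(score_payload({"X": [1, 2]}, model))  # {'score': [1, 2]}
```

Loading the model once at module import, rather than inside the request handler, avoids paying the deserialisation cost on every request. Only unpickle files from trusted sources, as loading a pickle can execute arbitrary code.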

Defining the Docker Image with the Dockerfile

A Dockerfile is the configuration file used by Docker to define the contents of a Docker container and configure how it operates when running. The static artefact built from it, which is not itself running, is referred to as the 'image'. For reference, the Dockerfile is reproduced below,

FROM python:3.6-slim
WORKDIR /usr/src/app
COPY . .
RUN pip install pipenv
RUN pipenv install
EXPOSE 5000
CMD ["pipenv", "run", "python", "api.py"]

In our example Dockerfile we:

  • start by using a pre-configured Docker image (python:3.6-slim) that contains a minimal Debian-based Linux distribution with Python 3.6 already installed;
  • then set /usr/src/app as the working directory on the image and copy the contents of the py-flask-ml-score-api local directory into it;
  • then use pip to install the Pipenv package for Python dependency management (see the appendix at the bottom for more information on how we use Pipenv);
  • then use Pipenv to install the dependencies described in Pipfile.lock into a virtual environment on the image;
  • declare that the container listens on port 5000 (EXPOSE documents the port, which is published to the 'outside world' when the container is run, e.g. with docker run -p); and finally,
  • start our Flask RESTful web service - api.py. Note that here we are relying on Flask's built-in development server, whereas in a production setting we would recommend configuring a more robust WSGI server (e.g. Gunicorn), as discussed here.

Building this custom image and asking the Docker daemon to run it (remember that a running image is a 'container'), will expose our RESTful ML model scoring service on port 5000 as if it were running on a dedicated virtual machine. Refer to the official Docker documentation for a more comprehensive discussion of these core concepts.
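The final bullet above recommends a more robust WSGI server, such as Gunicorn, for production use. Gunicorn configuration files are ordinary Python modules, so a sketch of one might look as follows; the file name gunicorn.conf.py and every value in it are assumptions, and Gunicorn would also need adding to the Pipfile:

```python
# gunicorn.conf.py - a hypothetical Gunicorn configuration for api.py.
# Gunicorn reads each top-level name in this module as a setting.
import multiprocessing

# Listen on all interfaces on port 5000, matching EXPOSE 5000 in the Dockerfile.
bind = "0.0.0.0:5000"

# The Gunicorn docs suggest (2 x CPU cores) + 1 workers as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1

# Restart any worker that takes longer than 30 seconds to handle a request.
timeout = 30

# Send access and error logs to stdout/stderr so that `docker logs` shows them.
accesslog = "-"
errorlog = "-"
```

The Dockerfile's CMD would then become something like CMD ["pipenv", "run", "gunicorn", "--config", "gunicorn.conf.py", "api:app"], where api:app points Gunicorn at the app object in api.py. Inside a container, multiprocessing.cpu_count() may report the host's CPU count rather than any limit placed on the container, so a fixed worker count can be the safer choice.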

Building a Docker Image for the ML Scoring Service

We assume that Docker is running locally (both Docker client and daemon), that the client is logged into an account on DockerHub and that there is a terminal open in this project's root directory. To build the image described in the Dockerfile run,

docker build --tag alexioannides/test-ml-score-api py-flask-ml-score-api

Where 'alexioannides' refers to the name of the DockerHub account that we will push the image to, once we have tested it.

Testing

To test that the image can be used to create a Docker container that functions as expected, run,

docker run --rm --name test-api -p 5000:5000 -d alexioannides/test-ml-score-api

Where we have mapped port 5000 from the Docker container - i.e. the port our ML model scoring service is listening on - to port 5000 on our host machine (localhost). Then check that the container is listed as running using,

docker ps
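Because the container was started in detached mode (-d), the Flask server may not be accepting connections the instant docker run returns. A small standard-library sketch for polling the mapped port before sending requests follows; wait_for_port is a hypothetical helper, and an open port is only a rough readiness signal (on some setups Docker's port proxy accepts connections before the application does), so the HTTP request below remains the real test:

```python
import socket
import time


def wait_for_port(host, port, attempts=10, delay=0.5):
    """Return True once a TCP connection to host:port succeeds, else False."""
    for _ in range(attempts):
        try:
            # A successful connection means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not accepting connections yet (e.g. the container is still starting).
            time.sleep(delay)
    return False
```

For example, wait_for_port("localhost", 5000) returns True as soon as the port accepts TCP connections, or False once its retries are exhausted.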

And then test the exposed API endpoint using,

curl http://localhost:5000/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"X": [1, 2]}'

Where you should expect a response along the lines of,

{"score":[1,2]}

All our test model does is return the input data - i.e. it is the identity function. Only a few lines of additional code are required to modify this service to load a SciKit Learn model from disk and pass new data to its 'predict' method for generating predictions - see here for an example. Now that the container has been confirmed as operational, we can stop it,

docker stop test-api

Pushing the Image to the DockerHub Registry

In order for a remote Docker host or Kubernetes cluster to have access to the image we've created, we need to publish it to an image registry. All cloud computing providers that offer managed Docker-based services will provide private image registries, but we will use the public image registry at DockerHub, for convenience. To push our new image to DockerHub (where my account ID is 'alexioannides') use,

docker push alexioannides/test-ml-score-api

Where we can now see that our chosen naming convention for the image is intrinsically linked to our target image registry (you will need to insert your own account ID where required). Once the upload is finished, log onto DockerHub to confirm that the upload has been successful via the DockerHub UI.

Installing Kubernetes for Local Development and Testing

There are two options for installing a single-node Kubernetes cluster that is suitable for local development and testing: via the Docker Desktop client, or via Minikube.

Installing Kubernetes via Docker Desktop

If you have been using Docker on a Mac, then the chances are that you will have been doing this via the Docker Desktop application. If not (e.g. if you installed Docker Engine via Homebrew), then Docker Desktop can be downloaded here. Docker Desktop now comes bundled with Kubernetes, which can be activated by going to Preferences -> Kubernetes and selecting Enable Kubernetes. It will take a while for Docker Desktop to download the Docker images required to run Kubernetes, so be patient. After it has finished, go to Preferences -> Advanced and ensure that at least 2 CPUs and 4 GiB of memory have been allocated to the Docker Engine, which are the minimum resources required to deploy a single Seldon ML component.

To interact with the Kubernetes cluster you will need the kubectl Command Line Interface (CLI) tool, which will need to be downloaded separately. The easiest way to do this on a Mac is via Homebrew - e.g. with brew install kubernetes-cli. Once you have kubectl installed and a Kubernetes cluster up-and-running, test that everything is working as expected by running,

kubectl cluster-info

Which ought to return something along the lines of,

Kubernetes master is running at https://kubernetes.docker.internal:6443
KubeDNS is running at https://kubernetes.docker.internal:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Installing Kubernetes via Minikube

On Mac OS X, the steps required to get up-and-running with Minikube are as follows:

  • make sure the Homebrew package manager for OS X is installed; then,
  • install VirtualBox using, brew cask install virtualbox (you may need to approve installation via OS X System Preferences); and then,
  • install Minikube using, brew cask install minikube.

To start the test cluster run,

minikube start --memory 4096

Where we have specified the minimum amount of memory required to deploy a single Seldon ML component. Be patient - Minikube may take a while to start. To test that the cluster is operational run,

kubectl cluster-info

Where kubectl is the standard Command Line Interface (CLI) client for interacting with the Kubernetes API (which was installed as part of Minikube, but is also available separately).

Deploying the Containerised ML Model Scoring Service to Kubernetes

To launch our test model scoring service on Kubernetes, we will start by deploying the containerised service within a Kubernetes Pod, whose rollout is managed by a Deployment, which in turn creates a ReplicaSet - a Kubernetes resource that ensures a specified number of pods (or replicas) running our service are operational at any given time. This is achieved with,

kubectl create deployment test-ml-score-api --image=alexioannides/test-ml-score-api:latest
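The imperative command above generates a Deployment object on our behalf. A sketch of roughly the same object, built as a Python dict and serialised to JSON (which kubectl also accepts), makes the relationship between the Deployment, its label selector and the pod template explicit. deployment_manifest is a hypothetical helper, and the containerPort entry is an addition that mirrors the YAML definitions used later in this guide:

```python
import json


def deployment_manifest(name, image, replicas=1, port=5000):
    """Build a Deployment roughly like `kubectl create deployment NAME --image=IMAGE`."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The ReplicaSet created by the Deployment manages pods with these labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port, "protocol": "TCP"}],
                        }
                    ]
                },
            },
        },
    }


manifest = deployment_manifest("test-ml-score-api", "alexioannides/test-ml-score-api:latest")
# JSON is valid YAML, so this text could be saved to a file for `kubectl apply -f`.
manifest_json = json.dumps(manifest, indent=2)
```

The selector and the pod template labels must agree, otherwise the API server rejects the Deployment; kubectl create deployment takes care of this automatically.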

To check on the status of the deployment run,

kubectl rollout status deployment test-ml-score-api

And to see the pods that it has created run,

kubectl get pods

It is possible to use port forwarding to test an individual container without exposing it to the public internet. To use this, open a separate terminal and run (for example),

kubectl port-forward test-ml-score-api-szd4j 5000:5000

Where test-ml-score-api-szd4j is the precise name of the pod currently active on the cluster, as determined from the kubectl get pods command. Then from your original terminal, to repeat our test request against the same container running on Kubernetes run,

curl http://localhost:5000/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"X": [1, 2]}'
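Copying the pod name from the output of kubectl get pods by hand is error-prone. A sketch of a hypothetical helper that picks out running pods from the output of kubectl get pods -o json follows; its input could come from subprocess.run(["kubectl", "get", "pods", "-o", "json"], capture_output=True, text=True).stdout. Alternatively, kubectl port-forward deployment/test-ml-score-api 5000:5000 lets kubectl choose a pod itself.

```python
import json


def running_pod_names(pods_json, prefix):
    """Return the names of Running pods whose names start with prefix.

    pods_json is expected to be the output of `kubectl get pods -o json`.
    """
    pods = json.loads(pods_json).get("items", [])
    names = []
    for pod in pods:
        name = pod["metadata"]["name"]
        phase = pod.get("status", {}).get("phase")
        # Skip pods from other apps and pods that are Pending, Failed, etc.
        if name.startswith(prefix) and phase == "Running":
            names.append(name)
    return names
```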

To expose the container as a (load balanced) service to the outside world, we have to create a Kubernetes service that references it. This is achieved with the following command,

kubectl expose deployment test-ml-score-api --port 5000 --type=LoadBalancer --name test-ml-score-api-lb
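The Service created by kubectl expose is what routes external traffic to the pods. A sketch of a roughly equivalent Service object as a Python dict follows; load_balancer_service is a hypothetical helper. The key detail is that the selector must match the app label on the pods created by the Deployment:

```python
import json


def load_balancer_service(name, app, port=5000, target_port=5000):
    """Build a LoadBalancer Service, roughly like `kubectl expose ... --type=LoadBalancer`."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": {"app": app}},
        "spec": {
            "type": "LoadBalancer",
            # Traffic arriving on `port` is forwarded to `targetPort` on matching pods.
            "ports": [{"port": port, "targetPort": target_port, "protocol": "TCP"}],
            # Pods carrying this label receive the traffic.
            "selector": {"app": app},
        },
    }


service = load_balancer_service("test-ml-score-api-lb", app="test-ml-score-api")
print(json.dumps(service, indent=2))
```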

If you are using Docker Desktop, then this will automatically emulate a load balancer at http://localhost:5000. To find where Minikube has exposed its emulated load balancer run,

minikube service list

Now we test our new service - for example (with Docker Desktop),

curl http://localhost:5000/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"X": [1, 2]}'

Note that neither Docker Desktop nor Minikube sets up a real-life load balancer (which is what would happen if we made this request on a cloud platform). To tear down the load balancer, deployment and pod, run the following commands in sequence,

kubectl delete deployment test-ml-score-api
kubectl delete service test-ml-score-api-lb

Configuring a Multi-Node Cluster on Google Cloud Platform

In order to perform testing on a real-world Kubernetes cluster with far greater resources than those available on a laptop, the easiest way is to use a managed Kubernetes platform from a cloud provider. We will use Kubernetes Engine on Google Cloud Platform (GCP).

Getting Up-and-Running with Google Cloud Platform

Before we can use Google Cloud Platform, sign up for an account and create a project specifically for this work. Next, make sure that the GCP SDK is installed on your local machine - e.g.,

brew cask install google-cloud-sdk

Or by downloading an installation image directly from GCP. Note that if you haven't already installed kubectl, then you will need to do so now, which can be done using the GCP SDK,

gcloud components install kubectl

We then need to initialise the SDK,

gcloud init

Which will open a browser and guide you through the necessary authentication steps. Make sure you pick the project you created, together with a default zone and region (if this has not been set via Compute Engine -> Settings).

Initialising a Kubernetes Cluster

Firstly, within the GCP UI visit the Kubernetes Engine page to trigger the Kubernetes API to start-up. From the command line we then start a cluster using,

gcloud container clusters create k8s-test-cluster --num-nodes 3 --machine-type g1-small

And then go make a cup of coffee while you wait for the cluster to be created. Note, that this will automatically switch your kubectl context to point to the cluster on GCP, as you will see if you run, kubectl config get-contexts. To switch back to the Docker Desktop client use kubectl config use-context docker-desktop.

Launching the Containerised ML Model Scoring Service on GCP

This is largely the same as we did for running the test service locally - run the following commands in sequence,

kubectl create deployment test-ml-score-api --image=alexioannides/test-ml-score-api:latest
kubectl expose deployment test-ml-score-api --port 5000 --type=LoadBalancer --name test-ml-score-api-lb

But, to find the external IP address for the GCP cluster we will need to use,

kubectl get services
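The EXTERNAL-IP column for test-ml-score-api-lb reads <pending> until GCP has provisioned the load balancer. A sketch of a hypothetical helper that extracts the address from the output of kubectl get service test-ml-score-api-lb -o json and builds the scoring URL, returning None while provisioning is still in progress:

```python
import json
from typing import Optional


def external_score_url(service_json: str, port: int = 5000) -> Optional[str]:
    """Return the /score URL for a LoadBalancer Service, or None if still pending."""
    status = json.loads(service_json).get("status", {})
    ingress = status.get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        # No ingress entry yet: the cloud load balancer is still being provisioned.
        return None
    # Some providers report a hostname rather than an IP address.
    host = ingress[0].get("ip") or ingress[0].get("hostname")
    return f"http://{host}:{port}/score"
```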

And then we can test our service on GCP - for example,

curl http://35.246.92.213:5000/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"X": [1, 2]}'

Or, we could again use port forwarding to attach to a single pod - for example,

kubectl port-forward test-ml-score-api-nl4sc 5000:5000

And then in a separate terminal,

curl http://localhost:5000/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"X": [1, 2]}'

Finally, we tear down the deployment and load balancer,

kubectl delete deployment test-ml-score-api
kubectl delete service test-ml-score-api-lb

Switching Between Kubectl Contexts

If you are working with both a local Kubernetes cluster and a cluster on GCP, then you can switch the kubectl context from one cluster to the other, as follows,

kubectl config use-context docker-desktop

Where the list of available contexts can be found using,

kubectl config get-contexts

Using YAML Files to Define and Deploy the ML Model Scoring Service

Up to this point we have been using Kubectl commands to define and deploy a basic version of our ML model scoring service. This is fine for demonstrative purposes, but quickly becomes limiting, as well as unmanageable. In practice, the standard way of defining entire Kubernetes deployments is with YAML files, posted to the Kubernetes API. The py-flask-ml-score.yaml file in the py-flask-ml-score-api directory is an example of how our ML model scoring service can be defined in a single YAML file. This can now be deployed using a single command,

kubectl apply -f py-flask-ml-score-api/py-flask-ml-score.yaml

Note that we have defined three separate Kubernetes components in this single file: a namespace, a deployment and a load-balanced service, using --- to delimit the definition of each component. To see all components deployed into this namespace use,

kubectl get all --namespace test-ml-app

And likewise set the --namespace flag when using any kubectl get command to inspect the different components of our test app. Alternatively, we can set our new namespace as the default context,

kubectl config set-context $(kubectl config current-context) --namespace=test-ml-app

And then run,

kubectl get all

Where we can switch back to the default namespace using,

kubectl config set-context $(kubectl config current-context) --namespace=default

To tear-down this application we can then use,

kubectl delete -f py-flask-ml-score-api/py-flask-ml-score.yaml

Which saves us from having to use multiple commands to delete each component individually. Refer to the official documentation for the Kubernetes API to understand the contents of this YAML file in greater depth.
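The --- separators are what allow several Kubernetes objects to live in one file. As a sketch (not the actual contents of py-flask-ml-score.yaml, whose names may differ), the following joins hypothetical Namespace and Service objects into a single multi-document file, writing each document as JSON, which is also valid YAML:

```python
import json


def multi_document(*docs):
    """Join Kubernetes objects into one file, with --- between documents."""
    return "\n---\n".join(json.dumps(doc, indent=2) for doc in docs) + "\n"


namespace = {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": "test-ml-app"}}
# Namespaced objects declare the namespace in their own metadata.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "test-ml-score-lb", "namespace": "test-ml-app"},
    "spec": {
        "type": "LoadBalancer",
        "ports": [{"port": 5000, "targetPort": 5000}],
        "selector": {"app": "test-ml-score-api"},
    },
}
manifest_text = multi_document(namespace, service)
```

Listing the Namespace first means it exists by the time the objects that reference it are created when the file is applied.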

Using Helm Charts to Define and Deploy the ML Model Scoring Service

Writing YAML files for Kubernetes can get repetitive and hard to manage, especially if there is a lot of 'copy-paste' involved, when only a handful of parameters need to be changed from one deployment to the next, but there is a 'wall of YAML' that needs to be modified. Enter Helm - a framework for creating, executing and managing Kubernetes deployment templates. What follows is a very high-level demonstration of how Helm can be used to deploy our ML model scoring service - for a comprehensive discussion of Helm's full capabilities (and there are a lot of them), please refer to the official documentation. Seldon-Core can also be deployed using Helm and we will cover this in more detail later on.

Installing Helm

As before, the easiest way to install Helm onto Mac OS X is to use the Homebrew package manager,

brew install kubernetes-helm

Helm (version 2, as used throughout this guide) relies on a dedicated deployment server, referred to as 'Tiller', running within the same Kubernetes cluster we wish to deploy our applications to; Helm 3 removed Tiller altogether. Before we deploy Tiller we need to create a cluster-wide super-user role to assign to it, so that it can create and modify Kubernetes resources in any namespace. To achieve this, we start by creating a Service Account for Tiller. A Service Account provides an identity with which the processes running in a pod can authenticate themselves to the Kubernetes API, in order to view, create and modify resources. We create this in the kube-system namespace (a common convention) as follows,

kubectl --namespace kube-system create serviceaccount tiller

We then create a binding between this Service Account and the cluster-admin Cluster Role, which as the name suggests grants cluster-wide admin rights,

kubectl create clusterrolebinding tiller \
    --clusterrole cluster-admin \
    --serviceaccount=kube-system:tiller

We can now deploy the Helm Tiller to a Kubernetes cluster, with the desired access rights using,

helm init --service-account tiller

Deploying with Helm

To create a fresh Helm deployment definition - referred to as a 'chart' in Helm terminology - run,

helm create NAME-OF-YOUR-HELM-CHART

This creates a new directory - e.g. helm-ml-score-app as included with this repository - with the following high-level directory structure,

helm-ml-score-app/
 | -- charts/
 | -- templates/
 | Chart.yaml
 | values.yaml

Briefly, the charts directory contains other charts that our new chart will depend on (we will not make use of this), the templates directory contains our Helm templates, Chart.yaml contains core information for our chart (e.g. name and version information) and values.yaml contains default values to render our templates with (in the case that no values are set from the command line).

The next step is to delete all of the files in the templates directory (apart from NOTES.txt), and to replace them with our own. We start with namespace.yaml for declaring a namespace for our app,

apiVersion: v1
kind: Namespace
metadata:
  name: {{ .Values.app.namespace }}

Anyone familiar with HTML template frameworks (e.g. Jinja) will recognise the use of {{}} for defining values that will be injected into the rendered template. In this specific instance .Values.app.namespace injects the app.namespace variable, whose default value is defined in values.yaml. Next we define a deployment of pods in deployment.yaml,

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: {{ .Values.app.name }}
    env: {{ .Values.app.env }}
  name: {{ .Values.app.name }}
  namespace: {{ .Values.app.namespace }}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Values.app.name }}
  template:
    metadata:
      labels:
        app: {{ .Values.app.name }}
        env: {{ .Values.app.env }}
    spec:
      containers:
      - image: {{ .Values.app.image }}
        name: {{ .Values.app.name }}
        ports:
        - containerPort: {{ .Values.containerPort }}
          protocol: TCP

And the details of the load balancer service in service.yaml,

apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.app.name }}-lb
  labels:
    app: {{ .Values.app.name }}
  namespace: {{ .Values.app.namespace }}
spec:
  type: LoadBalancer
  ports:
  - port: {{ .Values.containerPort }}
    targetPort: {{ .Values.targetPort }}
  selector:
    app: {{ .Values.app.name }}
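Each {{ .Values... }} placeholder is resolved against values.yaml (or overrides supplied on the command line) at render time. A toy Python illustration of that substitution, which also shows the keys values.yaml needs to provide for these templates, is below. The values are hypothetical defaults, and this is not how Helm works internally (Helm renders Go templates):

```python
import re

# Matches simple references such as {{ .Values.app.namespace }}.
VALUE_REF = re.compile(r"\{\{\s*\.Values\.([A-Za-z0-9_.]+)\s*\}\}")


def render(template, values):
    """Replace each {{ .Values.a.b }} reference with values["a"]["b"].

    A toy illustration only: Go templates also support functions,
    pipelines, conditionals and loops.
    """

    def lookup(match):
        node = values
        for key in match.group(1).split("."):
            node = node[key]  # a missing key raises KeyError here
        return str(node)

    return VALUE_REF.sub(lookup, template)


# Hypothetical defaults with the keys referenced by the templates above.
values = {
    "app": {
        "name": "test-ml-score-api",
        "namespace": "test-ml-app",
        "env": "test",
        "image": "alexioannides/test-ml-score-api:latest",
    },
    "containerPort": 5000,
    "targetPort": 5000,
}

service_ports = (
    "  ports:\n"
    "  - port: {{ .Values.containerPort }}\n"
    "    targetPort: {{ .Values.targetPort }}\n"
)
print(render(service_ports, values))
```

Because the service's targetPort must match the containerPort declared in deployment.yaml, both values are 5000 here.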

What we have done, in essence, is to split-out each component of the deployment details from py-flask-ml-score.yaml into its own file and then define template variables for each parameter of the configuration that is most likely to change from one deployment to the next. To test and examine the rendered template, without having to attempt a deployment, run,

helm install helm-ml-score-app --debug --dry-run

If you are happy with the results of the 'dry run', then execute the deployment and generate a release from the chart using,

helm install helm-ml-score-app --name test-ml-app

This will automatically print the status of the release, together with its name (test-ml-app, as set by --name; without this flag Helm would generate a random name such as 'willing-yak') and the contents of NOTES.txt rendered to the terminal. To list all available Helm releases and their names use,

helm list

And to see the status of all their constituent components (e.g. pods, deployments, services, etc.) use, for example,

helm status test-ml-app

The ML scoring service can now be tested in exactly the same way as we have done previously (above). Once you have convinced yourself that it's working as expected, the release can be deleted using,

helm delete test-ml-app

Using Seldon to Deploy the ML Model Scoring Service to Kubernetes

Seldon's core mission is to simplify the repeated deployment and management of complex ML prediction pipelines on top of Kubernetes. In this demonstration we are going to focus on the simplest possible example - i.e. the simple ML model scoring API we have already been using.

Building an ML Component for Seldon

To deploy a ML component using Seldon, we need to create Seldon-compatible Docker images. We start by following these guidelines for defining a Python class that wraps an ML model targeted for deployment with Seldon. This is contained within the seldon-ml-score-component directory, whose contents are similar to those in py-flask-ml-score-api,

seldon-ml-score-component/
 | Dockerfile
 | MLScore.py
 | Pipfile
 | Pipfile.lock

Building the Docker Image for use with Seldon

Seldon requires that the Docker image for the ML scoring service be structured in a particular way:

  • the ML model has to be wrapped in a Python class with a predict method with a particular signature (or interface) - for example, in MLScore.py (deliberately named after the Python class contained within it) we have,
class MLScore:
    """
    Model template. You can load your model parameters in __init__ from
    a location accessible at runtime
    """

    def __init__(self):
        """
        Load models and add any initialization parameters (these will
        be passed at runtime from the graph definition parameters
        defined in your seldondeployment kubernetes resource manifest).
        """
        print("Initializing")

    def predict(self, X, features_names):
        """
        Return a prediction.

        Parameters
        ----------
        X : array-like
        features_names : array of feature names (optional)
        """
        print("Predict called - will run identity function")
        return X
  • the seldon-core Python package must be installed (we use pipenv to manage dependencies as discussed above and in the Appendix below); and,
  • the container starts by running the Seldon service using the seldon-core-microservice entry-point provided by the seldon-core package - both this and the point above can be seen in the Dockerfile,
FROM python:3.6-slim
COPY . /app
WORKDIR /app
RUN pip install pipenv
RUN pipenv install
EXPOSE 5000

# Define environment variable
ENV MODEL_NAME MLScore
ENV API_TYPE REST
ENV SERVICE_TYPE MODEL
ENV PERSISTENCE 0

CMD pipenv run seldon-core-microservice $MODEL_NAME $API_TYPE --service-type $SERVICE_TYPE --persistence $PERSISTENCE

For the precise details refer to the official Seldon documentation. Next, build this image,

docker build seldon-ml-score-component -t alexioannides/test-ml-score-seldon-api:latest

Before we push this image to our registry, we need to make sure that it's working as expected. Start the image on the local Docker daemon,

docker run --rm -p 5000:5000 -d alexioannides/test-ml-score-seldon-api:latest

And then send it a request (using a different request format to the ones we've used thus far),

curl -g http://localhost:5000/predict \
    --data-urlencode 'json={"data":{"names":["a","b"],"tensor":{"shape":[2,2],"values":[0,0,1,1]}}}'
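The curl call above form-encodes the Seldon payload into a single json field. A standard-library sketch of an equivalent request follows; seldon_form_request is a hypothetical helper, and it shows how a 2x2 array is flattened into shape and values in row-major order:

```python
import json
import urllib.parse
import urllib.request


def seldon_form_request(names, rows, url="http://localhost:5000/predict"):
    """Build a request equivalent to the curl --data-urlencode 'json=...' call."""
    shape = [len(rows), len(rows[0])]
    # The tensor format stores values flattened in row-major order.
    values = [value for row in rows for value in row]
    payload = {"data": {"names": names, "tensor": {"shape": shape, "values": values}}}
    body = urllib.parse.urlencode({"json": json.dumps(payload)}).encode("utf-8")
    # Supplying a body makes urllib send a POST, as curl does with --data-urlencode.
    return urllib.request.Request(url, data=body)
```

Passing seldon_form_request(["a", "b"], [[0, 0], [1, 1]]) to urllib.request.urlopen against the container started above should return the same payload as the curl request.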

If the response is as expected (i.e. it contains the same payload as the request), then push the image,

docker push alexioannides/test-ml-score-seldon-api:latest

Deploying a ML Component with Seldon Core

We now move on to deploying our Seldon-compatible ML component to a Kubernetes cluster and creating a fault-tolerant and scalable service from it. To achieve this, we will deploy Seldon-Core using Helm charts. We start by creating a namespace that will contain the seldon-core-operator, the Kubernetes operator (together with its custom resource definition) required to deploy any ML model using Seldon,

kubectl create namespace seldon-core

Then we deploy Seldon-Core using Helm and the official Seldon Helm chart repository hosted at https://storage.googleapis.com/seldon-charts,

helm install seldon-core-operator \
  --name seldon-core \
  --repo https://storage.googleapis.com/seldon-charts \
  --set usageMetrics.enabled=false \
  --namespace seldon-core

Next, we deploy the Ambassador API gateway for Kubernetes, which will act as a single point of entry into our Kubernetes cluster and will be able to route requests to any ML model we have deployed using Seldon. We will create a dedicated namespace for the Ambassador deployment,

kubectl create namespace ambassador

And then deploy Ambassador using the chart from the official (stable) Helm repository,

helm install stable/ambassador \
  --name ambassador \
  --set crds.keep=false \
  --namespace ambassador

If we now run helm list --namespace seldon-core we should see that Seldon-Core has been deployed and is waiting for Seldon ML components to be deployed. To deploy our Seldon ML model scoring service we create a separate namespace for it,

kubectl create namespace test-ml-seldon-app

And then configure and deploy another official Seldon Helm chart as follows,

helm install seldon-single-model \
  --name test-ml-seldon-app \
  --repo https://storage.googleapis.com/seldon-charts \
  --set model.image.name=alexioannides/test-ml-score-seldon-api:latest \
  --namespace test-ml-seldon-app

Note, that multiple ML models can now be deployed using Seldon by repeating the last two steps and they will all be automatically reachable via the same Ambassador API gateway, which we will now use to test our Seldon ML model scoring service.

Testing the API via the Ambassador Gateway API

To test the Seldon-based ML model scoring service, we follow the same general approach as we did for our first-principles Kubernetes deployments above, but we will route our requests via the Ambassador API gateway. To find the IP address for the Ambassador service run,

kubectl -n ambassador get service ambassador

Which will be localhost:80 if using Docker Desktop, or an IP address if running on GCP or Minikube (where you will need to remember to use minikube service list in the latter case). Now test the prediction end-point - for example,

curl http://35.246.28.247:80/seldon/test-ml-seldon-app/test-ml-seldon-app/api/v0.1/predictions \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"data":{"names":["a","b"],"tensor":{"shape":[2,2],"values":[0,0,1,1]}}}'

If you want to understand the full logic behind the routing see the Seldon documentation, but the URL is essentially assembled using,

http://<ambassadorEndpoint>/seldon/<namespace>/<deploymentName>/api/v0.1/predictions
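A small hypothetical helper that assembles this URL pattern, assuming plain HTTP through Ambassador:

```python
def seldon_prediction_url(ambassador_endpoint, namespace, deployment_name):
    """Assemble the Seldon prediction URL routed through the Ambassador gateway."""
    base = ambassador_endpoint.rstrip("/")
    # Default to plain HTTP when no scheme is supplied (e.g. "localhost:80").
    if not base.startswith(("http://", "https://")):
        base = f"http://{base}"
    return f"{base}/seldon/{namespace}/{deployment_name}/api/v0.1/predictions"
```

For the Docker Desktop setup, seldon_prediction_url("localhost:80", "test-ml-seldon-app", "test-ml-seldon-app") gives http://localhost:80/seldon/test-ml-seldon-app/test-ml-seldon-app/api/v0.1/predictions.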

If your request has been successful, then you should see a response along the lines of,

{
  "meta": {
    "puid": "hsu0j9c39a4avmeonhj2ugllh9",
    "tags": {
    },
    "routing": {
    },
    "requestPath": {
      "classifier": "alexioannides/test-ml-score-seldon-api:latest"
    },
    "metrics": []
  },
  "data": {
    "names": ["t:0", "t:1"],
    "tensor": {
      "shape": [2, 2],
      "values": [0.0, 0.0, 1.0, 1.0]
    }
  }
}
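The response echoes the request's tensor, although the values come back as floats and the names are replaced with t:0 and t:1. A sketch of a hypothetical helper that unflattens data.tensor.values using data.tensor.shape (2-D tensors only), so the result can be compared with the original rows:

```python
import json


def tensor_rows(response_body):
    """Unflatten data.tensor.values into rows using data.tensor.shape (2-D only)."""
    tensor = json.loads(response_body)["data"]["tensor"]
    n_rows, n_cols = tensor["shape"]
    values = tensor["values"]
    # Row-major layout: row i occupies values[i * n_cols:(i + 1) * n_cols].
    return [values[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
```

Since 0.0 == 0 in Python, tensor_rows applied to the response above compares equal to the original [[0, 0], [1, 1]] request rows.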

Tear Down

To delete a single Seldon ML model and its namespace, deployed using the steps above, run,

helm delete test-ml-seldon-app --purge &&
  kubectl delete namespace test-ml-seldon-app

Follow the same pattern to remove the Seldon Core Operator and Ambassador,

helm delete seldon-core --purge && kubectl delete namespace seldon-core
helm delete ambassador --purge && kubectl delete namespace ambassador

If there is a GCP cluster that needs to be killed run,

gcloud container clusters delete k8s-test-cluster

And likewise if working with Minikube,

minikube stop
minikube delete

If running on Docker Desktop, navigate to Preferences -> Reset to reset the cluster.

Where to go from Here

The following list of resources will help you dive deeply into the subjects we skimmed-over above:

Alternatively, check out Bodywork - a MLOps framework for running model-training workloads and deploying model-scoring services on Kubernetes. This framework, of which I am one of the core contributors, is an attempt to automate a lot of the steps that this project has demonstrated to many machine learning engineers over the years.

Appendix - Using Pipenv for Managing Python Package Dependencies

We use pipenv for managing project dependencies and Python environments (i.e. virtual environments). All of the direct package dependencies required to run the code (e.g. Flask or Seldon-Core), as well as any packages that could have been used during development (e.g. flake8 for code linting and IPython for interactive console sessions), are described in the Pipfile. Their precise downstream dependencies are described in Pipfile.lock.

Installing Pipenv

To get started with Pipenv, first of all download it - assuming that there is a global version of Python available on your system and on the PATH, then this can be achieved by running the following command,

pip3 install pipenv

Pipenv is also available to install from many non-Python package managers. For example, on OS X it can be installed using the Homebrew package manager, with the following terminal command,

brew install pipenv

For more information, including advanced configuration options, see the official pipenv documentation.

Installing Project Dependencies

If you want to experiment with the Python code in the py-flask-ml-score-api or seldon-ml-score-component directories, then make sure that you're in the appropriate directory and then run,

pipenv install

This will install all of the direct project dependencies.

Running Python, IPython and JupyterLab from the Project's Virtual Environment

In order to continue development in a Python environment that precisely mimics the one the project was initially developed with, use Pipenv from the command line as follows,

pipenv run python3

The python3 command could just as well be seldon-core-microservice or any other entry-point provided by the seldon-core package - for example, in the Dockerfile for the seldon-ml-score-component we start the Seldon-based ML model scoring service using,

pipenv run seldon-core-microservice ...

Pipenv Shells

Prepending pipenv run to every command you want to run within the context of your Pipenv-managed virtual environment can get very tedious. This can be avoided by entering into a Pipenv-managed shell,

pipenv shell

which is equivalent to 'activating' the virtual environment. Any command will now be executed within the virtual environment. Use exit to leave the shell session.