⚠️ kubeflow/example-seldon is not maintained

This repository has been deprecated and archived on Nov 30th, 2021.

Train and Deploy Machine Learning Models on Kubernetes with Kubeflow and Seldon-Core

The example uses the MNIST handwritten digit classification task. We will train three different models to solve this task:

A TensorFlow neural network model.
A scikit-learn random forest model.
An R least squares model.

We will then show various rolling deployments:

Deploy the single TensorFlow model.
Do a rolling update to an A/B test of the TensorFlow model and the scikit-learn model.
Do a rolling update to a multi-armed bandit over all three models to direct traffic in real time to the best model.
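
As a rough illustration of what the A/B test step does with traffic, the sketch below splits simulated requests between two models at a fixed ratio. The function names and the 50/50 ratio are assumptions made for illustration; in the demo itself the routing is done by the Seldon Core deployment, not by code like this.

```python
import random
from collections import Counter


def ab_route(rng, ratio_a=0.5):
    """Return "A" with probability ratio_a, otherwise "B"."""
    return "A" if rng.random() < ratio_a else "B"


def split_counts(n_requests, ratio_a=0.5, seed=0):
    """Count how many of n_requests simulated requests go to each model."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    return Counter(ab_route(rng, ratio_a) for _ in range(n_requests))


if __name__ == "__main__":
    # "A" stands in for the TensorFlow model, "B" for the scikit-learn model.
    print(split_counts(10_000, ratio_a=0.5))
```

Unlike the bandit in the last step, the split ratio here never changes in response to how well each model performs.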

In the following we will:

Install Kubeflow and Seldon Core on a Kubernetes cluster
Train the models
Serve the models

Requirements

gcloud
kubectl
ksonnet
argo

Setup

There is a consolidated script to create the demo, which can be found here. For a step-by-step guide, do the following:

Install Kubeflow on GKE. This should create Kubeflow in a namespace called kubeflow. We suggest using the command-line install so you can easily modify your ksonnet installation. Ensure the environment variables KUBEFLOW_SRC and KFAPP are set. OAuth is preferred because, with basic auth, port-forwarding to Ambassador is insufficient.
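
Before continuing, it can help to confirm that both variables are actually set in the current environment. The following is a minimal sketch; the helper name `missing_env` is hypothetical and not part of the repository.

```python
import os


def missing_env(names=("KUBEFLOW_SRC", "KFAPP"), env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        raise SystemExit("Please set: " + ", ".join(missing))
    # This is the ksonnet application folder used in the next step.
    print(os.path.join(os.environ["KUBEFLOW_SRC"], os.environ["KFAPP"], "ks_app"))
```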

Install Seldon. Go to the ksonnet application folder set up in the previous step and run:

cd ${KUBEFLOW_SRC}/${KFAPP}/ks_app

ks pkg install kubeflow/seldon
ks generate seldon seldon
ks apply default -c seldon

Install Helm

kubectl -n kube-system create sa tiller
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller
kubectl rollout status deploy/tiller-deploy -n kube-system

Create an NFS disk and a persistent volume claim called nfs-1. You can follow a guide on creating an NFS volume using Google Filestore here. A consolidated set of steps is shown here.
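
For orientation, a claim named nfs-1 might look roughly like the manifest generated below. This is only a sketch: the namespace, the requested size, the empty storage class (for binding to a pre-provisioned volume), and the ReadWriteMany access mode are all assumptions, and the linked guides remain the reference for the actual values.

```python
import json


def nfs_pvc(name="nfs-1", namespace="kubeflow", size="1T"):
    """Build a PersistentVolumeClaim manifest as a plain dict.

    namespace and size are assumed values, not taken from the repository.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # NFS volumes can be mounted read-write by many pods at once.
            "accessModes": ["ReadWriteMany"],
            # Assumption: empty class, so the claim binds to an existing volume.
            "storageClassName": "",
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g. by piping this into `kubectl apply -f -`.
    print(json.dumps(nfs_pvc(), indent=2))
```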

Add Cluster Roles so Argo can start jobs successfully

kubectl create clusterrolebinding my-cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud info --format="value(config.account)")
kubectl create clusterrolebinding default-admin2 --clusterrole=cluster-admin --serviceaccount=kubeflow:default

Install Seldon Analytics Dashboard

helm install seldon-core-analytics --name seldon-core-analytics --set grafana_prom_admin_password=password --set persistence.enabled=false --repo https://storage.googleapis.com/seldon-charts --namespace kubeflow

Port-forward the dashboard once it is running

kubectl port-forward $(kubectl get pods -n kubeflow -l app=grafana-prom-server -o jsonpath='{.items[0].metadata.name}') -n kubeflow 3000:3000

Visit http://localhost:3000/dashboard/db/prediction-analytics?refresh=5s&orgId=1 and log in using "admin" and the password you set above when launching with Helm.

MNIST models

TensorFlow Model

SKLearn Model

R Model

Train the Models

Follow the steps in ./notebooks/training.ipynb to:

Run Argo jobs for each model to:
- Create training images and push them to the repo
- Run training
- Create runtime prediction images and push them to the repo
- Deploy the individual runtime model

To push the Docker images to your own repo, you will need to set up your Docker credentials as a Kubernetes secret containing a config.json. To do this, find your Docker home (typically ~/.docker) and run kubectl create secret generic docker-config --from-file=config.json=${DOCKERHOME}/config.json --type=kubernetes.io/config to create the secret.
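
To make clearer what that command stores, the sketch below builds an equivalent Secret manifest by hand: the file's contents are base64-encoded under the key config.json. The helper name `docker_config_secret` is hypothetical, and the type string is copied from the command above rather than independently verified.

```python
import base64
import json
import os


def docker_config_secret(config_json_text, name="docker-config"):
    """Build a Secret manifest that holds the given config.json contents."""
    encoded = base64.b64encode(config_json_text.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/config",  # same type string as the kubectl command
        "data": {"config.json": encoded},  # Secret data values are base64-encoded
    }


if __name__ == "__main__":
    # Assumes the Docker home is ~/.docker, as the text suggests is typical.
    path = os.path.expanduser("~/.docker/config.json")
    with open(path, encoding="utf-8") as fh:
        print(json.dumps(docker_config_secret(fh.read()), indent=2))
```

The printed manifest contains your registry credentials in easily decoded form, so treat it like the original config.json.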

Serve the Models

Follow the steps in ./notebooks/serving.ipynb to:

Deploy the single TensorFlow model.
Do a rolling update to an A/B test of the TensorFlow model and the scikit-learn model.
Do a rolling update to a multi-armed bandit over all three models to direct traffic in real time to the best model.
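
To give a feel for how a multi-armed bandit can steer traffic toward the best model, here is a small simulation using epsilon-greedy, one common bandit strategy. The strategy choice, the class and function names, and the per-model accuracies are all assumptions for illustration; this text does not say which strategy the notebook's router uses.

```python
import random


class EpsilonGreedyRouter:
    """Route each request to one of several models using epsilon-greedy."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # requests sent to each model
        self.values = [0.0] * n_arms  # running mean reward per model
        self.rng = random.Random(seed)

    def route(self):
        """Explore a random model with probability epsilon, else exploit the best."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def feedback(self, arm, reward):
        """Fold the observed reward into the chosen model's running mean."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(accuracies, n_requests=5000, seed=0):
    """Send simulated requests; reward is 1 when the chosen model is correct."""
    router = EpsilonGreedyRouter(len(accuracies), epsilon=0.1, seed=seed)
    env = random.Random(seed + 1)
    for _ in range(n_requests):
        arm = router.route()
        reward = 1.0 if env.random() < accuracies[arm] else 0.0
        router.feedback(arm, reward)
    return router


if __name__ == "__main__":
    # Hypothetical accuracies for three models; not measured results.
    router = simulate([0.60, 0.70, 0.95])
    print(router.counts, [round(v, 3) for v in router.values])
```

With a small epsilon, most traffic ends up at whichever model has the highest observed reward, while the occasional random request keeps the estimates for the other models up to date.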

To ensure the notebook runs successfully, install the Python dependencies:

pip install -r notebooks/requirements.txt

If you have installed the Seldon Core analytics, you can view the metrics on the Grafana dashboard.