Kubeflow POC

Scratch files for experiments with Kubeflow.

Installation

  • run install.sh
  • once it completes, run kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
  • navigate to the Kubeflow UI at localhost:8080
  • create a namespace named 'spark' in the create namespace dialog. The name 'spark' matters because the RBAC rules and resources in this repo reference it (TODO: should be changed to context-based installation)

Expected result: the Kubeflow central dashboard is reachable at localhost:8080 (screenshot omitted)
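
Before moving on, it can help to confirm the core components came up. A minimal check, assuming the standard install places Kubeflow in the kubeflow namespace and Istio in istio-system:

kubectl get pods -n kubeflow
kubectl get pods -n istio-system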

Jupyter integration with Spark

  • disable Istio sidecar injection by removing the label from the namespace: kubectl label ns spark istio-injection- (see kubeflow/issues/4306)
  • verify namespace, configure roles, and install Jupyter from specs:
    kubectl apply -f specs/namespace.yaml
    kubectl apply -f specs/rbac.yaml
    kubectl apply -f specs/notebook.yaml
    
  • verify the notebook is up at localhost:8080/_/jupyter/
  • run the examples and check that executor pods are up and actually running tasks (see the snippet below)
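
A quick way to watch executor pods come and go while a job runs (assuming the 'spark' namespace from the steps above):

kubectl get pods -n spark --watch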

Examples

Apache Toree kernel (Scala):

import org.apache.spark.{SparkConf, SparkContext}
import java.net._

// the driver runs inside the notebook pod, so executors must reach it by pod IP
val localIpAddress: String = InetAddress.getLocalHost.getHostAddress

val conf = new SparkConf()
           .setAppName("Toree test")
           .setMaster("k8s://https://kubernetes.default.svc.cluster.local:443") // in-cluster API server
           .set("spark.driver.host", localIpAddress)
           .set("spark.kubernetes.namespace", "spark")
           .set("spark.kubernetes.container.image", "mesosphere/spark:2.4.3-bin-hadoop2.9-k8s")
val sc = new SparkContext(conf)
sc.parallelize(1 to 1000).sum

Expected output: 500500.0
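
When finished, stop the context so Kubernetes tears down the executor pods and frees their resources:

sc.stop()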

PySpark:

import pyspark
from pyspark.sql import SparkSession
import socket
from operator import add

# the driver runs inside the notebook pod, so executors must reach it by pod IP
localIpAddress = socket.gethostbyname(socket.gethostname())

conf = pyspark.SparkConf().setAll([
    ('spark.master', 'k8s://https://kubernetes.default.svc.cluster.local:443'),  # in-cluster API server
    ("spark.driver.host", localIpAddress),
    ("spark.kubernetes.namespace", "spark"),
    ("spark.kubernetes.container.image", "akirillov/spark:spark-2.4.3-hadoop-2.9-k8s")])

spark = SparkSession.builder.appName("PySpark").config(conf=conf).getOrCreate()
spark.sparkContext.parallelize(range(1, 1001)).reduce(add)

Expected output: 500500
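
The expected value is simply the sum of the arithmetic series 1..1000, which can be sanity-checked without a cluster:

assert 1000 * 1001 // 2 == 500500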

Plotting with matplotlib

import matplotlib.pyplot as plt

from random import seed
from random import randint

seed(1)  # note: this seeds only the driver process; the lambda below runs on executors
# generate 1000 pseudo-random values on the executors and collect them to the driver
data = spark.sparkContext.parallelize(range(1, 1001)).map(lambda num: randint(0, 100)).collect()
plt.hist(data, bins=100)
plt.show()

Expected output: a histogram of the 1000 random values (plot omitted)

Tensorflow & Horovod

Notebooks

  • commons - a notebook with helper methods. It should be uploaded together with the other notebooks
  • MNIST Tensorflow - a vanilla Tensorflow example for MNIST
  • MNIST Horovod - MNIST training using Horovod with Tensorflow (local mode, CPU)
  • Horovod Spark - a simple example of Horovod-Spark integration
  • MNIST Horovod Spark - MNIST training using Horovod with Tensorflow for training and Spark for parallelization
  • MNIST Spark Horovod ETL-ML - MNIST training that uses Spark to read the data and then passes it to Tensorflow, with Horovod broadcasting variables

Docker images

This repo contains the following Dockerfiles used in specs, examples, and notebooks:

  • Horovod - installs Tensorflow, PyTorch, MXNet, and Horovod
  • Spark - builds on top of Horovod image and adds OpenJDK and Spark
  • Jupyter Notebook - builds on top of Spark image and adds Jupyter
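
The images are layered, so they must be built in order. A minimal sketch, assuming the Dockerfiles live under docker/<name>/ (hypothetical paths - adjust to this repo's actual layout and image tags):

docker build -t horovod:latest -f docker/horovod/Dockerfile .
docker build -t spark:latest -f docker/spark/Dockerfile .
docker build -t jupyter-notebook:latest -f docker/jupyter/Dockerfile .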

Running Spark

Running Spark from a pod

Create a service account and roles, then run a pod based on a valid Spark image:

kubectl apply -f specs/rbac.yaml
kubectl run -it busybox --image=mesosphere/spark:2.4.3-bin-hadoop2.9-k8s --restart=Never --serviceaccount=spark-sa --rm --command -- bash

spark-shell

/opt/spark/bin/spark-shell --master k8s://https://kubernetes.default.svc.cluster.local:443 \
--conf spark.driver.host=$(hostname -i) \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
--conf spark.kubernetes.container.image=mesosphere/spark:2.4.3-bin-hadoop2.9-k8s
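
Once the shell is up, the same smoke test as in the notebook examples applies:

// should return 500500.0 once executor pods register with the driver
sc.parallelize(1 to 1000).sum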

spark-submit (cluster mode)

/opt/spark/bin/spark-submit --master k8s://https://kubernetes.default.svc.cluster.local:443 \
--deploy-mode cluster \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
--conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
--conf spark.kubernetes.authenticate.submission.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
--conf spark.kubernetes.container.image=mesosphere/spark:2.4.3-bin-hadoop2.9-k8s  \
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar 1000
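
In cluster mode the driver runs in its own pod, so SparkPi's result appears in the driver log rather than in the submitting shell. The driver pod name below is a placeholder; find the real one with the first command:

kubectl get pods -n spark
kubectl logs -n spark <spark-pi-driver-pod> | grep "Pi is roughly"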

TODO