/kubeflow

Machine Learning Toolkit for Kubernetes

Primary LanguagePythonApache License 2.0Apache-2.0

Kubeflow

The Kubeflow project is dedicated to making Machine Learning on Kubernetes easy, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way for spinning up best of breed OSS solutions. Contained in this repository are manifests for creating:

  • A JupyterHub to create & manage interactive Jupyter notebooks
  • A Tensorflow Training Controller that can be configured to use CPUs or GPUs, and adjusted to the size of a cluster with a single setting
  • A TF Serving container

This document details the steps needed to run the Kubeflow project in any environment in which Kubernetes runs.

The Kubeflow Mission

Our goal is to help folks use ML more easily, by letting Kubernetes to do what it's great at:

  • Easy, repeatable, portable deployments on a diverse infrastructure (laptop <-> ML rig <-> training cluster <-> production cluster)
  • Deploying and managing loosely-coupled microservices
  • Scaling based on demand

Because ML practitioners use so many different types of tools, it is a key goal that you can customize the stack to whatever your requirements (within reason), and let the system take care of the "boring stuff." While we have started with a narrow set of technologies, we are working with many different projects to include additional tooling.

Ultimately, we want to have a set of simple manifests that give you an easy to use ML stack anywhere Kubernetes is already running and can self configure based on the cluster it deploys into.

Setup

This documentation assumes you have a Kubernetes cluster already available. For specific Kubernetes installations, additional configuration may be necessary.

Minikube

Minikube is a tool that makes it easy to run Kubernetes locally. Minikube runs a single-node Kubernetes cluster inside a VM on your laptop for users looking to try out Kubernetes or develop with it day-to-day. The below steps apply to a minikube cluster - the latest version as of writing this documentation is 0.23.0. You must also have kubectl configured to access minikube.

Google Kubernetes Engine

Google Kubernetes Engine is a managed environment for deploying Kubernetes applications powered by Google Cloud. If you're using Google Kubernetes Engine, prior to creating the manifests, you must grant your own user the requisite RBAC role to create/edit other RBAC roles.

kubectl create clusterrolebinding default-admin --clusterrole=cluster-admin --user=user@gmail.com

Quick Start

Requirements

In order to quickly set up all components, execute the following commands,

# Initialize a ksonnet APP
APP_NAME=my-kubeflow
ks init ${APP_NAME}
cd ${APP_NAME}

# Install Kubeflow components
ks registry add kubeflow github.com/google/kubeflow/tree/master/kubeflow
ks pkg install kubeflow/core
ks pkg install kubeflow/tf-serving
ks pkg install kubeflow/tf-job

# Deploy Kubeflow
ks generate core kubeflow-core --name=kubeflow-core --namespace=${NAMESPACE}
ks apply default -c kubeflow-core

The above command sets up JupyterHub and a custom resource for running TensorFlow training jobs. Furthermore, the ksonnet packages provide prototypes that can be used to configure TensorFlow jobs and deploy TensorFlow models. Used together, these make it easy for a user go from training to serving using Tensorflow with minimal effort in a portable fashion between different environments.

For more detailed instructions about how to use Kubeflow please refer to the user guide

Resources

Get involved