A do-framework project to simplify deployment of Kubeflow on AWS

aws-do-kubeflow

AWS Do Kubeflow - Deploy and Manage Kubeflow on AWS EKS

Overview

Kubeflow is an open source project that deploys on Kubernetes. It provides end-to-end ML platform and workflow capabilities. Kubeflow can be deployed in a number of ways, and many variations of it exist. The goal of aws-do-kubeflow is to simplify the deployment and management of Kubeflow on AWS and to provide some useful ML examples. This project follows the principles of the Do Framework and the structure of the Depend on Docker template. It containerizes all the tools necessary to deploy and manage Kubeflow using Docker, then executes the deployment from within the container. All you need is an AWS Account.

For a hands-on experience with Kubeflow and its application for distributed ML training workflows, please see our online workshop and walk through the self-paced workshop steps.

Below is an overview diagram that shows the general architecture of a Kubeflow deployment on EKS.


Fig.1 - Deployment Architecture

The deployment process is shown in Fig. 2 below:


Fig.2 - Kubeflow deployment process with aws-do-kubeflow

Prerequisites

  1. AWS Account - you will need an AWS account
  2. EKS Cluster - it is assumed that an EKS cluster already exists in the account. If a cluster is needed, one way to create it is to follow the instructions in the aws-do-eks project. In that case, it is recommended to use the cluster manifest /eks/eks-kubeflow.yaml, located within the aws-do-eks container.
  3. Optionally, we recommend using AWS Cloud9 as a working environment. Instructions for setting up a Cloud9 IDE are available here

Configure

All configuration settings of the aws-do-kubeflow project are centralized in its .env file. To review or change any of the settings, simply execute ./config.sh. The AWS_CLUSTER_NAME setting must match the name of your existing EKS Cluster, and AWS_REGION should match the AWS Region where the cluster is deployed.
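As a rough illustration of the two settings named above, the sketch below writes a sample settings file, loads it, and checks that both values are present. It assumes plain KEY=value lines; the project's actual .env layout may differ, and the cluster name and Region shown are hypothetical placeholders. A temporary file is used so the project's real .env is not touched.

```shell
# Hypothetical example values; substitute your own cluster name and Region.
ENV_FILE="$(mktemp)"
cat > "$ENV_FILE" <<'EOF'
AWS_CLUSTER_NAME=my-eks-cluster
AWS_REGION=us-west-2
EOF

# Load the settings into the current shell.
. "$ENV_FILE"

# Collect the names of any required settings that are missing or empty.
MISSING=""
for name in AWS_CLUSTER_NAME AWS_REGION; do
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    MISSING="$MISSING $name"
  fi
done

if [ -n "$MISSING" ]; then
  echo "Missing settings:$MISSING"
else
  echo "Cluster $AWS_CLUSTER_NAME in $AWS_REGION"
fi

rm -f "$ENV_FILE"

# To compare against the clusters that actually exist in the Region (not run here):
# aws eks list-clusters --region "$AWS_REGION"
```

In practice, ./config.sh remains the supported way to edit these values; the check above is only a quick sanity test before building.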

The aws-do-kubeflow project supports both the generic and AWS-specific Kubeflow distributions. The distribution to deploy can be selected with the KF_DISTRO setting. By default, the project deploys the AWS vanilla distribution.

Build

Please execute the ./build.sh script to build the project. This will create the "aws-do-kubeflow" container image and tag it using the registry and version tag specified in the project configuration.

Run

Execute ./run.sh to bring up the Docker container.

Status

To check if the container is up, execute ./status.sh. If the container is in the Exited state, it can be started with ./start.sh.

Exec

Executing the ./exec.sh script will open a bash shell inside the aws-do-kubeflow container.
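The Build, Run, Status, and Exec steps above can be chained in order. The sketch below is one way to do that, assuming it is run from the project root. The run_step helper and the DRY_RUN variable are hypothetical conveniences introduced here, not part of the project; by default the steps are only printed.

```shell
# DRY_RUN=1 (the default here) only prints each step; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run_step ./build.sh    # build the aws-do-kubeflow container image
run_step ./run.sh      # bring up the container
run_step ./status.sh   # confirm the container is up
run_step ./exec.sh     # open a bash shell inside the container
```

Keeping the dry-run default makes it easy to review the sequence before anything is built or started.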

Deploy Kubeflow

To deploy your configured distribution of Kubeflow, simply execute ./kubeflow-deploy.sh.

The deployment creates several groups of pods in your EKS cluster. Upon successful deployment, all pods will be in the Running state. To check the state of all pods in the cluster, run kubectl get pods -A.
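Scanning a long pod list by eye is error-prone, so a small filter can help. The sketch below counts pods whose STATUS column is anything other than Running, reading rows in the layout that kubectl get pods -A --no-headers prints (NAMESPACE, NAME, READY, STATUS, ...). The count_not_running function and the sample pod names are hypothetical; pods from finished jobs that report Completed would also be counted by this strict check.

```shell
# Count rows whose fourth column (STATUS) is not "Running".
count_not_running() {
  awk 'NF >= 4 && $4 != "Running" { n++ } END { print n + 0 }'
}

# Sample rows with made-up pod names, in the `kubectl get pods -A --no-headers` layout.
SAMPLE='kubeflow centraldashboard-abc 2/2 Running 0 5m
kubeflow ml-pipeline-xyz 1/2 CrashLoopBackOff 3 5m
istio-system istiod-123 1/1 Running 0 6m'

printf '%s\n' "$SAMPLE" | count_not_running   # prints 1

# Against a live cluster (not run here):
# kubectl get pods -A --no-headers | count_not_running
```

A result of 0 suggests the deployment has settled; any other value means some pods still need attention.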

Access Kubeflow Dashboard

To access the Kubeflow Dashboard, the Istio Ingress Gateway service of this Kubeflow deployment needs to be exposed outside the cluster. In a production deployment this is typically done via an Application Load Balancer (ALB). However, this requires a DNS domain registration and a matching SSL certificate.

For an easy way to expose the Kubeflow Dashboard, we can use kubectl port-forward from Cloud9 or from any machine that has a browser and kubectl access to the cluster. To start the port-forward, execute the script ./kubeflow-expose.sh.

If you are in Cloud9, select Preview->Preview Running Application. This will open a browser tab within Cloud9. You can expand that tab to a full-browser by clicking the icon in the upper-right corner.

If you are on a machine with its own browser, just navigate to localhost:8080 to open the Kubeflow Dashboard.
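For readers curious about what a port-forward to the Istio Ingress Gateway looks like, the sketch below assembles the command documented in the upstream Kubeflow manifests. It is an assumption that ./kubeflow-expose.sh wraps something similar; the namespace, service name, and remote port are taken from the upstream defaults, not read from the script. The command is only printed here, since running it requires cluster access.

```shell
# Assumed upstream Kubeflow defaults; not read from kubeflow-expose.sh.
NAMESPACE="istio-system"
SERVICE="svc/istio-ingressgateway"
LOCAL_PORT=8080    # matches the localhost:8080 address mentioned above
REMOTE_PORT=80

PF_CMD="kubectl port-forward ${SERVICE} -n ${NAMESPACE} ${LOCAL_PORT}:${REMOTE_PORT}"
echo "$PF_CMD"

# To start forwarding for real (blocks until interrupted):
# $PF_CMD
```

If port 8080 is already in use on your machine, changing LOCAL_PORT and browsing to the new port should work the same way.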


Fig. 3 - Kubeflow Dashboard

Remove Kubeflow Deployment

To remove your Kubeflow deployment, simply execute ./kubeflow-remove.sh from within the aws-do-kubeflow container.

Command reference

  • ./config.sh - configure aws-do-kubeflow project settings interactively
  • ./build.sh - build aws-do-kubeflow container image
  • ./login.sh - log in to the configured container registry
  • ./push.sh - push aws-do-kubeflow container image to configured registry
  • ./pull.sh - pull aws-do-kubeflow container image from a configured existing registry
  • ./prune.sh - delete all unused docker containers, networks and images from the local host
  • ./run.sh - run aws-do-kubeflow container
  • ./status.sh - show current aws-do-kubeflow container status
  • ./logs.sh - show logs of the running aws-do-kubeflow container
  • ./start.sh - start the aws-do-kubeflow container if it is currently in "Exited" status
  • ./exec.sh - execute a command inside the running aws-do-kubeflow container, the default command is bash
  • ./stop.sh - stop and remove the aws-do-kubeflow container
  • ./test.sh - run container unit tests

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Troubleshooting

  • Cloud9 instance running out of disk space - refer to the instructions for increasing the volume size here

  • Errors regarding your permissions as a user in Cloud9 - refer to Create an IAM role for your Workspace.

  • Namespaces are left in Terminating state when removing a Kubeflow deployment - execute script ./configure/ns-clear.sh
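Before running the cleanup script for the last item, it can help to see which namespaces are actually stuck. The sketch below filters rows in the layout that kubectl get namespaces --no-headers prints (NAME, STATUS, AGE). The list_terminating function and the sample rows are hypothetical, and the sketch makes no claim about what ./configure/ns-clear.sh does internally.

```shell
# Print the names of namespaces whose STATUS column is "Terminating".
list_terminating() {
  awk '$2 == "Terminating" { print $1 }'
}

# Sample rows in the `kubectl get namespaces --no-headers` layout.
SAMPLE='default Active 10d
kubeflow Terminating 2d
istio-system Active 2d'

printf '%s\n' "$SAMPLE" | list_terminating   # prints kubeflow

# Against a live cluster (not run here):
# kubectl get namespaces --no-headers | list_terminating
```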

Credits

  • Mark Vinciguerra - @mvincig
  • Jason Dang - @jndang
  • Tatsuo Azeyanagi - @tazeyana
  • Alex Iankoulski - @iankouls
  • Kanwaljit Khurmi - @kkhurmi
  • Milena Boytchef - @boytchef
  • Gautam Kumar - @gauta

References