
Romario v0.1.1 Logo

Romario

"You give it the ball... it will score!" - Romario, a RESTful API for kick-starting Kubeflow Pipelines in your Kubernetes cluster.

The intent of Romario is to enable large-scale runs of Kubeflow Pipelines across multiple Kubernetes clusters. KF Pipelines are a key way to "pythonically" orchestrate analytics pipelines, thanks to their straightforward abstraction of Argo workflows into a Python-based DSL.

For background on the scale required for industrial applications of ML and AI, please check this and this talk from Google Cloud NEXT 2019.

Documentation

Romario provides a REST API for executing the most common operations performed with KF Pipelines.

A walkthrough of Romario is provided here: Doc

Building the Romario Project

As with any Depend-on-Docker project, building Romario is quite easy. Once the .env environment file is set, it is enough to execute build.sh. Please refer to the Depend-on-Docker Documentation for more information on how to customize Romario.
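As a rough sketch, the .env file pins the image coordinates consumed by build.sh; the variable names below are hypothetical illustrations, not the template's actual keys, so consult the Depend-on-Docker Documentation for the real ones:

```shell
# .env - hypothetical example values only; the actual variable names
# are defined by the Depend-on-Docker template
REGISTRY=docker.io/myorg   # target container registry (assumption)
IMAGE=romario              # image name (assumption)
VERSION=0.1.1              # image tag (assumption)
```

With the file in place, running ./build.sh from the project root builds the image.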

Deployment to a Kubernetes cluster

The deployment of romario assumes the existence of a Kubernetes cluster with Kubeflow already deployed. The user must have access to the management node of that cluster, so that the deployment happens in the correct namespace (usually 'kubeflow'). Please refer to this documentation on how to provision such infrastructure.

Once on the management node, romario is deployed by simply running the deploy_romario_from_master.sh script provided here.

All configurations required for the romario Kubernetes Service and Deployment are given here. Other nifty automations of common kubectl commands are provided in the same k8s folder.

Running a Pipeline

Executing a simple pipeline is easy from the Master node of the underlying Kubernetes cluster. A sample curl -X POST ... example is available in romario/Container-Root/test. The script takes the .tar.gz tarball as its single argument. From the romario root:

./Container-Root/test/post_k8s_run_test.sh Container-Root/pipelines/SampleBasic-Condition.yaml.tar.gz

A Swagger UI is available at https://<romario-endpoint>/apidocs.
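For reference, the request issued by the test script can be sketched as a small shell function. The endpoint path (/run) and the form field name (file) are assumptions here; check post_k8s_run_test.sh in Container-Root/test for the exact request shape:

```shell
# Hedged sketch of the POST performed by post_k8s_run_test.sh.
# The /run path and the "file" form field are assumptions, not the
# script's confirmed payload shape.
post_pipeline() {
    tarball="$1"                # compiled pipeline .tar.gz
    endpoint="${2:-localhost}"  # romario service endpoint (assumption)
    # upload the tarball as multipart form data
    curl -X POST -F "file=@${tarball}" "http://${endpoint}/run"
}

# usage (hypothetical endpoint):
# post_pipeline Container-Root/pipelines/SampleBasic-Condition.yaml.tar.gz romario.example.com
```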

Running the romario image as a Jupyter Notebook server

Running a Jupyter Notebook server from the Master node on the cluster is simple with romario and Depend-on-Docker. Simply execute:

./run_jupyter.sh

This script maps the whole romario root to /wd, allowing the user to compile Pipelines manually from the master node. A POST request method to compile pipelines described in .py scripts is under construction; stay tuned!
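Until that POST method lands, a pipeline written in the Python DSL can be compiled from a notebook terminal with the dsl-compile CLI that ships with the kfp package. The wrapper and paths below are illustrative assumptions (kfp must be installed in the image for dsl-compile to exist):

```shell
# Illustrative wrapper around kfp's dsl-compile CLI; the paths are
# examples, and kfp must be installed in the image.
compile_pipeline() {
    py_src="$1"     # pipeline definition, e.g. /wd/my_pipeline.py
    tarball="$2"    # output, e.g. /wd/my_pipeline.yaml.tar.gz
    dsl-compile --py "$py_src" --output "$tarball"
}
```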

Known Features, a.k.a. Bugs

  1. Pipelines should be POSTed as .tar.gz files; currently, POSTing YAML will result in an exception.
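As a workaround, a compiled .yaml can be wrapped into the expected tarball before POSTing. A minimal sketch, with example file names:

```shell
# Wrap a compiled pipeline YAML into the .tar.gz form Romario expects;
# file names here are examples.
package_pipeline() {
    yaml="$1"
    # -C keeps only the bare file name inside the archive
    tar -czf "${yaml}.tar.gz" -C "$(dirname "$yaml")" "$(basename "$yaml")"
}

# usage: package_pipeline /wd/Container-Root/pipelines/my_pipeline.yaml
```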

Disclosures & Acknowledgments

Romario wraps the Python DSL from Kubeflow Pipelines, providing minimal functionality through key endpoints, mostly the kfp.Client().run_pipeline() method. The open-source version of Romario is NOT intended to be an exhaustive, production-ready service.

The Pipelines team has been very supportive and we are very grateful.