"You give it the ball... it will score!" - Romario, a RESTful API for kick-starting Kubeflow Pipelines in your Kubernetes cluster.
The intent of Romario is to enable large-scale runs of Kubeflow Pipelines across multiple Kubernetes clusters. Kubeflow Pipelines are a key way to orchestrate analytics pipelines "pythonically", thanks to their straightforward abstraction of Argo workflows into a Python-based DSL.
For background on the scale required for industrial applications of ML and AI, please check this and this talk from Google Cloud NEXT 2019.
Romario provides a REST API for executing the most common operations performed with Kubeflow Pipelines.
A walkthrough of Romario is provided here: Doc
As with any Depend-on-Docker project, building Romario is quite easy. Once the `.env` environment file is set, it is enough to execute `build.sh`. Please refer to the Depend-on-Docker documentation for more information on how to customize Romario.
The deployment of Romario assumes the existence of a Kubernetes cluster with Kubeflow already deployed. The user must have access to the cluster's management node, so that the deployment happens in the correct namespace (usually 'kubeflow'). Please refer to this documentation on how to provision such infrastructure.
Once on the management node, Romario is deployed by simply running the `deploy_romario_from_master.sh` script provided here.
All configurations required for the Romario Kubernetes Service and Deployment are given here. Other nifty automation of common `kubectl` commands is provided in the same k8s folder.
Executing a simple pipeline is easy from the master node of the underlying Kubernetes cluster. A sample `curl -X POST ...` example is available in romario/Container-Root/test. The script takes a single argument: the `.tar.gz` pipeline tarball. From the romario root:
./Container-Root/test/post_k8s_run_test.sh Container-Root/pipelines/SampleBasic-Condition.yaml.tar.gz
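Under the hood, the test script amounts to a multipart file upload. The sketch below shows roughly what such a request could look like; the service address and the `/run` path are assumptions rather than the documented API, so consult the Swagger UI at `/apidocs` for the actual endpoint and field names:

```shell
# Hypothetical sketch only: the service address and the /run path are
# assumptions; check the Swagger UI for the real endpoint and field names.
build_run_request() {
  local endpoint="$1" tarball="$2"
  # The pipeline tarball is uploaded as a multipart form field.
  echo "curl -X POST -F pipeline=@${tarball} ${endpoint}/run"
}

build_run_request "http://romario.kubeflow:80" \
  "Container-Root/pipelines/SampleBasic-Condition.yaml.tar.gz"
```

Here the function only prints the command it would run; in a real script you would execute the `curl` line directly.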
A Swagger UI is available at `https://<romario-endpoint>/apidocs`.
Running a Jupyter Notebook server from the master node of the cluster is simple with Romario and Depend-on-Docker. Simply execute:
./run_jupyter.sh
This script maps the whole romario root to `/wd`, allowing the user to compile pipelines manually from the master node. A `POST` request method for compiling pipelines described in `.py` scripts is under construction, stay tuned!
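Until that endpoint lands, pipelines can be compiled by hand from the Jupyter container. A minimal sketch, assuming the kfp SDK (which ships the `dsl-compile` CLI) is installed in the container, and using a made-up `/wd/my_pipeline.py` script as the example input:

```shell
# Hypothetical sketch: /wd/my_pipeline.py is a made-up example script, and we
# assume the kfp SDK (which provides dsl-compile) is installed in the container.
compile_cmd() {
  local src="$1" out="$2"
  # dsl-compile turns a Python DSL script into a POST-able .tar.gz tarball.
  echo "dsl-compile --py ${src} --output ${out}"
}

compile_cmd "/wd/my_pipeline.py" "/wd/my_pipeline.yaml.tar.gz"
```

The function prints the command rather than running it; the resulting tarball can then be POSTed to Romario exactly like the sample pipeline above.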
- Pipelines should be POSTed as `.tar.gz` files; currently, POSTing `.yaml` will result in an exception.
Romario wraps the Python DSL from Kubeflow Pipelines, providing minimal functionality through key endpoints, mostly the `kfp.Client().run_pipeline()` method. The open-source version of Romario is NOT intended to be an exhaustive, production-ready service.
The Pipelines team has been very supportive and we are very grateful.