In this example we build a simple ML pipeline using Argo Workflows. The example uses the popular Iris dataset. The pipeline consists of 3 steps:
- Generate Data (preprocessing): in this step our data is fetched and split into train and test sets. Ideally we can also do other things in this step, like cleaning the data if it's not in the format we want.
- Build Model (model training and evaluation): in this step we use the preprocessed data from the previous step to build a decision forest classifier. We then save the model as a pickle file and save the test accuracy of the model. Ideally we would split this step into 2 or more steps, so that we train the model in one step and test the saved model in another.
- Output Result (result reporting): in this step we simply display the test accuracy of our model. We could produce other types of reports if we wanted, but in this example things have deliberately been kept very simple.

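
The three steps above can be sketched as plain Python functions. This is a minimal sketch and not the code in the src folder: it assumes scikit-learn is installed, uses `RandomForestClassifier` as a stand-in for the decision forest classifier, and the function names, split ratio, and `model.pkl` file name are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def generate_data(test_size=0.2, seed=42):
    """Step 1: fetch the Iris data and split it into train and test sets."""
    features, labels = load_iris(return_X_y=True)
    # Returns x_train, x_test, y_train, y_test (in that order).
    return train_test_split(
        features, labels, test_size=test_size, random_state=seed, stratify=labels
    )


def build_model(x_train, x_test, y_train, y_test, model_path, seed=42):
    """Step 2: train the classifier, pickle it, and return the test accuracy."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(x_train, y_train)
    Path(model_path).write_bytes(pickle.dumps(model))
    return float(accuracy_score(y_test, model.predict(x_test)))


def output_result(accuracy):
    """Step 3: report the test accuracy."""
    message = f"Test accuracy: {accuracy:.3f}"
    print(message)
    return message


def run_pipeline(work_dir):
    """Run the three steps in order, writing the model into work_dir."""
    model_path = Path(work_dir) / "model.pkl"  # hypothetical file name
    x_train, x_test, y_train, y_test = generate_data()
    accuracy = build_model(x_train, x_test, y_train, y_test, model_path)
    return model_path, accuracy, output_result(accuracy)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        run_pipeline(work_dir)
```

In the actual pipeline each step runs in its own container, so the data and the pickled model have to be handed from one step to the next rather than passed as function arguments as they are here.
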
The YAML file used to generate the pipeline can be found in the pipeline folder.
The src folder holds all of the code that we converted into the Docker images used by our pipeline.
To make our images easy to build and push, we built a base image with all the requirements that any step in our pipeline could potentially need. If anything is omitted, we can easily run a pip install on top of the base image. You will notice that every step in our workflow uses the base image we built.
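
To illustrate how three sequential steps can share one base image, here is a rough sketch that builds a workflow manifest as a Python dictionary. It is not the file in the pipeline folder: the image name, script names, and `iris-` name prefix are hypothetical, and it leaves out how data and the model are passed between steps.

```python
import json

BASE_IMAGE = "example/iris-base:latest"  # hypothetical image name
STEP_NAMES = ["generate-data", "build-model", "output-result"]


def container_template(name):
    """One template per step; every step runs on the shared base image."""
    script = name.replace("-", "_") + ".py"  # hypothetical script name
    return {
        "name": name,
        "container": {"image": BASE_IMAGE, "command": ["python", script]},
    }


def build_workflow():
    """Assemble a Workflow whose entrypoint runs the steps one after another."""
    # In a "steps" template the outer list runs sequentially;
    # each inner list holds the steps that run in parallel.
    main = {
        "name": "main",
        "steps": [[{"name": name, "template": name}] for name in STEP_NAMES],
    }
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "iris-"},
        "spec": {
            "entrypoint": "main",
            "templates": [main] + [container_template(n) for n in STEP_NAMES],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_workflow(), indent=2))
```

Printing the manifest makes it easy to compare against the real YAML file and see where the base image is referenced in each step.
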

- Install Docker Desktop and start the Kubernetes cluster that comes with it.
- Install Argo Workflows following the instructions here
- Either port-forward the Argo server or expose a load balancer (you will find this info in the link above for installing Argo Workflows), then visit https://localhost:2746. Note that Argo may not be reachable over http, so the protocol has to be https.
- Click on the submit new workflow button, then click on the edit using full workflow option.
- Paste the content of argo-iris.yaml into the UI, submit the workflow, and wait for it to finish running.

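
As an alternative to pasting into the UI, a workflow can also be submitted through the Argo server's REST API. The sketch below is only a starting point under several assumptions: the server is reachable at https://localhost:2746, the workflow goes into a namespace called `argo`, and the server accepts the request without a token (otherwise pass one). The function names are hypothetical.

```python
import json
import ssl
import urllib.request


def build_submit_request(manifest, namespace="argo",
                         base_url="https://localhost:2746", token=None):
    """Build a POST request that asks the Argo server to create a workflow."""
    body = json.dumps({"workflow": manifest}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Full header value, for example "Bearer <token>".
        headers["Authorization"] = token
    return urllib.request.Request(
        f"{base_url}/api/v1/workflows/{namespace}",
        data=body,
        headers=headers,
        method="POST",
    )


def submit(request):
    """Send the request and return the server's JSON response."""
    # Assumption: a local install with a self-signed certificate, so
    # certificate checks are turned off. Do not do this outside local dev.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


# Example usage (needs a running Argo server and the manifest parsed
# into a dictionary first):
#   created = submit(build_submit_request(manifest))
#   print(created["metadata"]["name"])
```

The manifest has to be a dictionary, so the content of argo-iris.yaml would need to be parsed with a YAML library before being passed in.
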
Feel free to ask questions via GitHub issues or send me a message on LinkedIn.