A Brief Introduction to Kubeflow Pipelines

1. What is a Kubeflow pipeline?

  • It is analogous to an experiment in Azure Machine Learning (AML) Studio.

2. What is a Kubeflow component?

  • A pipeline is made up of components, and a component is analogous to a step in an AML Studio experiment.

  • A component consists of a Docker image (containing the source code) and an interface that specifies its inputs and outputs.

3. How do you write a pipeline and its components?

Step 1: Package the source code into a Docker image. To do that, first install Docker, then write a Dockerfile, and finally build the image.
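
For example, a minimal Dockerfile for a component that is a single Python script might look like the following sketch; the file names produce.py and requirements.txt are hypothetical placeholders for your own code:

    # Minimal Dockerfile for a hypothetical single-script component.
    FROM python:3.9-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY produce.py .
    ENTRYPOINT ["python", "produce.py"]

Then build the image, e.g.:

    docker build -t producer:latest .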

Step 2: Push the Docker image to Docker Hub, GCR (Google Container Registry), or any other image registry you prefer.
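
For example, assuming the image built in Step 1 and a hypothetical GCR project called my-project:

    # Tag the local image with the registry path, then push it.
    docker tag producer:latest gcr.io/my-project/producer:latest
    docker push gcr.io/my-project/producer:latest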

Repeat Steps 1 and 2 for every component of the pipeline, since each component needs its own Docker image.

Step 3: Once the Docker image of each component is ready, write a YAML file as an intermediate representation of the pipeline. This YAML file defines each component's interface (its inputs and outputs) and the DAG of the pipeline.

  • The YAML file is generated by the Kubeflow Pipelines SDK, specifically the kfp.dsl package (DSL stands for domain-specific language).

  • A small Python program demonstrates how to use the SDK to generate the YAML file (see the sketch after this list). The program does two things: (1) define each component's interface, and (2) define the pipeline by connecting the components.

  • Then use the following command to compile the pipeline into a YAML file, demo.yaml:

    dsl-compile --py [path/to/python/file] --output demo.yaml

  • The previous approach generates the pipeline YAML file directly from the pipeline code. The problem is that there is no YAML file for each individual component, so the components are hard to reuse. To solve this and encapsulate each component, it is preferable to write a YAML file per component and load each component YAML into the pipeline with the SDK; the Kubeflow Pipelines documentation covers the component specification in detail, and a sketch follows this list.
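
Here is a minimal sketch of such a Python program, written against the KFP v1 SDK; the image paths and script names are hypothetical, matching the examples above:

    import kfp.dsl as dsl

    @dsl.pipeline(name='demo-pipeline', description='A two-step demo pipeline.')
    def demo_pipeline(message: str = 'hello'):
        # Each ContainerOp declares one component: its Docker image,
        # its command line, and its file outputs.
        produce = dsl.ContainerOp(
            name='produce',
            image='gcr.io/my-project/producer:latest',  # hypothetical image
            command=['python', 'produce.py'],
            arguments=['--message', message],
            file_outputs={'data': '/tmp/output.txt'},
        )
        # Passing produce's output as an argument creates an edge in the DAG.
        consume = dsl.ContainerOp(
            name='consume',
            image='gcr.io/my-project/consumer:latest',  # hypothetical image
            command=['python', 'consume.py'],
            arguments=['--data', produce.outputs['data']],
        )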
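
And here is a sketch of a per-component YAML file (component.yaml) for the producer above. The field layout follows the KFP component specification, while the names and paths remain hypothetical:

    name: Produce message
    description: Writes a message to an output file.
    inputs:
      - {name: message, type: String}
    outputs:
      - {name: data, type: String}
    implementation:
      container:
        image: gcr.io/my-project/producer:latest
        command: [python, produce.py]
        args:
          - --message
          - {inputValue: message}
          - --output-path
          - {outputPath: data}

The pipeline code then loads the component from its YAML file instead of spelling out a ContainerOp:

    import kfp.dsl as dsl
    from kfp import components

    # Load the reusable component definition from its YAML file.
    produce_op = components.load_component_from_file('component.yaml')

    @dsl.pipeline(name='demo-pipeline', description='Built from reusable components.')
    def demo_pipeline(message: str = 'hello'):
        produce_task = produce_op(message=message)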

Step 4: Now it is time to deploy the pipeline! To do that, simply upload the YAML file through the Kubeflow Pipelines UI; you can then view and run your pipeline.
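
As an alternative to the UI, the SDK's kfp.Client can upload the same file programmatically. This is a sketch assuming a reachable Kubeflow Pipelines endpoint; the host URL is a placeholder:

    import kfp

    # The host URL is hypothetical; point it at your Kubeflow Pipelines endpoint.
    client = kfp.Client(host='http://localhost:8080')
    client.upload_pipeline('demo.yaml', pipeline_name='demo-pipeline')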