Introduction to kedro-mlflow
What are kedro and mlflow?
`kedro` is a framework for building efficient data science pipelines in Python, and `mlflow` is a library which manages the lifecycle of machine learning models.
Why a plugin?
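`mlflow` is extremely easy to use in a new project: you can just start logging the relevant objects inside your functions:

    import mlflow
    import mlflow.sklearn

    def my_func(param1, param2, data):
        mlflow.log_param("param1", param1)
        mlflow.log_param("param2", param2)
        # train_model is a placeholder for your own training function
        trained_model, metric = train_model(data)
        mlflow.log_metric("auc", metric)
        # assumes a scikit-learn model; use the flavor matching your library
        mlflow.sklearn.log_model(trained_model, "my_model")
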
This [breaks many of kedro principles and software engineering best practices](https://kedro-mlflow.readthedocs.).

The advantages of a plugin are:
- the logging is treated as I/O operations and is decoupled from the compute, so functions are reusable outside of an mlflow setup
- configuration is managed automagically (`kedro-mlflow` uses hooks under the hood which are automatically registered, so installing the package "just works")
- each object has a clear place in the template, which improves readability and reduces maintenance costs
- updates are easy: we just need to bump a dependency version
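
The decoupling described in the first bullet can be sketched in plain Python. This is not kedro-mlflow code: `InMemoryMetricStore` is a hypothetical stand-in for a tracking backend, and `train_model` is a toy placeholder. The point is only that the compute function contains no tracking calls, so it can be reused or tested without any mlflow setup.

```python
from dataclasses import dataclass, field


def train_model(data):
    """Toy 'training' step: pure compute, no tracking calls."""
    mean = sum(data) / len(data)
    model = {"threshold": mean}
    # Fraction of points at or above the threshold, used as a toy metric.
    metric = sum(1 for x in data if x >= mean) / len(data)
    return model, metric


@dataclass
class InMemoryMetricStore:
    """Hypothetical stand-in for a tracking backend (not an mlflow class)."""

    logged: dict = field(default_factory=dict)

    def save(self, name, value):
        self.logged[name] = value


def run(data, store):
    """Orchestration layer: computes first, then treats logging as I/O."""
    model, metric = train_model(data)
    store.save("toy_metric", metric)
    return model


store = InMemoryMetricStore()
model = run([1.0, 2.0, 3.0, 4.0], store)
```

Swapping the store for a real tracking backend would not require touching `train_model`, which is the property the plugin aims for.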
Motivation 1: automatic tracking
Switch to kedro-mlflow-tutorial.
git clone https://github.com/Galileo-Galilei/kedro-mlflow-tutorial
cd kedro-mlflow-tutorial
conda create -n kedro_mlflow_tutorial python=3.9
conda activate kedro_mlflow_tutorial
pip install -e src
- Look at the configuration file `mlflow.yml`, which was generated with the `kedro mlflow init` command.
- Let's visualize the pipeline first with `kedro viz`.
- Look at the `etl_instances` and `etl_labels` code. They emulate querying a "real" dataset, e.g. from SQL storage, and create an instance dataset as a csv.
- Run the pipelines with `kedro run -p etl_instances` and `kedro run -p etl_labels`.
- Open the mlflow UI: `kedro mlflow ui`
- Parameters and tags have been created automatically 🎉
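
One way to picture automatic parameter tracking: a hook receives the run's parameters and logs each one as a key-value pair. The sketch below shows only a possible flattening step for nested parameters, in plain Python. `flatten_params` and the sample values are hypothetical illustrations, and the plugin's actual behavior depends on its configuration.

```python
def flatten_params(params, sep=".", prefix=""):
    """Flatten a nested dict into {"a.b": value} pairs, one per leaf."""
    flat = {}
    for key, value in params.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the key path along.
            flat.update(flatten_params(value, sep=sep, prefix=full_key))
        else:
            flat[full_key] = value
    return flat


# Hypothetical parameters, as they might appear in a project's parameters file.
params = {"xgb": {"max_depth": 3, "eta": 0.1}, "seed": 42}
flat = flatten_params(params)
```

Each entry of `flat` could then be sent to the tracking backend as one parameter, without any logging call inside the pipeline's functions.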
Motivation 2: advanced tracking
- Let's visualize the pipeline first with `kedro viz`.
- Look at the `training` and `inference` code. They represent a training pipeline and the associated inference one: notice that it is extremely similar to scikit-learn `fit` and `transform` 💡.
- Run the pipeline with `kedro run -p training`.
- Open the mlflow UI: `kedro mlflow ui`
- Notice that we logged metrics, a picture as an artifact, and even more parameters (thanks to advanced configuration and catalog datasets, show the code!).
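
The fit/transform analogy for the `training` and `inference` pipelines can be sketched in plain Python. This is a toy illustration, not the tutorial's code: the function names, the threshold "model", and the data are all hypothetical. The training step produces artifacts, and the inference step consumes them together with new instances.

```python
def training_pipeline(instances, labels):
    """'fit' step: learn a threshold from labelled data and return it as an artifact."""
    positives = [x for x, y in zip(instances, labels) if y == 1]
    negatives = [x for x, y in zip(instances, labels) if y == 0]
    # Midpoint between the two class means.
    threshold = (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2
    return {"threshold": threshold}


def inference_pipeline(instances, artifacts):
    """'transform' step: apply the stored artifacts to new instances."""
    return [1 if x >= artifacts["threshold"] else 0 for x in instances]


artifacts = training_pipeline([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])
predictions = inference_pipeline([0.5, 7.0], artifacts)
```

The only inputs of the inference step are the new instances and the artifacts produced by training, which is what makes it possible to package the two together.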
Motivation 3: a mlops framework
Switch entirely to kedro-mlflow-tutorial and showcase `pipeline_ml_factory`.