
MLflow Pipelines Regression Template

This repository serves as a customizable template for the MLflow Regression Pipeline, which helps you develop high-quality, production-ready regression models.

Currently supported ML models are limited to scikit-learn and frameworks that integrate with scikit-learn, such as the XGBRegressor API from XGBoost.

Note: MLflow Pipelines is an experimental feature in MLflow. If you observe any issues, please report them here. For suggestions on improvements, please file a discussion topic here. Your contribution to MLflow Pipelines is greatly appreciated by the community!

Installation instructions

(Optional) For the best experience, create a clean Python environment via either virtualenv or conda. Python 3.7 or higher is required.

  1. Install the latest MLflow with Pipelines:
pip install mlflow[pipelines]
  2. Clone this MLflow Regression Pipeline template repository locally:
git clone https://github.com/mlflow/mlp-regression-template.git
  3. Enter the root directory of the cloned pipeline template:
cd mlp-regression-template
  4. Install the template dependencies:
pip install -r requirements.txt

Log to the designated MLflow Experiment

To log pipeline runs to a particular MLflow experiment:

  1. Open profiles/databricks.yaml or profiles/local.yaml, depending on your environment.
  2. Edit (and uncomment, if necessary) the experiment section, specifying the name of the desired experiment for logging.
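
For example, the experiment section of profiles/local.yaml might look like the following. This is a minimal sketch only: the experiment name is a placeholder, and the exact keys may differ in your copy of the template, so match the commented-out section already present in the profile.

experiment:
  # Placeholder name; replace with the experiment you want pipeline runs logged to.
  name: "sklearn_regression_experiment"
  # Tracking store used by the local profile; matches the URI passed to the MLflow UI command below.
  tracking_uri: "sqlite:///metadata/mlflow/mlruns.db"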

Development Environment -- Databricks

Sync this repository with Databricks Repos and run the notebooks/databricks notebook on a Databricks cluster running Databricks Runtime 11.0 or later (or the corresponding Databricks Runtime for Machine Learning), with workspace files support enabled.

Note: When making changes to pipelines on Databricks, it is recommended that you either edit files on your local machine and use dbx to sync them to Databricks Repos, as demonstrated here, or edit files in Databricks Repos by opening separate browser tabs for each YAML file or Python code module that you wish to modify.

For the latter approach, we recommend opening at least 3 browser tabs to facilitate easier development:

  • One tab for modifying configurations in pipeline.yaml and/or profiles/{profile}.yaml
  • One tab for modifying step function(s) defined in steps/{step}.py (see the sketch after this list)
  • One tab for modifying and running the driver notebook (notebooks/databricks)
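
To illustrate the kind of edit made in the second tab: the training step is driven by a small Python function defined in steps/train.py. The sketch below is illustrative only; it assumes the step expects an estimator_fn that returns an unfitted scikit-learn estimator, so check the function name and signature in your copy of the template before editing.

# steps/train.py -- illustrative sketch, not the template's exact contents
from sklearn.linear_model import SGDRegressor

def estimator_fn():
    """Return an unfitted estimator; the train step fits it on the transformed training data."""
    return SGDRegressor(random_state=42, max_iter=1000)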

Accessing MLflow Pipeline Runs

You can find MLflow Experiments and MLflow Runs created by the pipeline on the Databricks ML Experiments page.

Development Environment -- Local machine

Jupyter

  1. Launch the Jupyter Notebook environment via the jupyter notebook command.
  2. Open and run the notebooks/jupyter.ipynb notebook in the Jupyter environment.
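
The notebook drives the pipeline through MLflow's experimental Python API rather than the CLI. A minimal sketch of what such a notebook cell does, assuming the mlflow.pipelines.Pipeline class with run/inspect methods mirroring the CLI commands in the next section:

from mlflow.pipelines import Pipeline

# Load the pipeline defined by pipeline.yaml, using the "local" profile.
p = Pipeline(profile="local")

# Run a single step, or the whole pipeline when no step is given.
p.run(step="ingest")
p.run()

# Render the results card for a step.
p.inspect(step="train")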

Command-Line Interface (CLI)

First, enter the template root directory and set the profile via an environment variable:

cd mlp-regression-template
export MLFLOW_PIPELINES_PROFILE=local

Then, try running the following MLflow Pipelines CLI commands to get started. Note that the --step argument is optional. Pipeline commands without a --step specified act on the entire pipeline instead.

Available step names are: ingest, split, transform, train, evaluate and register.

  • Display the help message:
mlflow pipelines --help
  • Run a pipeline step or the entire pipeline:
mlflow pipelines run --step step_name
  • Inspect a step card or the pipeline dependency graph:
mlflow pipelines inspect --step step_name
  • Clean a step cache or all step caches:
mlflow pipelines clean --step step_name

Note: A shortcut to mlflow pipelines is installed as mlp. For example, to run the ingest step, instead of issuing mlflow pipelines run --step ingest, you may type:

mlp -s ingest

Accessing MLflow Pipeline Runs

To view MLflow Experiments and MLflow Runs created by the pipeline:

  1. Enter the template root directory: cd mlp-regression-template

  2. Start the MLflow UI:

mlflow ui \
   --backend-store-uri sqlite:///metadata/mlflow/mlruns.db \
   --default-artifact-root ./metadata/mlflow/mlartifacts \
   --host localhost
  3. Open a browser tab pointing to http://127.0.0.1:5000