/mlflow-apps

MLflow App Library

Primary LanguagePythonApache License 2.0Apache-2.0

MLflow App Library

Collection of pluggable MLflow apps (MLflow projects). You can call the apps in this repository to:

  • Seamlessly embed ML functionality into your own applications
  • Reproducibly train models from a variety of frameworks on big & small data, without worrying about installing dependencies

We recommend calling the apps in this library from a Python 3 environment - the apps run in Python 3 conda environments, so it may not be possible to load the models produced by the apps back into Python 2 environments.

Getting Started

Running Apps via the CLI

Let’s start by running the gbt-regression app, which trains an XGBoost Gradient Boosted Tree model.

First, download example training & test parquet files containing the diamonds:

temp="$(mktemp -d)"
mlflow run https://github.com/schipiga/mlflow-apps.git -P dest-dir=$temp

Then, train a GBT model and save it as an MLflow model (see the GBT App docs for more information):

mlflow run https://github.com/schipiga/mlflow-apps.git#apps/gbt-regression/ -P train="$temp/train_diamonds.parquet" -P test="$temp/test_diamonds.parquet" -P label-col="price"

The output will contain a line with the run ID, e.g:

Run with ID <run id> finished

We can now use the fitted model to predict on our test data (substitute in the run ID from the previous step):

mlflow pyfunc predict -m model -r <run id> -i "$temp/diamonds.csv"

The output of this command will be 20 numbers, which are predictions of 20 diamonds’ prices based on their features (located in $temp/diamonds.csv). You can compare these numbers to the actual prices of the diamonds, which are viewable via

cat $temp/diamond_prices.csv

Finally, clean up the generated files via:

rm -r $temp

Calling an App in Your Code

Calling an app from your code is simple - just use MLflow’s Python API:

# Train an XGBoost GBT, exporting it as an MLflow model
train_data_path = "..."
test_data_path = "..."
label_col = "..."
# Running the MLflow project
submitted_run = mlflow.projects.run(uri="https://github.com/schipiga/mlflow-apps.git#apps/gbt-regression/", parameters={"train":train_data_path, "test":test_data_path, "label-col":label_col})
# Load the model again for inference or more training
model = mlflow.sklearn.load_model("model", submitted_run.run_id)

Apps

The library contains the following apps:

dnn-regression

This app creates and fits a TensorFlow DNNRegressor model based on parquet-formatted input data. Then, the application exports the model to a local file and logs the model using MLflow’s APIs. See more info here.

gbt-regression

This app creates and fits an XGBoost Gradient Boosted Tree model based on parquet-formatted input data. See more info here.

linear-regression

This app creates and fits an Elastic Net model based on parquet-formatted input data. See more info here.

Contributing

If you would like to contribute to this library, please see the contribution guide for details.