azure-databricks-mlops-mlflow

Azure Databricks MLOps sample for Python-based source code, using MLflow without MLflow Project.

Primary language: Jupyter Notebook. License: MIT.

page_type: sample
ms.custom: team=cse
ms.contributors: prdeb-12/21/2021, anchugh-12/21/2021
languages: python
products: azure-databricks, azure-blob-storage, azure-monitor

Azure Databricks MLOps using MLflow

This is a template (sample) for MLOps with Python-based source code in Azure Databricks, using MLflow but not MLflow Project.

This template provides the following features:

  • A way to run Python-based MLOps without MLflow Project, while still using MLflow to manage the end-to-end machine learning lifecycle
  • A sample machine learning source code structure, along with unit test cases
  • A sample MLOps code structure, along with unit test cases
  • A demo setup to try in the user's own subscription
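
As a rough illustration of the first point, the sketch below records one training run through the MLflow tracking API directly, with no MLproject file involved. The function name log_training_run and its arguments are hypothetical and are not taken from this repository. The tracker argument is expected to be the imported mlflow module (or any object that exposes start_run, log_params, and log_metrics).

```python
from typing import Any, Dict


def log_training_run(tracker: Any, run_name: str,
                     params: Dict[str, Any], metrics: Dict[str, float]) -> None:
    """Record one training run using the MLflow tracking API only.

    `tracker` is normally the imported `mlflow` module. It is passed in
    explicitly so the function can be unit tested without a tracking server.
    """
    # start_run() returns a context manager; the run ends when the block exits.
    with tracker.start_run(run_name=run_name):
        tracker.log_params(params)    # e.g. hyperparameters
        tracker.log_metrics(metrics)  # e.g. evaluation scores
```

On a Databricks cluster this could be called as log_training_run(mlflow, "some-run-name", params, metrics) after import mlflow. Passing the module in, rather than importing it inside the function, is a design choice that keeps the unit tests independent of a live MLflow backend.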

Problem Summary

Products/Technologies/Languages Used

  • Products & Technologies:
    • Azure Databricks
    • Azure Blob Storage
    • Azure Monitor
  • Languages:
    • Python

Architecture

Model Training

[Figure: Model Training]

Batch Scoring

[Figure: Batch Scoring]

Individual Components

  • ml_experiment - sample ML experiment notebook
  • ml_data - dummy data for the sample model
  • ml_ops - sample MLOps code along with unit test cases, orchestrator, and deployment setup
  • ml_source - sample ML code along with unit test cases
  • Makefile - build and test targets for the local environment
  • requirements.txt - Python dependencies

Getting Started

Prerequisites

Development

  1. git clone https://github.com/Azure-Samples/azure-databricks-mlops-mlflow.git
  2. cd azure-databricks-mlops-mlflow
  3. Open the cloned repository in a Visual Studio Code Remote Container
  4. Open a terminal in the Remote Container from Visual Studio Code
  5. Run make install to install the sample packages (taxi_fares and taxi_fares_mlops) locally
  6. Run make test to unit test the code locally

Package

  1. Run make dist to build the ML and MLOps wheel packages (taxi_fares and taxi_fares_mlops) locally

Deployment

  1. If any code has changed, run make databricks-deploy-code to deploy the Databricks Orchestrator Notebooks and the ML and MLOps Python wheel packages.
  2. If any job specs have changed, run make databricks-deploy-jobs to deploy the Databricks Jobs.

Run training and batch scoring

  1. To trigger training, execute make run-taxi-fares-model-training
  2. To trigger batch scoring, execute make run-taxi-fares-batch-scoring

NOTE: The Databricks environment must be created before deploying and running. To create a demo environment, follow the Demo chapter.
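
The deployment and run steps above can be chained from a single script. The sketch below is one way to do that, assuming it is run from the repository root with make available on the PATH and with the Databricks environment already created. The helper name run_targets and its dry_run flag are illustrative and are not part of this repository; the make target names are the ones listed in this section.

```python
import subprocess
from typing import List, Sequence

# Make targets named in the Deployment and Run sections above, in order.
DEPLOY_AND_RUN_TARGETS = (
    "databricks-deploy-code",
    "databricks-deploy-jobs",
    "run-taxi-fares-model-training",
    "run-taxi-fares-batch-scoring",
)


def run_targets(targets: Sequence[str], dry_run: bool = False) -> List[List[str]]:
    """Run each make target in order and stop at the first failure.

    Returns the list of commands that were (or, with dry_run, would be) run.
    """
    commands = []
    for target in targets:
        command = ["make", target]
        commands.append(command)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later targets do not run after a failed one.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Print the commands without executing them.
    for cmd in run_targets(DEPLOY_AND_RUN_TARGETS, dry_run=True):
        print(" ".join(cmd))
```

Calling run_targets(DEPLOY_AND_RUN_TARGETS) without dry_run executes the targets for real, so batch scoring is only triggered if deployment and training succeeded.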

Observability

Check logs, create alerts, and so on in Application Insights. The following sample Kusto queries check logs, traces, exceptions, and dependencies.

  • Check for Error, Info, Debug Logs

    Kusto query for checking general logs for a specific MLflow experiment, filtered by mlflow_experiment_id

    traces
    | extend mlflow_experiment_id = customDimensions.mlflow_experiment_id
    | where timestamp > ago(30m)
    | where mlflow_experiment_id == <mlflow experiment id>
    | limit 1000

    Kusto query for checking general logs for a specific Databricks job execution, filtered by mlflow_experiment_id and mlflow_run_id

    traces
    | extend mlflow_run_id = customDimensions.mlflow_run_id
    | extend mlflow_experiment_id = customDimensions.mlflow_experiment_id
    | where timestamp > ago(30m) 
    | where mlflow_experiment_id == <mlflow experiment id>
    | where mlflow_run_id == "<mlflow run id>"
    | limit 1000
  • Check for Exceptions

    Kusto query for checking exception logs, if any

    exceptions 
    | where timestamp > ago(30m)
    | limit 1000
  • Check for duration of different stages in MLOps

    Sample Kusto query for checking the duration of different stages in MLOps

    dependencies 
    | where timestamp > ago(30m) 
    | where cloud_RoleName == 'TaxiFares_Training'
    | limit 1000

To correlate dependencies, exceptions, and traces, operation_Id can be used as a filter in the Kusto queries above.

Demo

  1. Create a Databricks workspace, a storage account (Azure Data Lake Storage Gen2), and Application Insights
    1. Create an Azure Account
    2. Deploy resources from custom ARM template
  2. Initialize Databricks (create cluster, base workspace, mlflow experiment, secret scope)
    1. Get Databricks CLI Host and Token
    2. Authenticate the Databricks CLI: execute make databricks-authenticate
    3. Execute make databricks-init
  3. Create an Azure Data Lake Storage Gen2 container and upload data
    1. Create an Azure Data Lake Storage Gen2 container named taxifares
    2. Upload the taxi-fares data files as blobs into the taxifares container
  4. Put secrets to mount ADLS Gen2 storage using a shared access key
    1. Get the name of the Azure Data Lake Storage Gen2 account created in step 1
    2. Get the shared key for the Azure Data Lake Storage Gen2 account
    3. Execute make databricks-secrets-put to put the secret in the Databricks secret scope
  5. Put Application Insights Key as a secret in Databricks secret scope (optional)
    1. Get Application Insights Key created in step 1
    2. Execute make databricks-add-app-insights-key to put secret in Databricks secret scope
  6. Package and deploy into Databricks (Databricks Jobs, Orchestrator Notebooks, ML and MLOps Python wheel packages)
    1. Execute make deploy
  7. Run Databricks Jobs
    1. To trigger training, execute make run-taxifares-model-training
    2. To trigger batch scoring, execute make run-taxifares-batch-scoring
  8. Expected results
    1. Azure resources
    2. Databricks jobs
    3. Databricks MLflow experiment
    4. Databricks MLflow model registry
    5. Output of batch scoring

Additional Details

  1. Continuous Integration (CI) & Continuous Deployment (CD)
  2. Registered Models Stages and Transitioning

Related resources

  1. Azure Databricks
  2. MLflow
  3. MLflow Project
  4. Run MLflow Projects on Azure Databricks
  5. Databricks Widgets
  6. Databricks Notebook-scoped Python libraries
  7. Databricks CLI
  8. Azure Data Lake Storage Gen2
  9. Application Insights
  10. Kusto Query Language

Glossaries

  1. Application developer: a role that works mainly on operationalizing machine learning.
  2. Data scientist: a role that performs the data science parts of the project.

Contributors