
AML Databricks MLOps Workshop - Hands-on Lab 1

Step 0: Set up your repo

  • Fork the repo at: databricks-aml-mlops-workshop
  • Clone it to your local dev environment
  • Create a repo in your Azure DevOps
  • Push the cloned repo into the new location: git remote set-url origin <new path>

Step 1: Data lake structure

Set up a common directory structure on the data lake.

Step 2: Databricks

Create a Databricks cluster with runtime 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12) and ensure that these two libraries are installed:

  • azureml-core
  • azureml-mlflow

Create a mount point from Databricks to the data lake by following the utils/ example. Use the access key for blob storage option, which is the simplest because it has no other dependencies.
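As a rough illustration of the access-key option, the sketch below builds the arguments for dbutils.fs.mount and skips the mount if the mount point already exists. It is not the utils/ example from the repo; the helper names, the /mnt/datalake mount point, and the placeholder account values are assumptions made for this example.

```python
def blob_mount_args(storage_account, container, access_key, mount_point):
    """Build keyword arguments for dbutils.fs.mount using a storage access key."""
    host = f"{storage_account}.blob.core.windows.net"
    return {
        "source": f"wasbs://{container}@{host}",
        "mount_point": mount_point,
        # The access key is passed as a Spark/Hadoop config entry for the account.
        "extra_configs": {f"fs.azure.account.key.{host}": access_key},
    }


def mount_if_needed(dbutils, **mount_args):
    """Mount the container unless the mount point is already in use.

    Returns True if a new mount was created, False if it already existed.
    """
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_args["mount_point"] in existing:
        return False
    dbutils.fs.mount(**mount_args)
    return True


# In a Databricks notebook, where `dbutils` is predefined (values are placeholders):
# args = blob_mount_args("<storage-account>", "<container>", "<access-key>", "/mnt/datalake")
# mount_if_needed(dbutils, **args)
```

Passing dbutils in as a parameter keeps the helpers importable outside a notebook, where dbutils is not defined.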

Run utils/ to prepare the dataset used in the training step.

Also set up repo integration between Azure Repos and Databricks Repos.

Step 3: Spark analysis

Run the Delta table and feature engineering / modeling scripts to:

  • Create a Delta table and enable SQL queries
  • Run feature engineering and modeling
  • Register the model with Azure ML and MLflow
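The steps above can be sketched roughly as follows. This is not the workshop's own script: the function names, table name, and paths are hypothetical, and the registration helper assumes the training run was logged after MLflow had been pointed at the Azure ML workspace (otherwise the runs:/ URI will not resolve).

```python
def delta_table_ddl(table_name, delta_path):
    """SQL that exposes an existing Delta location as a queryable table."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        f"USING DELTA LOCATION '{delta_path}'"
    )


def write_delta_table(spark, df, table_name, delta_path):
    """Write a DataFrame in Delta format and register it for SQL queries."""
    df.write.format("delta").mode("overwrite").save(delta_path)
    spark.sql(delta_table_ddl(table_name, delta_path))


def model_uri(run_id, artifact_path="model"):
    """MLflow URI of a model artifact logged under a given run."""
    return f"runs:/{run_id}/{artifact_path}"


def register_model_in_azure_ml(workspace, run_id, model_name, artifact_path="model"):
    """Register a logged model through the Azure ML MLflow tracking server.

    `workspace` is an azureml.core.Workspace; azureml-mlflow must be installed.
    Assumption: the run identified by `run_id` was logged against this same
    tracking URI.
    """
    import mlflow  # imported here so the helpers above work without mlflow

    mlflow.set_tracking_uri(workspace.get_mlflow_tracking_uri())
    return mlflow.register_model(model_uri(run_id, artifact_path), model_name)
```

Once the table is registered, it can be queried from a notebook with spark.sql or a %sql cell.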

Step 4: Service Principal Secret and ADB Token

Service Principal:

ADB Token

  • In the ADB portal, click your email address in the top-right panel
  • Select "User Settings"
  • Generate a new token (save it somewhere safe, as you cannot view it again)

ADB KeyVault

  • Go to: https://<databricks-instance>#secrets/createScope
  • Set Scope name: key-vault-secret
  • Select All Users for Manage Principal
  • Paste the DNS Name from your Key Vault Path
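Once the scope exists, notebooks can read Key Vault secrets through dbutils.secrets. The sketch below uses the scope name from the step above; the helper name and the <secret-name> placeholder are assumptions for illustration.

```python
SCOPE_NAME = "key-vault-secret"  # scope name created in the step above


def read_secret(dbutils, key, scope=SCOPE_NAME):
    """Read one secret from a Databricks secret scope.

    `dbutils` is passed in explicitly so the helper can be imported outside
    a notebook. Raises ValueError if the secret comes back empty.
    """
    value = dbutils.secrets.get(scope=scope, key=key)
    if not value:
        raise ValueError(f"Secret '{key}' in scope '{scope}' is empty")
    return value


# In a Databricks notebook (the secret name is a placeholder):
# sp_secret = read_secret(dbutils, "<secret-name>")
```

Databricks redacts secret values in notebook output, so printing the result is not a reliable way to check it.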

Step 5: Azure DevOps

See the example DevOps pipeline for how to create a model training and model deployment pipeline. Here is a really good example for using the Databricks APIs:

Project -> Pipelines -> Library:

  • Create a variable group named: mlops-workshop
  • Add the following items:
    • adb_host
    • adb_repo_id
    • adb_secrets_scope
    • adb_sp_secret_key
    • adb_token
    • azure_svc_name
    • cluster_id
    • location
    • resource_group
    • sp_id
    • sp_secret
    • sp_tenant_id
    • subscription_id
    • workspace_name
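A small pre-flight check can catch a missing entry before the pipeline reaches Databricks or Azure ML. The sketch below is an assumption about how one might do this, not part of the workshop repo. It relies on Azure Pipelines exposing non-secret variables to scripts as upper-case environment variables; secret variables are not exposed automatically and must be mapped explicitly in the step's env section.

```python
import os

# Variable names from the mlops-workshop variable group listed above.
REQUIRED_VARIABLES = [
    "adb_host", "adb_repo_id", "adb_secrets_scope", "adb_sp_secret_key",
    "adb_token", "azure_svc_name", "cluster_id", "location",
    "resource_group", "sp_id", "sp_secret", "sp_tenant_id",
    "subscription_id", "workspace_name",
]


def env_name(variable):
    """Environment variable name Azure Pipelines uses for a pipeline variable."""
    return variable.upper().replace(".", "_")


def missing_variables(environ=None):
    """Return the required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [v for v in REQUIRED_VARIABLES if not environ.get(env_name(v))]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        raise SystemExit("Missing pipeline variables: " + ", ".join(missing))
    print("All pipeline variables are set.")
```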

Retrieving the Databricks values:

  • Add the project to your ADB repo

  • Install databricks-api locally: pip install databricks-api

  • Run utils/ locally:

    python utils/ --token <databricks-token> --databricks-host <http://....> --sp-secret-val <service principal secret value>

  • In the output, find the list of repos and copy the 'id' of the target repo. Then paste the value into the Azure DevOps Library as adb_repo_id
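If it helps to see where the repo 'id' comes from, the following sketch queries the Databricks Repos REST API directly with the standard library. It is an alternative illustration, not the utils/ script, and the function names are hypothetical. It reads only the first page of results, which is an assumption that holds for small workspaces.

```python
import json
import urllib.request


def list_repos(host, token):
    """Return the repos visible to the token (first page of results only)."""
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/repos",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("repos", [])


def find_repo_id(repos, path_suffix):
    """Pick the id of the single repo whose workspace path ends with path_suffix."""
    matches = [r["id"] for r in repos if r.get("path", "").endswith(path_suffix)]
    if len(matches) != 1:
        raise LookupError(
            f"Expected one repo ending in '{path_suffix}', found {len(matches)}"
        )
    return matches[0]


# Example (placeholders, requires network access to the workspace):
# repos = list_repos("https://<databricks-instance>", "<databricks-token>")
# print(find_repo_id(repos, "/databricks-aml-mlops-workshop"))
```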

Set up Azure DevOps Pipeline

  • Import the repo into your Azure DevOps project
  • In the pipeline panel, select create pipeline
  • Select azure-pipeline-adb.yml from the master branch of the repo
  • Save and run the pipeline
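The adb_repo_id variable suggests that the pipeline points the Databricks repo at a branch before running anything, although the contents of azure-pipeline-adb.yml are not shown here. As an assumption about what such a step could look like, the sketch below builds a Repos API update request with the standard library; the function names are hypothetical.

```python
import json
import urllib.request


def build_repo_update_request(host, token, repo_id, branch="master"):
    """Build a PATCH request that checks out the head of `branch` in a Databricks repo."""
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/repos/{repo_id}",
        data=json.dumps({"branch": branch}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def update_repo(host, token, repo_id, branch="master"):
    """Send the update request and return the decoded JSON response."""
    request = build_repo_update_request(host, token, repo_id, branch)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example with values from the variable group (placeholders):
# update_repo("https://<databricks-instance>", "<adb_token>", "<adb_repo_id>")
```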