This repository provides the supporting code for my presentation entitled Practical MLOps with GitHub and Azure ML.
This data comes from the Chicago Parking Ticket database, courtesy of Daniel Hutmacher. I sampled 1,000,000 records from it and the file I used is available in CSV format.
Import this into Azure ML using the Dataset name ChicagoParkingTicketsFolder
. Be sure to upload this as a uri_folder
instead of an MLtable
or uri_file
!
In order to run the ML pipeline notebooks and jobs locally, you will need to have the following installed on your machine:
- Python (preferably the Anaconda distribution), with
pip
installed:conda install -c anaconda pip
- The Azure CLI
- The Azure ML Azure CLI extension:
az extension add -n ml
- Pip packages:
pip install azure-ai-ml
,pip install azure-identity
- Visual Studio Code
- The Azure ML Visual Studio Code extension
Before you run the code, make sure your console has you logged into Azure via CLI:
az login
Then, create a folder called .azureml
and a file named config.json
. The file should look like the following structure:
{
"subscription_id": "YOUR SUBSCRIPTION ID",
"resource_group": "YOUR RESOURCE GROUP",
"workspace_name": "YOUR WORKSPACE NAME"
}
Note that you must be logged into az cli
with an account which has access to the subscription, resource group, and workspace.
From there, run the training code:
python deploy-train.py
You can see the job in action by going to Azure ML Studio and viewing the "Chicago_Parking_Tickets_Code-First" experiment. There will be a new "train_pipeline" job.
For scoring, run the following code:
python deploy-score.py
This will create a batch endpoint and deployment, upload data to a Datastore in Azure ML, create a job to generate predictions, and downloads the resulting predictions to a local file called predictions.csv
.
IMPORTANT NOTE -- You must explicitly grant rights to the account running deploy-score.py
against the Azure ML workspace. I granted Owner because I was running this personally, but it must be explicitly granted and not just have ownership as a side effect of subscription-level or resource group-level rights.
If you do not do this, you will likely get a strange BY_POLICY
error message when running this script.
Before we can use GitHub Action workflows to execute Azure Machine Learning pipelines, we need to grant appropriate permissions. The steps for this come from Microsoft Learn, specifically the options for OpenID Connect.
Follow the instructions in the Cheat Sheets folder, 02 - Application Security Configuration.txt. The individual commands to run are in 02b - az cli commands.txt. Note that this is NOT an automated script!
Each GitHub Action is in the .github/workflows folder. There are two workflows for training AML pipelines and two for scoring. The prior section on linking AML to GitHub must be completed before you can successfully run a pipeline.
For a review of what each workflow is doing, refer to the Cheat Sheets folder, specifically 03 - GitHub Actions Review.txt
Note that the GitHub Action workflow will kick off an Azure Machine Learning pipeline but it will not wait for that pipeline to complete, so in order to see if the pipeline run was successful, you will need to review those results in the Azure ML Studio or via CLI.