This project shows how to train a Fashion MNIST model, tune its hyperparameters using an Azure ML sweep job, and deploy it using a managed online endpoint. It uses MLflow for tracking and model representation.
- You need to have an Azure subscription. You can get a free subscription to try it out.
- Create a resource group.
- Create a new machine learning workspace by following the "Create the workspace" section of the documentation. Keep in mind that you'll be creating a "machine learning workspace" Azure resource, not a "workspace" Azure resource, which is entirely different!
- If you have access to GitHub Codespaces, click on the "Code" button in this GitHub repo, select the "Codespaces" tab, and then click on "New codespace."
- Alternatively, if you plan to use your local machine:
- Install the Azure CLI by following the instructions in the documentation.
- Install the ML extension to the Azure CLI by following the "Installation" section of the documentation.
- In a terminal window, log in to Azure by executing
az login --use-device-code
- Set your default subscription by executing
az account set -s "<YOUR_SUBSCRIPTION_NAME_OR_ID>"
You can verify your default subscription by executing
az account show
or by looking at ~/.azure/azureProfile.json.
- Set your default resource group and workspace by executing
az configure --defaults group="<YOUR_RESOURCE_GROUP>" workspace="<YOUR_WORKSPACE>"
You can verify your defaults by executing
az configure --list-defaults
or by looking at ~/.azure/config.
- You can now open the Azure Machine Learning studio, where you'll be able to see and manage all the machine learning resources we'll be creating.
- Although not essential to run the code in this post, I highly recommend installing the Azure Machine Learning extension for VS Code.
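The defaults set during the Azure setup can also be checked from a script. The following is a minimal sketch, assuming ~/.azure/config is an INI-style file that stores the values under a [defaults] section; the helper name az_default is hypothetical and not part of the Azure CLI.

```shell
# Print the value of one key from the [defaults] section of an Azure CLI
# config file. Assumes an INI layout such as:
#   [defaults]
#   group = my-resource-group
#   workspace = my-workspace
# Usage: az_default <key> [config_file]
az_default() {
  key="$1"
  file="${2:-$HOME/.azure/config}"
  awk -v key="$key" '
    # Track whether the current section is [defaults].
    /^\[/ { in_defaults = ($0 == "[defaults]"); next }
    in_defaults {
      split_at = index($0, "=")
      if (split_at == 0) next
      name = substr($0, 1, split_at - 1)
      value = substr($0, split_at + 1)
      # Trim surrounding whitespace from the key and the value.
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (name == key) { print value; exit }
    }
  ' "$file"
}
```

Calling az_default group and az_default workspace should print the names you configured. An empty result suggests the corresponding default has not been set.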
If you have access to GitHub Codespaces, click on the "Code" button in this GitHub repo, select the "Codespaces" tab, and then click on "New codespace."
Alternatively, you can set up your local machine using the following steps.
Create the conda environment:
conda env create -f environment.yml
Activate the conda environment:
conda activate aml_sweep
- Run train.py by pressing F5.
- Analyze the metrics logged in the "mlruns" directory with the following command:
mlflow ui
- Make a local prediction using the trained MLflow model. You can use either CSV or JSON files:
cd aml_sweep
mlflow models predict --model-uri "model" --input-path "test_data/images.csv" --content-type csv
mlflow models predict --model-uri "model" --input-path "test_data/images.json" --content-type json
Create the compute cluster.
az ml compute create -f cloud/cluster-cpu.yml
Create the dataset.
az ml data create -f cloud/data.yml
Run the training job.
run_id=$(az ml job create -f cloud/sweep-job.yml --query name -o tsv)
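The submitted job has to finish before its output can be registered as a model. If you prefer to wait from the terminal, polling the job status is one option. This is a sketch, not the project's own tooling: the function name wait_for_job is hypothetical, and it assumes that az ml job show reports a status field whose terminal values include Completed, Failed, and Canceled.

```shell
# Poll an Azure ML job until it reaches a terminal state.
# Returns 0 if the job completed, 1 if it failed or was canceled,
# and 2 if the status could not be queried.
# Usage: wait_for_job <job_name> [poll_seconds]
wait_for_job() {
  job_name="$1"
  poll_seconds="${2:-30}"
  while true; do
    status=$(az ml job show --name "$job_name" --query status -o tsv) || return 2
    case "$status" in
      Completed)
        echo "Job $job_name completed."
        return 0
        ;;
      Failed|Canceled)
        echo "Job $job_name ended with status: $status" >&2
        return 1
        ;;
      *)
        # Any other status is treated as still in progress.
        echo "Job $job_name status: $status"
        sleep "$poll_seconds"
        ;;
    esac
  done
}
```

For example, wait_for_job "$run_id" 60 checks once a minute and returns once the sweep job has finished.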
Go to the Azure Machine Learning studio and wait until the job completes. Then create the Azure ML model from the job's output.
az ml model create --name model-sweep --version 1 --path "azureml://jobs/$run_id/outputs/model_dir" --type mlflow_model
You don't need to download the trained model, but here's how you would do it if you wanted to:
az ml job download --name $run_id --output-name "model_dir"
Create the endpoint.
az ml online-endpoint create -f cloud/endpoint.yml
az ml online-deployment create -f cloud/deployment.yml --all-traffic
Invoke the endpoint.
az ml online-endpoint invoke --name endpoint-sweep --request-file test_data/images_azureml.json
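The endpoint can also be called over plain HTTPS, which is closer to how a client application would use it. The sketch below assumes the endpoint uses key-based authentication (this depends on cloud/endpoint.yml, which is not shown here) and that curl is installed. The function name invoke_with_curl is hypothetical.

```shell
# Call a managed online endpoint directly over HTTPS instead of going
# through `az ml online-endpoint invoke`.
# Usage: invoke_with_curl <endpoint_name> <request_file>
invoke_with_curl() {
  endpoint_name="$1"
  request_file="$2"
  # Look up the scoring URL and the primary key (assumes key-based auth).
  scoring_uri=$(az ml online-endpoint show --name "$endpoint_name" \
    --query scoring_uri -o tsv) || return 1
  endpoint_key=$(az ml online-endpoint get-credentials --name "$endpoint_name" \
    --query primaryKey -o tsv) || return 1
  # POST the JSON request file to the scoring URL.
  curl --silent --show-error --request POST "$scoring_uri" \
    --header "Authorization: Bearer $endpoint_key" \
    --header "Content-Type: application/json" \
    --data @"$request_file"
}
```

With the endpoint from this project, the call would be invoke_with_curl endpoint-sweep test_data/images_azureml.json.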
Delete the endpoint to avoid getting charged.
az ml online-endpoint delete --name endpoint-sweep -y