/aml_sweep

Primary LanguagePythonMIT LicenseMIT

How to do hyperparameter tuning in Azure ML

This project shows how to train a Fashion MNIST model with, by doing hyperparameter tuning using an Azure ML sweep job, and how to deploy it using an online managed endpoint. It uses MLflow for tracking and model representation.

Azure setup

  • You need to have an Azure subscription. You can get a free subscription to try it out.
  • Create a resource group.
  • Create a new machine learning workspace by following the "Create the workspace" section of the documentation. Keep in mind that you'll be creating a "machine learning workspace" Azure resource, not a "workspace" Azure resource, which is entirely different!
  • If you have access to GitHub Codespaces, click on the "Code" button in this GitHub repo, select the "Codespaces" tab, and then click on "New codespace."
  • Alternatively, if you plan to use your local machine:
    • Install the Azure CLI by following the instructions in the documentation.
    • Install the ML extension to the Azure CLI by following the "Installation" section of the documentation.
  • In a terminal window, login to Azure by executing az login --use-device-code.
  • Set your default subscription by executing az account set -s "<YOUR_SUBSCRIPTION_NAME_OR_ID>". You can verify your default subscription by executing az account show, or by looking at ~/.azure/azureProfile.json.
  • Set your default resource group and workspace by executing az configure --defaults group="<YOUR_RESOURCE_GROUP>" workspace="<YOUR_WORKSPACE>". You can verify your defaults by executing az configure --list-defaults or by looking at ~/.azure/config.
  • You can now open the Azure Machine Learning studio, where you'll be able to see and manage all the machine learning resources we'll be creating.
  • Although not essential to run the code in this post, I highly recommend installing the Azure Machine Learning extension for VS Code.

Project setup

If you have access to GitHub Codespaces, click on the "Code" button in this GitHub repo, select the "Codespaces" tab, and then click on "New codespace."

Alternatively, you can set up your local machine using the following steps.

Install conda environment:

conda env create -f environment.yml

Activate conda environment:

conda activate aml_sweep

Training and inference on your development machine

  • Run train.py by pressing F5.
  • Analyze the metrics logged in the "mlruns" directory with the following command:
mlflow ui
  • Make a local prediction using the trained mlflow model. You can use either csv or json files:
cd aml_sweep
mlflow models predict --model-uri "model" --input-path "test_data/images.csv" --content-type csv
mlflow models predict --model-uri "model" --input-path "test_data/images.json" --content-type json

Training and deploying in the cloud

Create the compute cluster.

az ml compute create -f cloud/cluster-cpu.yml 

Create the dataset.

az ml data create -f cloud/data.yml 

Run the training job.

run_id=$(az ml job create -f cloud/sweep-job.yml --query name -o tsv)

Go to the Azure ML Studio and wait until the Job completes. Create the Azure ML model from the output.

az ml model create --name model-sweep --version 1 --path "azureml://jobs/$run_id/outputs/model_dir" --type mlflow_model

You don't need to download the trained model, but here's how you would do it if you wanted to:

az ml job download --name $run_id --output-name "model_dir"

Create the endpoint.

az ml online-endpoint create -f cloud/endpoint.yml
az ml online-deployment create -f cloud/deployment.yml --all-traffic

Invoke the endpoint.

az ml online-endpoint invoke --name endpoint-sweep --request-file test_data/images_azureml.json

Clean up the endpoint, to avoid getting charged.

az ml online-endpoint delete --name endpoint-sweep -y