This project shows how to train a Fashion MNIST model with an Azure ML job, and how to deploy it using a managed online endpoint. It uses MLflow for tracking and model representation.
- You need an Azure subscription. If you don't have one, you can create a free account to try it out.
- Create a resource group.
- Create a new machine learning workspace by following the "Create the workspace" section of the documentation. Keep in mind that you'll be creating an Azure "machine learning workspace" resource, not a "workspace" resource, which is entirely different!
- If you have access to GitHub Codespaces, click on the "Code" button in this GitHub repo, select the "Codespaces" tab, and then click on "New codespace."
- Alternatively, if you plan to use your local machine:
- Install the Azure CLI by following the instructions in the documentation.
- Install the ML extension to the Azure CLI by following the "Installation" section of the documentation.
- In a terminal window, log in to Azure by executing `az login --use-device-code`.
- Set your default subscription by executing `az account set -s "<YOUR_SUBSCRIPTION_NAME_OR_ID>"`. You can verify your default subscription by executing `az account show`, or by looking at `~/.azure/azureProfile.json`.
- Set your default resource group and workspace by executing `az configure --defaults group="<YOUR_RESOURCE_GROUP>" workspace="<YOUR_WORKSPACE>"`. You can verify your defaults by executing `az configure --list-defaults` or by looking at `~/.azure/config`.
- You can now open the Azure Machine Learning studio, where you'll be able to see and manage all the machine learning resources we'll be creating.
- Although not essential to run the code in this post, I highly recommend installing the Azure Machine Learning extension for VS Code.
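For reference, the defaults that `az configure` stores end up in `~/.azure/config`; a typical file might contain a section like this (the values below are placeholders, not your actual settings):

```ini
[defaults]
group = my-resource-group
workspace = my-ml-workspace
```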
If you have access to GitHub Codespaces, click on the "Code" button in this GitHub repo, select the "Codespaces" tab, and then click on "New codespace."
Alternatively, you can set up your local machine using the following steps.
Install the conda environment:

```bash
conda env create -f environment.yml
```

Activate the conda environment:

```bash
conda activate aml_command_artifact
```
- Run train.py (for example, by pressing F5 in VS Code). This saves the trained model as an MLflow artifact.
- Analyze the metrics logged in the "mlruns" directory with the following command:

  ```bash
  mlflow ui
  ```
- Set the `model_uri` variable to the model you want to use in prediction. You can use the model created in the latest training run, whose URI train.py prints, or choose one from the MLflow UI. For example:

  ```bash
  model_uri=runs:/4dff763fdab946eba83f469618544604/model_artifact
  ```
- Make a local prediction using the trained MLflow model. You can use either CSV or JSON files:

  ```bash
  mlflow models predict --model-uri $model_uri --input-path "aml_command_artifact/test_data/images.csv" --content-type csv
  mlflow models predict --model-uri $model_uri --input-path "aml_command_artifact/test_data/images.json" --content-type json
  ```
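For context, a JSON input in pandas "split" orientation can be built with nothing but the standard library. The sketch below is illustrative only: the column names and pixel values are made up (a real Fashion MNIST row carries 28 × 28 = 784 pixel values), and the exact envelope MLflow expects varies by MLflow version.

```python
import json

# Sketch of a pandas "split"-oriented payload of the kind MLflow scoring
# accepts. Column names and pixel values here are invented placeholders.
columns = [f"pixel_{i}" for i in range(4)]
rows = [
    [0, 0, 128, 255],
    [255, 128, 0, 0],
]
payload = {"columns": columns, "data": rows}
print(json.dumps(payload))
```

Inspecting the repo's `test_data/images.json` will show the exact shape this project uses.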
Change to the project directory:

```bash
cd aml_command_artifact
```
Create the compute cluster:

```bash
az ml compute create -f cloud/cluster-cpu.yml
```
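For context, a minimal compute cluster spec generally looks like the sketch below; the VM size and instance counts are assumptions, not necessarily what `cloud/cluster-cpu.yml` actually contains:

```yaml
# Sketch of a cluster spec; size and instance counts are assumptions.
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cluster-cpu
type: amlcompute
size: Standard_DS3_v2
min_instances: 0
max_instances: 4
```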
Create the dataset:

```bash
az ml data create -f cloud/data.yml
```
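A data asset spec typically names the asset and points at a path to upload. In the sketch below, the name, description, and path are assumptions rather than the repo's actual values:

```yaml
# Sketch of a data asset spec; name and path are assumptions.
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: data-fashion-mnist
description: Fashion MNIST data.
path: ../data
type: uri_folder
```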
Run the training job, capturing its run ID:

```bash
run_id=$(az ml job create -f cloud/job.yml --query name -o tsv)
```
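A command job spec wires together code, a command, an environment, and compute. The sketch below shows the general shape only; the code path, environment name, and other values are assumptions, not the repo's actual `cloud/job.yml`:

```yaml
# Sketch of a command job spec; paths and environment are assumptions.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: ../src
command: python train.py
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cluster-cpu
experiment_name: aml_command_artifact
```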
Go to the Azure ML studio and wait until the job completes. You don't need to download the trained model, but here's how you would do it if you wanted to:

```bash
az ml job download --name $run_id
```
Create the Azure ML model from the trained model saved as an artifact:

```bash
az ml model create --name model-command-artifact --version 1 --path runs:/$run_id/model_artifact --type mlflow_model
```
Create the endpoint:

```bash
az ml online-endpoint create -f cloud/endpoint.yml
```

Create the deployment, routing all traffic to it:

```bash
az ml online-deployment create -f cloud/deployment.yml --all-traffic
```
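For reference, minimal endpoint and deployment specs generally look like the sketches below; the deployment name, instance type, and instance count are assumptions rather than the repo's actual values:

```yaml
# cloud/endpoint.yml (sketch)
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: endpoint-command-artifact
auth_mode: key
```

```yaml
# cloud/deployment.yml (sketch); name, instance type, and count are assumptions.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: endpoint-command-artifact
model: azureml:model-command-artifact:1
instance_type: Standard_DS3_v2
instance_count: 1
```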
Invoke the endpoint:

```bash
az ml online-endpoint invoke --name endpoint-command-artifact --request-file test_data/images_azureml.json
```
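If you'd rather call the endpoint over raw REST than through the CLI, a stdlib-only sketch is below. The scoring URI, key, and payload are placeholders: you'd fetch the real URI and key with `az ml online-endpoint show` and `az ml online-endpoint get-credentials`, and send the contents of `images_azureml.json` as the body.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri: str, key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request for a managed online endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
    )

# Placeholder values: fetch the real ones with
#   az ml online-endpoint show --name endpoint-command-artifact --query scoring_uri
#   az ml online-endpoint get-credentials --name endpoint-command-artifact
req = build_scoring_request(
    "https://<YOUR_ENDPOINT>.inference.ml.azure.com/score",
    "<YOUR_ENDPOINT_KEY>",
    {"input_data": []},  # replace with the contents of images_azureml.json
)
# Sending is commented out so the sketch runs without a live endpoint:
# print(urllib.request.urlopen(req).read().decode())
print(req.get_method())
```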
When you're done, delete the endpoint to avoid getting charged:

```bash
az ml online-endpoint delete --name endpoint-command-artifact -y
```