This project demonstrates the end-to-end workflow of operationalizing a machine learning pipeline using Azure ML Studio.
The key steps of this project are defined below:
In order to work with the Azure ML CLI, it is necessary to log in to the Azure account and create a service principal that has the Owner role in the Azure ML workspace.
```bash
# Create the service principal
az ad sp create-for-rbac --sdk-auth --name ml-auth

# Inspect the service principal to obtain its objectId
az ad sp show --id ac3638b4-3928-4a75-b4e5-131ec8887d04

# Grant the service principal the Owner role on the workspace
az ml workspace share \
    -w <your-workspace-name> \
    -g <your-resource-group-name> \
    --user <objectId> \
    --role owner
```
After we have authenticated, we will define an Automated ML experiment. For that we need to:
In order to have access to the data inside Azure ML Studio, a dataset must be registered. This is done in the Datasets section of ML Studio. Azure ML Studio offers several ways to register a dataset: from local files, from a datastore, from web files, and even from open datasets. We will use the "From web files" option:
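The same registration can also be done programmatically. Below is a minimal sketch using the azureml-core Python SDK, assuming a config.json for the workspace is present; the URL and the dataset name are placeholders, not the values actually used in the Studio:

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()

# Create a tabular dataset directly from a web file (a CSV is assumed here).
dataset = Dataset.Tabular.from_delimited_files(
    path="https://<your-data-host>/train.csv"  # placeholder URL
)

# Register it so it appears in the Datasets section of ML Studio.
dataset = dataset.register(
    workspace=ws,
    name="training-dataset",  # hypothetical dataset name
    description="Dataset registered from a web file",
)
```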
To be able to trigger the pipeline from other CI/CD pipelines, we need to publish it. Publishing the pipeline gives us a REST endpoint to interact with, which now has the status Active:
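Once published, the pipeline can be triggered from any CI/CD system with a plain HTTP POST against that endpoint. A minimal sketch, assuming interactive authentication (a CI/CD pipeline would use a service principal instead); the endpoint URL and experiment name are placeholders:

```python
import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Obtain a bearer-token header for the REST call.
auth_header = InteractiveLoginAuthentication().get_authentication_header()

# Copy the URL from the published pipeline's detail page in ML Studio.
rest_endpoint = "https://<region>.api.azureml.ms/pipelines/..."  # placeholder

response = requests.post(
    rest_endpoint,
    headers=auth_header,
    json={"ExperimentName": "pipeline-rest-run"},  # hypothetical experiment name
)
response.raise_for_status()
print("Submitted pipeline run:", response.json().get("Id"))
```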
When we click on the Pipeline runs tab, we can see our submitted run:
For the purposes of this project the training pipeline is fairly simple: it contains only the dataset and the AutoML steps:
We will run the pipeline to generate the ML model:
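For reference, the same dataset + AutoML pipeline can be assembled and submitted with the SDK, as sketched below. This assumes a classification task on the registered dataset; the dataset name, label column, compute cluster, and experiment name are placeholders:

```python
from azureml.core import Workspace, Experiment, Dataset
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import AutoMLStep
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="training-dataset")  # hypothetical name

automl_config = AutoMLConfig(
    task="classification",                # assumed task type
    training_data=dataset,
    label_column_name="y",                # placeholder label column
    compute_target="cpu-cluster",         # placeholder compute cluster
    experiment_timeout_minutes=30,
    primary_metric="AUC_weighted",
)

# The AutoML step wraps the whole model search as a single pipeline step.
automl_step = AutoMLStep(name="automl_step", automl_config=automl_config, allow_reuse=True)

pipeline = Pipeline(workspace=ws, steps=[automl_step])
pipeline_run = Experiment(ws, "ml-pipeline-experiment").submit(pipeline)  # hypothetical name
pipeline_run.wait_for_completion(show_output=True)
```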
The AutoML pipeline in particular performs a number of runs to determine the best model architecture for us; all we have to do is select the best run for further deployment:
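On the SDK side, the best child run and its fitted model can be pulled directly off the AutoML run. A short sketch, with the experiment name and run id as placeholders:

```python
from azureml.core import Workspace, Experiment
from azureml.train.automl.run import AutoMLRun

ws = Workspace.from_config()
experiment = Experiment(ws, "ml-pipeline-experiment")          # hypothetical name
automl_run = AutoMLRun(experiment, run_id="<automl-run-id>")   # placeholder run id

# get_output() returns the best child run together with its fitted model.
best_run, fitted_model = automl_run.get_output()
print(best_run.id)
```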
This is the step that makes our model useful outside the Studio and brings it one step closer to the users.
To deploy a model from a run, we need to go to the Models tab:
Clicking the Deploy button gives us a published endpoint:
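The same deployment can also be scripted. A minimal sketch deploying the registered best model as an ACI webservice with key-based authentication enabled; the model name, entry script, environment, and service name are all assumptions:

```python
from azureml.core import Workspace, Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="best-automl-model")  # hypothetical registered model name

inference_config = InferenceConfig(
    entry_script="score.py",                                 # hypothetical scoring script
    environment=Environment.get(ws, name="AzureML-AutoML"),  # curated env, assumed available
)

deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    auth_enabled=True,  # key-based auth on the endpoint
)

service = Model.deploy(
    workspace=ws,
    name="automl-endpoint",  # hypothetical service name
    models=[model],
    inference_config=inference_config,
    deployment_config=deployment_config,
)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```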
This step is crucial for any webservice intended for production. Logging offers insights and early warning signs that our service might not be doing well. Logging is also used to investigate incidents and failures.
Running the enable-ai.py script will enable logging for our model endpoint.
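Under the hood, enabling Application Insights comes down to a single update call on the deployed webservice. A minimal sketch of what such a script might look like, with a placeholder service name:

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(ws, name="automl-endpoint")  # placeholder service name

# Turn on Application Insights logging for the deployed endpoint.
service.update(enable_app_insights=True)
print("Application Insights enabled for:", service.name)
```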
The nice thing about ML Studio is that the published endpoint comes with a swagger.json, a way of documenting APIs that is highly readable for both humans and machines.
Check the endpoint.py script to see an example of consuming the REST endpoint.
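In essence, consuming the endpoint is an HTTP POST of a JSON payload with the authentication key in the header. A minimal sketch along those lines; the scoring URI, key, and input fields are placeholders, and the real payload shape is defined by the model's input schema (see endpoint.py):

```python
import json
import requests

scoring_uri = "http://<your-endpoint>.azurecontainer.io/score"  # placeholder
key = "<your-primary-key>"                                      # placeholder

# The payload shape depends on the columns the model was trained on.
data = {"data": [{"feature_1": 1, "feature_2": "value"}]}  # placeholder features

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

response = requests.post(scoring_uri, headers=headers, data=json.dumps(data))
print(response.json())
```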
You can find a screencast here: https://youtu.be/cCXFPUZrlMg
In order to make this project ready for prime time, I would investigate and add model versioning, which is required to allow a fallback in case the new model is faulty. I would also add a benchmarking step to the pipeline to be able to assess the quality of the new model.