In this project, we take a bank marketing data set and train a model to predict how likely a customer is to subscribe to a product. We then deploy the model, which exposes an HTTP endpoint for authenticated users to consume, and document that endpoint using Swagger. Finally, we create and deploy a pipeline.
The data set contains 20 columns and 32,950 rows of results from a previous campaign. It holds customer details such as age, education, and marital status.
Since a private Azure subscription was used for this project, a service principal, "ml-auth", was created for authentication. Authenticating without human intervention allows for more seamless operation in a production environment. The best model from the AutoML run was deployed and tested by sending sample requests. A pipeline was published, and Swagger documentation was created for the deployed model.
Authentication is done by way of a service principal. As stated earlier, using a service principal removes the need for human intervention, which makes for smoother operation of the application in production. A service principal named "ml-auth" was created via the command line:
The service principal was then assigned to the workspace for the experiment:
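Once the service principal has access to the workspace, code can authenticate with it instead of an interactive login. The following is a minimal sketch, not the exact code used in this project. It assumes the Azure ML Python SDK (v1, `azureml-core`) is installed, and the environment variable names and the `connect_workspace` helper are illustrative choices of mine:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicePrincipalCredentials:
    tenant_id: str
    client_id: str
    client_secret: str


def load_credentials(env=os.environ):
    """Read service principal details from environment variables.

    The variable names below are an assumption, not something the project defines.
    """
    names = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return ServicePrincipalCredentials(*(env[name] for name in names))


def connect_workspace(creds, subscription_id, resource_group, workspace_name):
    """Return a Workspace handle authenticated as the service principal."""
    # Imported lazily so load_credentials works without the Azure ML SDK installed.
    from azureml.core import Workspace
    from azureml.core.authentication import ServicePrincipalAuthentication

    auth = ServicePrincipalAuthentication(
        tenant_id=creds.tenant_id,
        service_principal_id=creds.client_id,
        service_principal_password=creds.client_secret,
    )
    return Workspace.get(
        name=workspace_name,
        auth=auth,
        subscription_id=subscription_id,
        resource_group=resource_group,
    )
```

Keeping the secret in the environment rather than in the script avoids committing it to the repository along with the rest of the project files.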
The bank marketing data set was uploaded for use in training the model via AutoML.
An AutoML classification experiment was created to run on a compute cluster of Standard_DS12_V2 machines. Setting the minimum number of nodes to 1 eliminates the machine start-up time when the experiment is created, although this came at a cost because I was using a private Azure subscription.
The best model was a voting ensemble with accuracy of 0.91775.
The best model was deployed with Application Insights enabled. This proved very useful later in the project, when I tried to consume the endpoint and kept getting an HTTP 502 response. I describe this error briefly in the screencast.
Metrics are available via Application Insights, as shown below, and they are very helpful for monitoring and troubleshooting.
Logging is a powerful tool for a DevOps engineer. It helps with troubleshooting issues and with understanding the context of application activity. I enabled logging as shown below:
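For reference, enabling Application Insights and reading the service logs can be sketched in Python as follows. This is an illustration under assumptions rather than the exact script used here: the helper name is mine, and it expects an object that behaves like the Azure ML SDK (v1) `Webservice` class, which provides `update(enable_app_insights=...)` and `get_logs()`.

```python
def enable_insights_and_read_logs(service):
    """Turn on Application Insights for a deployed service and return its log lines.

    `service` is expected to behave like azureml.core.webservice.Webservice.
    """
    service.update(enable_app_insights=True)
    raw_logs = service.get_logs() or ""
    # Drop blank lines so the output is easier to scan.
    return [line for line in raw_logs.split("\n") if line.strip()]


# Usage against a real deployment (requires the azureml-core package and a
# workspace config file; "my-deployed-model" is a placeholder name):
#
#   from azureml.core import Workspace
#   from azureml.core.webservice import Webservice
#   ws = Workspace.from_config()
#   service = Webservice(workspace=ws, name="my-deployed-model")
#   for line in enable_insights_and_read_logs(service):
#       print(line)
```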
A sample request was built using the endpoint.py file. I updated the scoring URI and the application key, both of which were obtained from the Endpoints section in ML Studio.
I struggled a bit here because I kept getting an HTTP 502 response. Drilling down with Application Insights (in the right corner of the image above) showed that I had been sending input data with only 13 values instead of the expected 20. I had been using the data from the endpoints.py in the exercise starter files, which is not the same as the data in the endpoints.py in the starter files folder. After correcting this, I got the expected JSON response.
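A mismatch like this can be caught before the request leaves the client. The sketch below builds the scoring request with the standard library and rejects any record that does not have the expected number of fields. The `{"data": [...]}` payload shape, the bearer-token header, and the function name are assumptions for illustration; the actual endpoint.py may differ.

```python
import json
import urllib.request

EXPECTED_FIELDS = 20  # per the write-up, the endpoint expected 20 values per record


def build_scoring_request(scoring_uri, key, records, expected_fields=EXPECTED_FIELDS):
    """Validate the records and return a POST request for the scoring endpoint."""
    for index, record in enumerate(records):
        if len(record) != expected_fields:
            raise ValueError(
                f"record {index} has {len(record)} fields, expected {expected_fields}"
            )
    body = json.dumps({"data": records}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Sending the request (needs a live endpoint and a valid key):
#
#   with urllib.request.urlopen(build_scoring_request(uri, key, records)) as resp:
#       print(json.loads(resp.read()))
```

Failing fast on the client gives a clear error message instead of an opaque 502 from the service.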
Using swagger.sh and the swagger.json file from the deployed model, I set up Swagger documentation for the endpoint of the deployed model.
Using the notebook provided, we created a pipeline that can be used to train the model. The cells were updated accordingly and a pipeline run was initiated.
The run details widget shows the pipeline run in progress:
Once the run completed, the pipeline was published. The image shows the pipeline endpoint and its active status.
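A published pipeline can be triggered over REST. As a rough sketch, assuming the endpoint accepts a POST with a JSON body naming the experiment and an `Authorization` header obtained from the Azure ML SDK, the request could be built as below. The function name, the `ExperimentName` body key, and the placeholder values are my assumptions and are not taken from the project notebook.

```python
import json
import urllib.request


def build_pipeline_trigger(rest_endpoint, auth_header, experiment_name):
    """Return a POST request that asks a published pipeline to start a run.

    `auth_header` is a dict such as {"Authorization": "Bearer <token>"}.
    """
    headers = dict(auth_header)  # copy so the caller's dict is not modified
    headers["Content-Type"] = "application/json"
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    return urllib.request.Request(rest_endpoint, data=body, headers=headers, method="POST")


# Sending the request (needs the real pipeline endpoint URL and a valid token):
#
#   req = build_pipeline_trigger(endpoint_url, {"Authorization": "Bearer ..."}, "my-experiment")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```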
View my video recording on YouTube here, where I briefly run through what I did in the project.