Operationalizing a Machine Learning Model in Azure :rocket:

In this project we use Microsoft Azure to configure, deploy, and consume a cloud-based machine learning model.

The main objective of this project is to build a machine learning model and operationalize it using Azure Container Instances (ACI). We have been provided with the Bank Marketing dataset. The main steps of the project are:

1) Authentication
2) Automated ML Experiment
3) Deploy the best model
4) Enable logging
5) Swagger Documentation
6) Consume model endpoints
7) Create and publish a pipeline
8) Documentation

Tech Stack:

Jupyter, Python, Shell Script, Azure, Docker, Swagger

Architectural Diagram:clipboard:

An architectural diagram of the project and an introduction to each step.

Authentication

Authentication is a vital step to ensure secure, authorized access. It involves creating a Service Principal account and associating it with a specific workspace.
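A minimal sketch of how this can look from the Python SDK, assuming the Service Principal has already been created with the Azure CLI (all IDs and names below are placeholders):

```python
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Placeholder credentials for a Service Principal created beforehand
# (for example with `az ad sp create-for-rbac`)
sp_auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>",
    service_principal_id="<client-id>",
    service_principal_password="<client-secret>",
)

# Attach to the workspace that the Service Principal was granted access to
ws = Workspace.get(
    name="<workspace-name>",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    auth=sp_auth,
)
print("Authenticated to workspace:", ws.name)
```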

Automated ML Experiment

We create a new Automated ML experiment and upload the Bank Marketing dataset. We run the experiment on a newly configured compute cluster, using Classification as the task and ensuring that the best-model explanation is enabled.

Deploy the Best Model

After the AutoML run completes, we get our best model. We then deploy that model using Azure Container Instances (ACI) and enable authentication to prevent unauthorized access.

Enable Logging

After the deployment, we enable Application Insights for the deployed model. This lets us retrieve logging output with the Python SDK, which plays a vital role in debugging problems in production environments.

Swagger Documentation

Swagger helps us build, document, and consume RESTful web services. It also describes what types of requests the API can consume, such as POST and GET.
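As a small illustration, the swagger.json that a deployed service exposes can be fetched from its swagger_uri. This is only a sketch, and the service name is an assumption (deployment is covered below):

```python
import json

import requests
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(workspace=ws, name="bank-marketing-endpoint")  # assumed service name

# Every deployed web service publishes its OpenAPI (Swagger) specification at swagger_uri
spec = requests.get(service.swagger_uri).json()
with open("swagger.json", "w") as f:
    json.dump(spec, f, indent=2)
print("Saved swagger.json for", service.name)
```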

Consume Model Endpoints

We must consume the deployed service by sending data to it over HTTP requests. This helps validate the deployment by revealing whether anything is misbehaving or returning incorrect results.

Create and Publish Pipelines

The last and most vital step is to make the model publicly available. This is done by creating a pipeline and then publishing it. It is synonymous with automation, since the published pipeline exposes an HTTP endpoint through which other services can interact with it.

Key Steps:tickets:

📌Register the Dataset

  • We first have to register the dataset in the workspace.
    • Navigate to the Datasets section of the workspace, create a new dataset from a web file, and submit the URL of the Bank Marketing dataset.
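The same registration can also be done with the Python SDK. This is only a sketch; the dataset URL and name below are placeholders rather than project-specific values:

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()  # assumes a config.json downloaded from the workspace

# Placeholder URL: substitute the actual Bank Marketing CSV location used in the project
data_url = "https://<storage-account>.blob.core.windows.net/<container>/bankmarketing_train.csv"

dataset = Dataset.Tabular.from_delimited_files(path=data_url)
dataset = dataset.register(
    workspace=ws,
    name="bank-marketing",  # assumed name; any workspace-unique name works
    description="Bank Marketing dataset registered from a web file",
)
```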

📌Compute Cluster

  • We have to create a compute cluster with VM size Standard_DS12_v2 for running the AutoML experiment.
    • The maximum number of nodes is 5 and the minimum number of nodes is 5.
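A minimal SDK sketch of provisioning such a cluster (the cluster name is an assumption):

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()
cluster_name = "automl-cluster"  # assumed name

# Standard_DS12_v2 VMs, sized to match the node counts described above
compute_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS12_V2",
    min_nodes=5,
    max_nodes=5,
)

compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
compute_target.wait_for_completion(show_output=True)
```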

📌AutoML Run

  • We have to run an AutoML experiment using the registered dataset.
  • We have to select the same compute cluster that we built earlier.
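A sketch of how such a run can be configured and submitted with the SDK; the experiment name, timeout, and the label column `y` (the usual target column of the Bank Marketing dataset) are assumptions:

```python
from azureml.core import Workspace, Dataset, Experiment
from azureml.core.compute import ComputeTarget
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="bank-marketing")  # registered earlier
compute_target = ComputeTarget(ws, "automl-cluster")      # cluster created earlier

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",
    training_data=dataset,
    label_column_name="y",              # assumed target column
    compute_target=compute_target,
    experiment_timeout_minutes=30,      # assumed timeout
    max_concurrent_iterations=5,
    model_explainability=True,          # enables the best-model explanation
)

experiment = Experiment(ws, "bank-marketing-automl")      # assumed experiment name
remote_run = experiment.submit(automl_config, show_output=True)
remote_run.wait_for_completion()
```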

📌Best Model

  • After the AutoML run completes, we need to pick the best model out of the many different models trained.
    • Here we got a Voting Ensemble model, which combines the predictions of several of the previous runs by voting. Its base model is XGBoost with MaxAbs scaling, and it achieved an accuracy of 91%.
    • After the experiment run completes, a summary of all the models and their metrics is shown, including an explanation of the best-performing model.

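A sketch of retrieving and registering the best model once the run above finishes (the model name is an assumption):

```python
# Continues from the submitted AutoML run in the previous sketch
best_run, fitted_model = remote_run.get_output()
print("Best run:", best_run.id)
print("Best model pipeline:", fitted_model)

# Register the best model in the workspace so it can be deployed later
model = remote_run.register_model(
    model_name="bank-marketing-best-model",  # assumed name
    description="Best AutoML model (Voting Ensemble)",
)
```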

📌Endpoint Deployment

  • Once we have the best model, it is time to deploy it. We can use Azure Kubernetes Service (AKS) or Azure Container Instances (ACI) for the deployment.
  • We need to choose an authentication method during deployment. Once the deployment succeeds, an endpoint is created and its status shows as Healthy in the workspace.
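A minimal sketch of an ACI deployment with key-based authentication enabled; the scoring script, environment, and names below are assumptions (the same deployment can also be done from the studio UI):

```python
from azureml.core import Environment, Model, Workspace
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="bank-marketing-best-model")  # model registered earlier

# score.py and the environment are assumptions; an AutoML run also produces these artifacts
inference_config = InferenceConfig(
    entry_script="score.py",
    environment=Environment.get(ws, name="AzureML-AutoML"),  # assumed curated environment
)

# auth_enabled=True makes the endpoint reject unauthenticated calls
deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    auth_enabled=True,
)

service = Model.deploy(ws, "bank-marketing-endpoint", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print("State:", service.state)  # should report "Healthy"
```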

📌Application Insights

  • Once the model is deployed, we need to enable logging by setting enable_app_insights = True on the deployed service.

  • To enable Application Insights, we run the logs.py file.

  • Once logging is enabled, Application Insights shows request statistics such as failed requests, response times, and so on.

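The logs.py script likely follows a pattern similar to this sketch (the service name is an assumption):

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()

# Attach to the deployed ACI service (name is an assumption)
service = Webservice(workspace=ws, name="bank-marketing-endpoint")

# Turn on Application Insights for the running service
service.update(enable_app_insights=True)

# Print the logs the service has produced so far
print(service.get_logs())
```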

📌Consume Endpoint (Swagger)

  • We can consume this endpoint using the REST API or through the Azure ML Python SDK.

  • Swagger is one of the API testing platforms available.

  • Once the model is deployed, we get a swagger.json file from the endpoint, which needs to be downloaded and placed in the folder containing the Swagger files serve.py and swagger.sh.

  • After that, we launch a local web server using the serve script and launch Swagger UI in a Docker container by running swagger.sh.

  • Here we test the endpoint both with endpoint.py and with Swagger.
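A sketch in the spirit of an endpoint.py script; the scoring URI, key, and input fields below are placeholders, not values from this project:

```python
import json

import requests

# Placeholders: copy the real values from the endpoint's "Consume" tab in the studio
scoring_uri = "http://<aci-endpoint>.azurecontainer.io/score"
key = "<primary-key>"

# One truncated example record; field names must match the schema the model was trained on
data = {
    "data": [
        {"age": 35, "job": "technician", "marital": "married"},
    ]
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())
```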

📌Creating and Publishing a Pipeline

We can schedule the published pipeline using a schedule recurrence parameter, reducing manual effort.
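Publishing is typically done from a completed pipeline run with PipelineRun.publish_pipeline(). Below is a sketch of scheduling an already-published pipeline; the schedule name, experiment name, and daily recurrence are assumptions:

```python
from azureml.core import Workspace
from azureml.pipeline.core import PublishedPipeline, Schedule, ScheduleRecurrence

ws = Workspace.from_config()

# Assumes at least one pipeline has already been published from the workspace
published = PublishedPipeline.list(ws)[0]
print("Pipeline:", published.name, "| REST endpoint:", published.endpoint)

# Run the pipeline every day instead of triggering it manually
recurrence = ScheduleRecurrence(frequency="Day", interval=1)
schedule = Schedule.create(
    ws,
    name="bank-marketing-daily-schedule",     # assumed name
    pipeline_id=published.id,
    experiment_name="bank-marketing-automl",  # assumed experiment name
    recurrence=recurrence,
)
```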


How to Contribute? :heavy_plus_sign::spiral_notepad:

Check out our CONTRIBUTING GUIDELINES

See project in action HERE🖼️

✳️Standout Suggestions✳️

  • Collecting more data can definitely help improve accuracy.
  • We can try scoring batch data on a schedule and observe the performance.
  • We can try implementing new algorithms, as well as running the AutoML experiment for a longer time period.
  • We can try new values for the number of nodes, concurrency, etc.
  • We can use GPUs instead of CPUs to improve performance. CPUs may reduce costs, but in terms of performance GPUs outperform them.

❤️ Thanks to our awesome contributors:technologist:✨.

LICENSE

This project is licensed under the MIT LICENSE