Optimizing an ML Pipeline in Azure

Overview

This project is part of the Udacity Azure ML Nanodegree. In this project, we build and optimize an Azure ML pipeline using the Python SDK and a provided Scikit-learn model. This model is then compared to an Azure AutoML run.

Summary

This project examines a given bank marketing dataset in order to create a model that predicts whether a particular customer is likely to respond positively to a marketing campaign.

The best performing model, by accuracy, was generated by AutoML and leveraged a VotingEnsemble algorithm.

Scikit-learn Pipeline

  1. Create a tabular dataset from the bank marketing data using TabularDatasetFactory
  2. Preprocess and clean the data, then split it into training and test sets
  3. Define a random hyperparameter sampler for LogisticRegression over two variables: regularization strength ('C') and maximum iterations ('max_iter')
  4. Define an early-stopping policy (BanditPolicy)
  5. Configure a HyperDriveConfig to automate model generation
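
The steps above drive a training script that HyperDrive launches once per sampled hyperparameter pair. A minimal sketch of such a script, using a synthetic dataset as a stand-in for the cleaned bank marketing data (the real script would load and clean the tabular dataset and receive `--C`/`--max_iter` as script arguments):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train(C: float, max_iter: int) -> float:
    """Train a LogisticRegression model and return its test accuracy.

    In the HyperDrive setup, C and max_iter arrive as script arguments
    chosen by the random sampler.
    """
    # Synthetic stand-in for the cleaned bank marketing dataset.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = LogisticRegression(C=C, max_iter=max_iter)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)
```

HyperDrive compares the accuracy each run logs and keeps the best-performing hyperparameter combination.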

RandomParameterSampling was selected because it supports early stopping. BanditPolicy was selected as the early-stopping methodology to abort runs that are not meeting desired accuracy thresholds, thus improving overall computational efficiency.
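
To illustrate how the two pieces interact, here is a toy, framework-free sketch of random sampling with a bandit-style cutoff. The slack-factor rule mirrors BanditPolicy's `best / (1 + slack_factor)` threshold; the `evaluate` function and parameter ranges are placeholders (the real policy checks the metric while a run is still in progress):

```python
import random


def bandit_search(evaluate, n_runs=20, slack_factor=0.1, seed=0):
    """Randomly sample hyperparameters; discard runs whose metric falls
    below the slack band around the best metric seen so far."""
    rng = random.Random(seed)
    best, best_params = None, None
    for _ in range(n_runs):
        # Random sampling over spaces like those used in the pipeline.
        params = {
            "C": rng.uniform(0.01, 10.0),
            "max_iter": rng.choice([50, 100, 200]),
        }
        metric = evaluate(params)
        # Bandit rule: a run is terminated when its metric drops below
        # best / (1 + slack_factor).
        if best is not None and metric < best / (1 + slack_factor):
            continue  # this run would be aborted early
        if best is None or metric > best:
            best, best_params = metric, params
    return best, best_params
```

A run that trails the leader by more than the slack factor is cut off, so compute is concentrated on promising configurations.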

AutoML

  1. Create a tabular dataset from the bank marketing data using TabularDatasetFactory
  2. Preprocess and clean the data with the same methodology as the scikit-learn pipeline
  3. Configure AutoMLConfig to automate model generation
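
A sketch of the configuration in step 3, assuming Azure ML SDK v1; `train_data`, `compute_target`, and the label column name `"y"` are placeholders for the objects created in the earlier steps:

```python
from azureml.train.automl import AutoMLConfig


def make_automl_config(train_data, compute_target):
    """Build the AutoML configuration (sketch; parameter values assumed)."""
    return AutoMLConfig(
        task="classification",
        primary_metric="accuracy",       # comparable with the HyperDrive run
        experiment_timeout_minutes=30,   # environment restriction
        training_data=train_data,        # cleaned tabular dataset from step 2
        label_column_name="y",           # assumed label column name
        n_cross_validations=5,
        compute_target=compute_target,
    )
```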

The AutoML pipeline was optimized for accuracy so that it would be comparable with the Scikit-learn pipeline, and was given a timeout of 30 minutes due to environment restrictions.

Pipeline comparison

The accuracy of the scikit-learn pipeline was 0.9091. The accuracy of the AutoML pipeline was 0.9188. Therefore, the AutoML pipeline outperformed the scikit-learn pipeline.

The AutoML pipeline identified a VotingEnsemble algorithm as the most accurate. VotingEnsemble combines models from previous AutoML iterations and implements soft voting, wherein class predictions are determined from weighted averages of each model's predicted class probabilities. It is also interesting to note that the AutoML pipeline reported a balanced accuracy of 0.783; disparate accuracy and balanced accuracy metrics often indicate imbalance in the dataset.
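
Soft voting can be sketched in a few lines. The member probabilities and weights below are hypothetical, not taken from the actual ensemble:

```python
def soft_vote(probas, weights):
    """Weighted average of per-model class probabilities; the class with
    the highest averaged probability wins."""
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: avg[c]), avg


# Three hypothetical ensemble members predicting [P(no), P(yes)]:
members = [[0.6, 0.4], [0.3, 0.7], [0.45, 0.55]]
weights = [0.2, 0.5, 0.3]
pred, avg = soft_vote(members, weights)  # pred == 1, avg == [0.405, 0.595]
```

Even though one member favors "no", the weighted average sides with "yes" because the heavier-weighted members do.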

Future work

Given the imbalance of the dataset, it would be interesting to repeat the experiment optimizing for balanced_accuracy. The AutoML pipeline could also be improved with more compute time, since this experiment timed out at 30 minutes; VotingEnsemble would then have additional AutoML runs to consider, along with more competition from other algorithms that may better classify these data.
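
Balanced accuracy averages per-class recall, so it exposes imbalance that plain accuracy hides. A minimal sketch on a hypothetical imbalanced confusion matrix (the counts are illustrative, not from this experiment):

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of recall on the positive class and recall on the negative class."""
    recall_pos = tp / (tp + fn)
    recall_neg = tn / (tn + fp)
    return (recall_pos + recall_neg) / 2


# Hypothetical imbalanced results: 900 negatives, 100 positives.
tp, fn, tn, fp = 60, 40, 880, 20
acc = (tp + tn) / (tp + fn + tn + fp)    # 0.94
bal = balanced_accuracy(tp, fn, tn, fp)  # ≈ 0.789
```

A classifier can score 0.94 accuracy while recovering only 60% of the minority class, which is exactly the gap between the two metrics noted above.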

Proof of cluster clean up

Please see the cluster clean up performed in the notebook.