
Customer Churn Prediction with Azure Machine Learning:
From Kaggle Dataset to Productionized Model

This repository contains an end-to-end ML lifecycle demo using Azure Machine Learning Studio.

Various features of Azure Machine Learning Studio are demonstrated while solving a Kaggle challenge to predict customer churn. The challenge can be found here: https://www.kaggle.com/blastchar/telco-customer-churn.

Prerequisites to rebuild the demo:

  • Azure subscription (with credits)
  • Foundational knowledge of Azure and data science

Demo Instructions

Step 1:

Create a resource group.

create-resource-group
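If you prefer to script this step instead of using the portal, a minimal sketch with the Azure SDK for Python (`azure-mgmt-resource`) might look as follows; the subscription ID, resource group name, and region are placeholders you should replace with your own values.

```python
# Hypothetical sketch: create a resource group with the Azure SDK for Python.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<your-subscription-id>"  # assumption: fill in your own ID
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# Create (or update) the resource group in a region of your choice
client.resource_groups.create_or_update(
    "customer-churn-rg",         # hypothetical resource group name
    {"location": "westeurope"},  # hypothetical region
)
```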

Step 2:

Create an Azure Machine Learning workspace.

create-azure-ml-workspace
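Alternatively, the workspace can be created programmatically with the `azureml-core` SDK (v1); the names below are placeholders, and the resource group is assumed to exist from Step 1.

```python
# Hypothetical sketch: create the workspace with azureml-core (SDK v1).
from azureml.core import Workspace

ws = Workspace.create(
    name="customer-churn-ws",            # hypothetical workspace name
    subscription_id="<your-subscription-id>",
    resource_group="customer-churn-rg",  # resource group from Step 1
    location="westeurope",
)
ws.write_config()  # saves config.json so later code can use Workspace.from_config()
```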

Step 3:

Enter your Azure Machine Learning workspace by clicking "Launch Now".

launch-workspace

Step 4:

Create a compute instance in your Azure Machine Learning workspace.

create-compute-instance
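The compute instance can also be provisioned from code; a sketch using `azureml-core`, where the instance name and VM size are assumptions you can adjust:

```python
# Hypothetical sketch: provision a compute instance with azureml-core.
from azureml.core import Workspace
from azureml.core.compute import ComputeInstance, ComputeTarget

ws = Workspace.from_config()  # assumes a saved config.json for the workspace

# VM size and instance name below are placeholders
config = ComputeInstance.provisioning_configuration(vm_size="STANDARD_DS3_V2")
instance = ComputeTarget.create(ws, "churn-ci", config)
instance.wait_for_completion(show_output=True)
```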

Step 5:

Open Jupyter Notebooks in your compute instance.

open-jupyter

Step 6:

Open the terminal, change into your user directory, and clone this repository.

enter-terminal

clone-git-repo

Step 7:

Download the two data files from the "data" folder to your local machine.

download-data

Step 8:

Create an Azure Data Lake Storage Gen2 account. You do this by creating a storage account with the hierarchical namespace enabled.

create-data-lake-gen2-1

Enable hierarchical namespace:

create-data-lake-gen2-2

Create a container called "raw" in your Azure Data Lake Storage Gen2 account.

create-container

Create two directories:

  • 2020/03/31
  • 2020/04/01

create-directory
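This whole step can also be scripted. The sketch below uses `azure-mgmt-storage` and `azure-storage-file-datalake`; the subscription ID, account name, and region are placeholders, and the resource group name assumes the one from Step 1. The key detail is the `is_hns_enabled` flag, which makes the storage account a Data Lake Gen2.

```python
# Hypothetical sketch: create a Data Lake Gen2 account, the "raw" container,
# and the two dated directories. Names and region are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.storage.filedatalake import DataLakeServiceClient

subscription_id = "<your-subscription-id>"
account_name = "churndatalake"  # hypothetical; must be globally unique

mgmt = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# A StorageV2 account with hierarchical namespace enabled *is* Data Lake Gen2
poller = mgmt.storage_accounts.begin_create(
    "customer-churn-rg",
    account_name,
    {
        "location": "westeurope",
        "kind": "StorageV2",
        "sku": {"name": "Standard_LRS"},
        "is_hns_enabled": True,  # enables the hierarchical namespace
    },
)
poller.result()  # wait for provisioning to finish

# Retrieve an account key, then create the container and directories
key = mgmt.storage_accounts.list_keys("customer-churn-rg", account_name).keys[0].value
service = DataLakeServiceClient(
    account_url=f"https://{account_name}.dfs.core.windows.net", credential=key
)
fs = service.create_file_system("raw")  # returns a FileSystemClient
for directory in ("2020/03/31", "2020/04/01"):
    fs.create_directory(directory)
```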

Step 9:

Upload the two data files to their respective directories in your Azure Data Lake Storage Gen2 account (according to the date).

upload-data
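A scripted alternative to uploading through the portal, using `azure-storage-file-datalake`; the account key is assumed to come from the portal, and the local file names are hypothetical stand-ins for the two files in the "data" folder.

```python
# Hypothetical sketch: upload the data files into the dated directories.
from azure.storage.filedatalake import DataLakeServiceClient

account_name = "churndatalake"      # storage account from Step 8 (placeholder)
account_key = "<your-account-key>"  # assumption: retrieved from the portal

service = DataLakeServiceClient(
    account_url=f"https://{account_name}.dfs.core.windows.net",
    credential=account_key,
)
fs = service.get_file_system_client("raw")

# File names are hypothetical -- use the actual names from the "data" folder
for directory, local_file in [
    ("2020/03/31", "customer_churn_2020-03-31.csv"),
    ("2020/04/01", "customer_churn_2020-04-01.csv"),
]:
    file_client = fs.get_file_client(f"{directory}/{local_file}")
    with open(local_file, "rb") as data:
        file_client.upload_data(data, overwrite=True)
```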

Step 10:

Register your Azure Data Lake Storage Gen2 account as a regular storage-account datastore (not as an Azure Data Lake Storage Gen2 datastore) in the Azure Machine Learning workspace.

First retrieve your storage account key from the Azure Portal:

get-storage-account-key

create-datastore
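Registering through the blob endpoint is exactly what `Datastore.register_azure_blob_container` from `azureml-core` does; in this sketch the datastore name, account name, and key are placeholders for your own values.

```python
# Hypothetical sketch: register the Gen2 account as a blob-container datastore.
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="customer_churn_datastore",  # hypothetical datastore name
    container_name="raw",
    account_name="churndatalake",               # storage account from Step 8
    account_key="<your-account-key>",           # key retrieved in the step above
)
```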

Step 11:

Register a dataset using your datastore. Important: the dataset must be named "customer-churn".

create-dataset-1

create-dataset-2

create-dataset-3

create-dataset-4
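The same registration can be done from code with `azureml-core`; the datastore name matches the sketch from Step 10, and the glob pattern is an assumption about the file layout, but the dataset name "customer-churn" is required by the notebooks.

```python
# Hypothetical sketch: register a tabular dataset from the datastore.
from azureml.core import Dataset, Datastore, Workspace

ws = Workspace.from_config()
datastore = Datastore.get(ws, "customer_churn_datastore")  # from Step 10

# Point a tabular dataset at the CSV files under the dated folders;
# the glob pattern is an assumption about the file layout
dataset = Dataset.Tabular.from_delimited_files(path=[(datastore, "2020/*/*/*.csv")])

# The notebooks look the dataset up by this exact name
dataset.register(workspace=ws, name="customer-churn", create_new_version=True)
```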

Step 12:

Install all necessary dependencies on your compute instance.

install_dependencies

Step 13:

You can now run the notebooks; detailed explanations are included as comments within them. The notebooks "03_customer_churn_train_decision_tree" and "04_customer_churn_train_automl" can be skipped without affecting the downstream workflow.