Customer Churn Prediction with Azure Machine Learning:
From Kaggle Dataset to Productionalized Model
This repository contains an end-to-end ML lifecycle demo using Azure Machine Learning Studio.
The demo walks through several features of Azure Machine Learning Studio while solving a Kaggle challenge to predict customer churn. The Kaggle challenge can be found here: https://www.kaggle.com/blastchar/telco-customer-churn
Prerequisites to rebuild the demo:
- Azure subscription (with credits)
- Some foundational Azure and Data Science knowledge
Create a resource group.
Create an Azure Machine Learning workspace.
Enter your Azure Machine Learning workspace by clicking "Launch Now".
Create a compute instance in your Azure Machine Learning workspace.
Open Jupyter Notebooks in your compute instance.
Open a terminal, change to your user directory, and clone this repository.
Download the two data files from the "data" folder to your local machine.
Create an Azure Data Lake Gen2. You do this by creating a storage account and enabling the hierarchical namespace option during creation.
Create a container called "raw" in your Azure Data Lake Gen2.
Create two directories:
- 2020/03/31
- 2020/04/01
Upload each of the two data files to the directory in your Azure Data Lake Gen2 that matches its date.
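The date-based layout above can be sketched as a small helper that turns a snapshot date into its directory path. This is only an illustration of the naming convention: the function names and the file name `example.csv` are hypothetical and do not come from the repository.

```python
from datetime import date
from pathlib import PurePosixPath


def snapshot_directory(snapshot_date: date) -> str:
    """Return the 'YYYY/MM/DD' directory used for a snapshot date."""
    return snapshot_date.strftime("%Y/%m/%d")


def target_path(snapshot_date: date, file_name: str) -> str:
    """Return the path of a file inside the container, e.g. within 'raw'."""
    return str(PurePosixPath(snapshot_directory(snapshot_date)) / file_name)


if __name__ == "__main__":
    # The two snapshot dates used in this demo.
    for snapshot in (date(2020, 3, 31), date(2020, 4, 1)):
        # 'example.csv' is a placeholder, not an actual file name from the repo.
        print(target_path(snapshot, "example.csv"))
```

Zero-padded month and day keep the directories sorted chronologically when listed by name.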
Register your Azure Data Lake Gen2 as a datastore in the Azure Machine Learning workspace. Choose the storage account datastore type, not the Azure Data Lake Gen2 type.
To do this, first retrieve your storage account key from the Azure Portal.
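If you prefer to register the datastore from a notebook instead of the Studio UI, a minimal sketch could look like the following. It assumes the Azure Machine Learning Python SDK v1 (`azureml-core`) is installed on the compute instance, which this README does not state. The datastore name `customer_churn_raw` and the helper names are hypothetical; the container name `raw` is the one created earlier.

```python
def blob_datastore_settings(account_name, account_key,
                            container_name="raw",
                            datastore_name="customer_churn_raw"):
    """Collect the arguments needed to register a blob container datastore.

    'customer_churn_raw' is a hypothetical datastore name chosen for this sketch.
    """
    if not account_name or not account_key:
        raise ValueError("Both the storage account name and key are required.")
    return {
        "datastore_name": datastore_name,
        "container_name": container_name,
        "account_name": account_name,
        "account_key": account_key,
    }


def register_blob_datastore(settings):
    """Register the container as a blob datastore (not as Data Lake Gen2)."""
    # Imported lazily so the helper above works without the SDK installed.
    # Assumption: azureml-core (SDK v1) is available in this environment.
    from azureml.core import Datastore, Workspace

    # On a compute instance, from_config() picks up the workspace settings.
    workspace = Workspace.from_config()
    return Datastore.register_azure_blob_container(workspace=workspace, **settings)
```

Avoid hard-coding the account key in a notebook; reading it from an environment variable or an interactive prompt keeps it out of version control.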
Register a dataset using your datastore. Important: the dataset has to be named "customer-churn".
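The dataset registration can also be sketched in code. As before, this assumes the Azure Machine Learning Python SDK v1 (`azureml-core`), and it further assumes that the uploaded data files are CSV files; neither is confirmed by this README. The helper names and the `datastore_name` argument are hypothetical. Only the dataset name `customer-churn` is taken from the step above.

```python
DATASET_NAME = "customer-churn"  # required name, see the step above
SNAPSHOT_DIRECTORIES = ["2020/03/31", "2020/04/01"]


def dataset_paths(directories):
    """Build one path pattern per snapshot directory.

    Assumption: the data files are CSV files.
    """
    return [f"{directory}/*.csv" for directory in directories]


def register_churn_dataset(datastore_name):
    """Create a tabular dataset from the snapshot files and register it."""
    # Imported lazily so the helper above works without the SDK installed.
    # Assumption: azureml-core (SDK v1) is available in this environment.
    from azureml.core import Dataset, Datastore, Workspace

    workspace = Workspace.from_config()
    datastore = Datastore.get(workspace, datastore_name)
    paths = [(datastore, pattern) for pattern in dataset_paths(SNAPSHOT_DIRECTORIES)]
    dataset = Dataset.Tabular.from_delimited_files(path=paths)
    return dataset.register(workspace=workspace, name=DATASET_NAME,
                            create_new_version=True)
```

Passing `create_new_version=True` lets you rerun the registration after uploading further snapshots without a name conflict.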
Install all necessary dependencies on your compute instance.
You can now run the notebooks. Specific explanations can be found as comments in the notebooks. You can skip "03_customer_churn_train_decision_tree" and "04_customer_churn_train_automl" without affecting the downstream workflow.