Sparkify: Churn Prediction Project

For subscription-based companies, customer churn is one of the most important metrics to track. It is defined as the number of customers who stop using the company's services. It is important to understand why customers churn in order to find the best ways to avoid it, and to identify customers with a high probability of churning so that action can be taken to retain them.
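As a rough illustration of this definition, the sketch below labels each user as churned (1) or not (0) from an event log, then computes the churn rate as the fraction of churned users. It uses plain Python rather than Spark so it runs anywhere. The `(user_id, page)` record layout and the `"Cancellation Confirmation"` event name are assumptions about the log format, not taken from this repo.

```python
from typing import Dict, Iterable, Tuple

# Assumed page value that marks a cancelled subscription (hypothetical for this sketch).
CHURN_EVENT = "Cancellation Confirmation"


def label_churn(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Return {user_id: 1 if the user churned, else 0} from (user_id, page) events."""
    labels: Dict[str, int] = {}
    for user_id, page in events:
        # Every user seen in the log starts out as "not churned".
        labels.setdefault(user_id, 0)
        if page == CHURN_EVENT:
            labels[user_id] = 1
    return labels


def churn_rate(labels: Dict[str, int]) -> float:
    """Fraction of users labelled as churned (0.0 for an empty log)."""
    if not labels:
        return 0.0
    return sum(labels.values()) / len(labels)


if __name__ == "__main__":
    log = [
        ("u1", "NextSong"),
        ("u2", "NextSong"),
        ("u2", "Cancellation Confirmation"),
        ("u3", "Home"),
        ("u4", "NextSong"),
    ]
    labels = label_churn(log)
    print(labels)              # {'u1': 0, 'u2': 1, 'u3': 0, 'u4': 0}
    print(churn_rate(labels))  # 0.25
```

Labelling per user (rather than per event) is what turns the raw log into a target variable that a supervised model can learn from.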

In this repo we work with the fictitious music streaming company Sparkify, where, as with Spotify or Pandora, users can have either a free or a paid account. The main goal is to predict when a customer will cancel their subscription, so that the company knows in advance and can try to prevent it.

Our job is to explore the users' data and predict when a user will churn, so that action can be taken to prevent it. To do this, we developed several supervised machine learning models, all built with Spark.
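Supervised models need one row per user, holding a feature vector and a churn label. A minimal sketch of that aggregation step is shown below in plain Python; in Spark the same idea would map onto a DataFrame `groupBy` aggregation per user. The counted page values (`NextSong`, `Thumbs Up`, `Thumbs Down`) and the helper name `build_user_rows` are hypothetical choices for illustration, not the features used in the notebook.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical page values counted as features; the notebook may use different ones.
FEATURE_PAGES = ["NextSong", "Thumbs Up", "Thumbs Down"]


def build_user_rows(
    events: Iterable[Tuple[str, str]], labels: Dict[str, int]
) -> List[Tuple[str, List[int], int]]:
    """Aggregate (user_id, page) events into (user_id, features, label) rows."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user_id, page in events:
        counts[user_id][page] += 1

    rows = []
    for user_id in sorted(counts):
        # Counter returns 0 for pages the user never visited.
        features = [counts[user_id][page] for page in FEATURE_PAGES]
        # Users missing from `labels` are treated as not churned.
        rows.append((user_id, features, labels.get(user_id, 0)))
    return rows


if __name__ == "__main__":
    log = [
        ("u1", "NextSong"),
        ("u1", "NextSong"),
        ("u1", "Thumbs Up"),
        ("u2", "NextSong"),
        ("u2", "Thumbs Down"),
        ("u2", "Cancellation Confirmation"),
    ]
    for row in build_user_rows(log, {"u1": 0, "u2": 1}):
        print(row)
    # ('u1', [2, 1, 0], 0)
    # ('u2', [1, 0, 1], 1)
```

Each resulting row has the shape a classifier expects: a numeric feature vector plus a binary label.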

Main Files

The Notebook: Sparkify_Final.ipynb

Necessary Packages:

Python 3.6
Data wrangling and cleaning libraries: PySpark, PySpark SQL, pandas, NumPy
Data visualization: matplotlib
ML library: PySpark ML (pyspark.ml)
JupyterLab

References:

https://www.udacity.com/course/data-scientist-nanodegree--nd025

Medium Article:

https://rdcastillo.medium.com/sparkify-churn-prediction-with-spark-e82bc87b738e
