Sparkify: Churn Prediction Project

For subscription-based companies, customer churn is one of the most important metrics to track. It is defined as the number of customers who stop using the company's services, often expressed as a share of the customer base over a given period. Understanding why customers churn helps a company find the best ways to prevent it, and identifying customers with a high churn probability makes it possible to act in time to retain them.
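As a small illustration of the definition above, churn over a period can be expressed as a rate. This is a minimal sketch in plain Python, not code from the notebook; the function name `churn_rate` and the example numbers are assumptions made for illustration.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of the customers active at the start of a period who left during it."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical numbers: 200 customers at the start of the month, 10 cancelled.
rate = churn_rate(customers_at_start=200, customers_lost=10)
print(f"{rate:.1%}")  # 5.0%
```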

In this repo we work with the fictitious music streaming company Sparkify, where, as with Spotify or Pandora, users can have a free or a paid account. The main purpose is to predict when a customer will cancel their subscription, so that the company knows in advance and can try to prevent it.

Our job is to explore the user data and predict when a user will churn, so that action can be taken to prevent it. For this we developed several supervised machine learning models, all using Spark.
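Supervised models need a label for each user. One plausible way to derive it is to flag users whose activity log contains a cancellation event. The sketch below uses plain Python rather than Spark so that it runs anywhere; the field names `userId` and `page`, the page value `"Cancellation Confirmation"`, and the helper `label_churn` are assumptions for illustration and may not match what the notebook does.

```python
from collections import defaultdict

# Hypothetical event log; field names and page values are assumptions.
events = [
    {"userId": "u1", "page": "NextSong"},
    {"userId": "u1", "page": "Cancellation Confirmation"},
    {"userId": "u2", "page": "NextSong"},
    {"userId": "u2", "page": "Thumbs Up"},
]

CHURN_PAGE = "Cancellation Confirmation"  # assumed marker of a cancelled subscription


def label_churn(events, churn_page=CHURN_PAGE):
    """Return {userId: 1 if the user ever hit the churn page, else 0}."""
    labels = defaultdict(int)
    for event in events:
        # Once a user is flagged as churned, the label stays at 1.
        labels[event["userId"]] |= int(event["page"] == churn_page)
    return dict(labels)


labels = label_churn(events)
print(labels)  # {'u1': 1, 'u2': 0}
```

With Spark, the same idea would typically be expressed as a group-by on the user column with a max over a 0/1 flag, and the resulting label column would be joined to per-user features before training.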

Main Files

  • The Notebook: Sparkify_Final.ipynb

Necessary Packages:

  • Python 3.6
  • Data wrangling and cleaning: PySpark, PySpark SQL, pandas, NumPy
  • Data visualization: matplotlib
  • ML library: PySpark ML (pyspark.ml)
  • Jupyter Lab

References:

https://www.udacity.com/course/data-scientist-nanodegree--nd025

Medium Article:

https://rdcastillo.medium.com/sparkify-churn-prediction-with-spark-e82bc87b738e