dsnd_sparkify

Data Science Nanodegree Capstone Project - Sparkify

Key Deliverables:

Sparkify.ipynb - Jupyter Notebook with technical data manipulation and analysis.
Sparkify_Blog_Post.ipynb - Jupyter Notebook for the blog post.
HTML view of blog post is here

Libraries Used

pyspark for data manipulation and machine learning
matplotlib and seaborn for data visualisation

Motivation

I selected this project as an opportunity to build skills in PySpark, a technology for scalable data science that is widely used in industry.

Purpose:

This project uses machine learning to predict customer churn for a hypothetical music streaming service called Sparkify.

Summary:

Completed an end-to-end data preparation, modelling and optimisation exercise using PySpark. Gradient Boosted Trees emerged as the best-performing model for predicting customer churn in this case.
