
Primary language: Jupyter Notebook

Sparkify: Customer Churn Prediction of a Music Streaming Service Using Spark

This project analyzes and predicts customer churn of a music streaming service using Spark on a large dataset.

I took the starter code for this repository from a Udacity assignment project and have since modified it so much that the present form deviates significantly from the original; see starter.

The project focuses on a fictional music streaming service similar to Spotify. In that service:

  • There are two types of users: (1) free-tier users and (2) premium users who pay a subscription.
  • Every event a user is involved in is logged with a timestamp; example events include songplay, logout, like, ad_heard, and downgrade.

The goal is to predict customer churn, either (1) as a downgrade from the premium to the free plan or (2) as a user leaving the service altogether. With churn predictions, the company can target at-risk users with incentives such as discounts.
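As an illustration of how the two churn definitions above could be turned into per-user labels from a timestamped event log, here is a minimal pure-Python sketch. It is not the project's implementation: the record layout `(user_id, timestamp, event_name)`, the helper `label_churn`, and the event name `cancel` for a user leaving the service are hypothetical. In the project itself this step would be done on Spark DataFrames; plain Python only shows the labeling logic.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical event record layout: (user_id, timestamp, event_name).
Event = Tuple[str, int, str]

# "downgrade" is one of the example events listed above; "cancel" is a
# hypothetical name for the event logged when a user leaves the service.
CHURN_EVENTS = {"downgrade", "cancel"}


def label_churn(events: Iterable[Event]) -> Dict[str, Optional[int]]:
    """Map every user to the timestamp of their first churn event, or None."""
    first_churn: Dict[str, Optional[int]] = {}
    # Process events chronologically so the first match is the earliest one.
    for user_id, ts, name in sorted(events, key=lambda e: e[1]):
        first_churn.setdefault(user_id, None)  # register users who never churn
        if name in CHURN_EVENTS and first_churn[user_id] is None:
            first_churn[user_id] = ts
    return first_churn


events = [
    ("u1", 100, "songplay"),
    ("u1", 250, "downgrade"),
    ("u2", 90, "ad_heard"),
    ("u2", 120, "like"),
    ("u3", 200, "logout"),
    ("u3", 300, "cancel"),
]
labels = label_churn(events)
churned = sorted(user for user, ts in labels.items() if ts is not None)
print(churned)  # ['u1', 'u3']
```

Keeping the churn timestamp rather than a bare flag makes it possible to discard each user's post-churn events before building features, so the model does not learn from activity that happens after the outcome.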

🚧 Ongoing work.

Contents:

  • A
  • B
  • ...


Dataset

🚧 TBD.

How to Use This Project

🚧 TBD.

The project directory contains the following files:

.
├── Instructions.md
...

Installing Dependencies for Custom Environments

If you already have a Python environment with the usual ML libraries and you'd like to add PySpark:

# Install PySpark manually
python -m pip install pyspark
python -m pip install findspark
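To confirm that both packages are visible to the interpreter you will run the notebooks with, a small check like the following can help. It is a sketch that uses only the standard library (`importlib.util.find_spec`); the helper name `check_packages` is hypothetical and not part of this repository.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each top-level package can be found in this environment."""
    # find_spec returns None for a top-level module that is not installed,
    # without actually importing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for pkg, ok in check_packages(("pyspark", "findspark")).items():
        print(f"{pkg}: {'installed' if ok else 'missing'}")
```

If both are reported as installed, calling `findspark.init()` and then `SparkSession.builder.getOrCreate()` (from `pyspark.sql`) should start a local Spark session; note that Spark also needs a Java runtime on the machine.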

Alternatively, if you want to create a new Python environment (recommended), you can do it with conda:

# Create an environment
conda create -n sparkify python=3.9 pip
conda activate sparkify

# Install pip-tools
python -m pip install -U pip-tools

# Generate a pinned requirements.txt from requirements.in
# (PySpark is listed in requirements.in)
pip-compile requirements.in

# Install pinned requirements, as always
python -m pip install -r requirements.txt

# If needed, add new dependencies to requirements.in,
# then re-pin and sync, i.e., update the environment
pip-compile requirements.in
pip-sync requirements.txt
python -m pip install -r requirements.txt

# To record the exact package versions installed in your environment
conda env export > conda.yaml
pip list --format=freeze > requirements.txt

# To delete the conda environment, if required
conda remove --name sparkify --all

Notes on the Theory

🚧 TBD.

Notes on the Implemented Analysis and Modeling

🚧 TBD.

Results and Conclusions

🚧 TBD.

Next Steps, Improvements

🚧 TBD.

References and Links

🚧 TBD.

Authorship

Mikel Sagardia, 2023.
No guarantees.

If you find this repository useful, you're free to use it, but please link back to the original source.