
Data manipulation and Machine Learning with Spark to engineer relevant features for predicting which users will cancel their music streaming service.


Predicting Users Who Canceled Their Service with Spark

This repository uses Spark for data manipulation, engineering relevant features to predict which users will cancel their music streaming service.

Installations needed to run the code

  • Python 3
  • Pandas
  • NumPy
  • Matplotlib 3
  • Seaborn
  • PySpark

Files

Predicting-Churn-Users.ipynb

This notebook contains the code for the full exploratory analysis, as well as for ML model selection and tuning.
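Model selection and tuning come down to trying candidate hyperparameter settings and keeping the one with the best validation score. In PySpark this is commonly done with `ParamGridBuilder` and `CrossValidator` (or `TrainValidationSplit`) from `pyspark.ml.tuning`; the notebook's exact setup is not reproduced here. The plain-Python sketch below only illustrates exhaustive grid search, assuming a hypothetical `score_fn` that trains a model with the given parameters and returns a validation score such as F1.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return (best_params, best_score).

    param_grid: dict mapping a hyperparameter name to a list of candidate values.
    score_fn:   hypothetical callable taking a dict of hyperparameters and
                returning a validation score (higher is better), e.g. F1.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # strict ">" keeps the first of any tied settings
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; numTrees and maxDepth are RandomForestClassifier parameters in pyspark.ml.
grid = {"numTrees": [10, 50], "maxDepth": [5, 10]}
# Toy scorer standing in for "train a model and compute validation F1".
toy_score = lambda p: 1 / (1 + abs(p["numTrees"] - 50) + abs(p["maxDepth"] - 5))
print(grid_search(grid, toy_score))  # → ({'maxDepth': 5, 'numTrees': 50}, 1.0)
```

Exhaustive search grows multiplicatively with each added hyperparameter, which is why grids are usually kept small when every evaluation means training a model on Spark.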

Data

Every time a user interacts with the service, an event is logged. Together, these events contain the key insights for keeping users happy and helping the business thrive.
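Before modeling, the raw event log has to be turned into one row per user with features and a churn label. As a minimal sketch, assuming each event is a dict with hypothetical `userId` and `page` fields, and that a cancellation appears as a page named `"Cancellation Confirmation"` (these page names are assumptions, not taken from this repository), the aggregation could look like this in plain Python. In PySpark the same idea would typically be a `groupBy("userId").agg(...)` with conditional sums.

```python
from collections import defaultdict

# ASSUMPTION: hypothetical page values; adapt them to the real log schema.
CANCEL_PAGE = "Cancellation Confirmation"
SONG_PAGE = "NextSong"
THUMBS_DOWN_PAGE = "Thumbs Down"


def build_user_table(events):
    """Aggregate raw events into one row per user: simple counts plus a churn label."""
    rows = defaultdict(lambda: {"n_events": 0, "n_songs": 0, "n_thumbs_down": 0, "churn": 0})
    for event in events:
        row = rows[event["userId"]]
        row["n_events"] += 1
        page = event["page"]
        if page == SONG_PAGE:
            row["n_songs"] += 1
        elif page == THUMBS_DOWN_PAGE:
            row["n_thumbs_down"] += 1
        elif page == CANCEL_PAGE:
            row["churn"] = 1  # the user cancelled at some point
    return dict(rows)


events = [
    {"userId": "u1", "page": "NextSong"},
    {"userId": "u1", "page": "Thumbs Down"},
    {"userId": "u1", "page": "Cancellation Confirmation"},
    {"userId": "u2", "page": "NextSong"},
    {"userId": "u2", "page": "NextSong"},
]
table = build_user_table(events)
print(table["u1"])  # → {'n_events': 3, 'n_songs': 1, 'n_thumbs_down': 1, 'churn': 1}
print(table["u2"])  # → {'n_events': 2, 'n_songs': 2, 'n_thumbs_down': 0, 'churn': 0}
```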

Exploratory Data Analysis

Modeling Process

To fit the data, I tried a RandomForestClassifier, a DecisionTreeClassifier, and a LogisticRegression model. Because the dataset is imbalanced (few users churned), I prioritized the model with the best F1 score: on imbalanced data, accuracy can look high even for a model that never predicts churn. The best model was a RandomForestClassifier tuned with various hyperparameters and feature sets.
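To see why F1 matters on imbalanced data, compare accuracy and F1 for a hypothetical set of 10 users, 2 of whom churned. A model that predicts "no churn" for everyone still reaches 0.8 accuracy, yet its F1 for the churn class is 0. This sketch computes binary F1 for the churn class; note that PySpark's `MulticlassClassificationEvaluator(metricName="f1")` instead reports a weighted average of per-class F1 scores, so its numbers will differ.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class (here: churned users)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [0] * 8 + [1] * 2      # 10 hypothetical users, 2 churned
always_stay = [0] * 10          # never predicts churn
catches_one = [0] * 8 + [1, 0]  # flags one of the two churned users
print(accuracy(y_true, always_stay), f1_score(y_true, always_stay))  # → 0.8 0.0
print(accuracy(y_true, catches_one), round(f1_score(y_true, catches_one), 3))  # → 0.9 0.667
```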

Model Results
