Udacity DSND Capstone

Primary language: Jupyter Notebook. License: MIT.

datascientist_capstone

Table of Contents

  1. Installation
  2. Project Motivation
  3. File Descriptions
  4. Results
  5. Licensing, Authors, and Acknowledgements

Installation

This project uses Python 3, along with Jupyter Notebook. The following libraries are necessary for running the notebook:

  • pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • scikit-learn
  • PySpark, with a compatible Spark version installed
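
Before opening the notebook, it can help to confirm that the libraries listed above are importable in the active environment. The following is a minimal sketch, not part of the project itself: the helper name `missing_packages` is hypothetical, and the import names (for example `sklearn` for scikit-learn) are the usual ones for these packages.

```python
import importlib.util

# Import names for the libraries listed above (assumed; scikit-learn
# is imported as "sklearn").
REQUIRED_MODULES = [
    "pandas",
    "numpy",
    "matplotlib",
    "seaborn",
    "sklearn",
    "pyspark",
]


def missing_packages(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

Calling `missing_packages()` returns an empty list when everything is installed. It only checks that the packages can be found; it does not verify that the installed Spark version matches PySpark.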

Project Motivation

The goal of this project is to understand the Sparkify project files and to learn how to predict which users will churn, so that the business can protect its revenue.

File Descriptions

The main code for this project is in the notebook sparkify.ipynb. The notebook walks through each step of analyzing the dataset to answer the question: can we predict which users will churn? The code and results are also published as a blog post on Medium.

The data for the project is not included because of its large file size. To run the notebook, place the data in the data directory. The directory should contain the following file:

  • mini_sparkify_event_data.json
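
The layout described above can be checked before running the notebook. This is a minimal sketch under stated assumptions: the helper names `missing_data_files` and `load_events` are hypothetical, the data directory is assumed to be relative to the working directory, and the file is assumed to be readable with Spark's standard JSON reader. It is not necessarily how the notebook loads the data.

```python
from pathlib import Path

# File the notebook expects to find in the data directory.
EXPECTED_FILES = ["mini_sparkify_event_data.json"]


def missing_data_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


def load_events(spark, data_dir="data"):
    """Read the event file into a Spark DataFrame.

    `spark` is an existing SparkSession created by the caller.
    """
    path = str(Path(data_dir) / EXPECTED_FILES[0])
    return spark.read.json(path)
```

If `missing_data_files()` returns an empty list, the file is in place and `load_events(spark)` can be called with an active SparkSession.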

Results

The main findings are described in the blog post available here.

Licensing, Authors, and Acknowledgements

Licensing for the data and other descriptive information can be found at the Kaggle link available here. The code is free to use; if you use it, please credit @IsraelLlorens.