Savadogo/Spark-Capstone-Project

Udacity Nanodegree Big Data Capstone Project: Sparkify

Table of Contents

Installation
Project Motivation
File Descriptions
Results
Licensing, Authors, and Acknowledgements

Installation

You will need PySpark, specifically its SQL and ML modules. The code should run with no issues on Python 3.x.
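Before opening the notebook, it can help to confirm that these requirements are present. The snippet below is a minimal sketch, not part of the original project: the helper names `is_available` and `check_environment` are hypothetical, and it assumes the requirements map to the standard `pyspark.sql` and `pyspark.ml` package names. It uses only the Python standard library, so it runs even when PySpark is missing.

```python
import importlib.util
import sys

# Assumption: the requirements correspond to PySpark's standard module names.
REQUIRED_MODULES = ("pyspark.sql", "pyspark.ml")


def is_available(module_name):
    """Return True if module_name can be located in this interpreter.

    Resolving a dotted name imports its parent package, so a missing
    parent (for example, pyspark itself) surfaces as an ImportError.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        return False


def check_environment():
    """Map each requirement to whether it is satisfied (hypothetical helper)."""
    status = {"python3": sys.version_info.major == 3}
    for name in REQUIRED_MODULES:
        status[name] = is_available(name)
    return status


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'missing'}")
```

If either PySpark module is reported as missing, installing the `pyspark` package (for example with `pip install pyspark`) normally provides both.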

Project Motivation

The goal is to predict user churn from the user log data of a music app. The analysis uses a small subset (128 MB) of the full 12 GB dataset.

File Descriptions

The following are the files available in this repository:

Sparkify Project.ipynb - a notebook covering exploratory data analysis, feature engineering, and modeling to predict churn. It is also exported as Sparkify Project.html.

Results

The main findings are described in the blog post available here.

Licensing, Authors, and Acknowledgements

Credit for the data goes to Udacity, with thanks to the Udacity team for their instruction throughout the project.