You will need PySpark SQL and PySpark ML. The code should run without issues on any Python 3.* version.
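To confirm these requirements before opening the notebook, a minimal sketch like the following may help. The helper name `check_environment` is hypothetical and not part of this repository; the sketch only assumes that PySpark is installed as the `pyspark` package, with `pyspark.sql` and `pyspark.ml` as its submodules.

```python
import importlib.util
import sys


def check_environment():
    """Return a list of problems; an empty list means the requirements look met.

    Hypothetical helper for illustration, not part of this repository.
    """
    problems = []

    # The README asks for a Python 3.* interpreter.
    if sys.version_info.major != 3:
        problems.append("Python 3.* is required, found " + sys.version.split()[0])

    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec("pyspark") is None:
        problems.append("pyspark is not installed")
    else:
        # Submodules can only be looked up once the parent package is found.
        for module in ("pyspark.sql", "pyspark.ml"):
            try:
                found = importlib.util.find_spec(module) is not None
            except ImportError:
                found = False
            if not found:
                problems.append(module + " is not available")

    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks ready." if not issues else "\n".join(issues))
```

If the list comes back empty, the notebook's imports should at least resolve. It does not check specific PySpark versions, since this README does not state one.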
The goal is to predict churn from the user log data of a music app, using a tiny subset (128MB) of the full dataset (12GB).
The following are the files available in this repository:
Sparkify Project.ipynb
- a notebook covering exploratory data analysis, feature engineering, and modeling to predict churn, which is also exported to Sparkify Project.html.
The main findings are summarized in the blog post available here.
Credit for the data goes to Udacity, with thanks to the Udacity team for all their instruction.