sparkml-regression-models-movie-revenue-predictions: A Jupyter Notebook repository from Jayasagar

Background

This dataset is inspired from below kaggle, as it simplified version of movie dataset.

https://www.kaggle.com/kevinmariogerard/tmdb-movie-dataset/data

During my research and implementation I took a few variables into account which could influence the revenue of a movie. The factors that I analysed and elaborated in this notebook are:

popularity budget runtime genres vote_count vote_average release_year revenue

Setup

Note: Command only works for MAC or Linux, please check for Windows if you have

Read through the link for help { How to install Spark and start PySpark to run Notebook https://blog.sicara.com/get-started-pyspark-jupyter-guide-tutorial-ae2fe84f594f }

Install Python 3
Install Java 8 or higher installed on your computer.
Install Spark Downloaded the latest version Spark from http://spark.apache.org/downloads.html Version: spark-2.3.0-bin-hadoop2.7
Install Jupyter Notebook

How to RUN

Configure following 3 System environment variables { export PYSPARK_PYTHON=python3 export PYSPARK_DRIVER_PYTHON=jupyter export PYSPARK_DRIVER_PYTHON_OPTS='notebook' }
In the command prompt terminal
1. Go to Spark installed location example: "/Users/software/spark-2.3.0-bin-hadoop2.7/bin"
2. Run ./pyspark or python.cmd (depending on OS)
Now you see Jupyter Home page in browser
Now, Open the shared .ipynb files. Example: TMDbRegressionModels.ipynb
Change "path =" to your location of the dataset i.e. "tmdb-movies-final-features-no-header.csv"

Finally, you can run through each Cell in the Jupyter

Jayasagar/sparkml-regression-models-movie-revenue-predictions

Background

Setup

How to RUN