Analyzing Influential Factors of a Successful Movie

Abstract

We use the official IMDb datasets, extended with content scraped from the IMDb website, to predict a movie's IMDb rating from features such as its length, budget, and genre. We use logistic regression for this purpose. Furthermore, we assess the effect of COVID-19 on movie revenues with a hypothesis test.
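
The abstract does not say which hypothesis test is applied to the revenues. As a rough illustration only, the sketch below runs a one-sided permutation test on two small groups of revenues. The function name `permutation_test`, the variable names, and all revenue figures are hypothetical and made up for this example; they are not taken from the project's data or code.

```python
import random
from statistics import mean


def permutation_test(before, after, n_permutations=10_000, seed=0):
    """One-sided permutation test for H1: mean(before) > mean(after).

    Hypothetical helper for illustration, not part of the repository.
    Returns the observed mean difference and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(before) - mean(after)
    pooled = list(before) + list(after)
    n_before = len(before)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled sample.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_before]) - mean(pooled[n_before:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic revenues in million USD, invented for this example only.
pre_covid = [120.0, 95.0, 210.0, 60.0, 150.0, 180.0, 75.0, 130.0]
during_covid = [40.0, 55.0, 90.0, 20.0, 70.0, 65.0, 30.0, 85.0]

observed_diff, p_value = permutation_test(pre_covid, during_covid)
print(f"observed mean difference: {observed_diff:.3f}")
print(f"estimated p-value: {p_value:.4f}")
```

A permutation test is used here because it needs no distributional assumptions and no third-party packages; the test used in the project itself may differ.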

Downloading Data

Link to the data used
Copy the dat folder into the root folder of the repository.
Then run src.preprocess.py to preprocess the data.

Usage

To recreate figures, use exp.fig_1.py, exp.fig_2.py, exp.fig_A1.py, and exp.fig_A2.py.
The model evaluation can also be done separately by running src.eval_models.py.

bmucsanyi/data_literacy_project
