We use the official IMDb datasets extended with scraped content from the IMDb website to predict the IMDb rating based on features such as the length, budget, or genre of a movie. We use logistic regression for this purpose. Furthermore, we monitor the effect of COVID-19 on movie revenues in the form of a hypothesis test.
- Link to the data used
- The
dat
folder should be copied into the root folder of the repository. - After that,
src.preprocess.py
can be run to do the preprocessing step on the data.
- To recreate figures, use
exp.fig_1.py
,exp.fig_2.py
,exp.fig_A1.py
, andexp.fig_A2.py
. - The model evaluation can also be done separately by running
src.eval_models.py
.