SI699: Movie Revenue Prediction: Based on hierarchical regression model

Code for SI 699 course project: Using hierechical regression model to predcit the movie revenue based on data from IMDB public dataset. Here are the result and final report(poster).

Getting Started

Installing

1.To setup on your local machine: Install Anaconda with Python >= 3.6.

2.Clone the repository

git clone https://github.com/blacksingular/SI699.git

3.some package pre-installed:

pip install tqdm
pip install -U scikit-learn 

Update Data Set

we grabed 3259 movies metadata (2008 - 2018) from IMDB.com using OMDB API. If you would like to grab latest data. Run crawl.py

python crawl.py

Running the tests

To train the model as well as do some test. Run:

python improvement.py

Results

we evaluated our model on 3259 movies from 2008 to 2019 using Mean Absolute Rrror (MAE) and Symmetric Mean Absolute Percentage Error (SMAPE)

MAE ( $ M ) SNAPE
Train $ 7.1 M 0.451
Test $ 23.5 M 0.905

Authors

Team Lucy: Jiazhao Li, Yun Gao -- Dept. EECS of University of Michigan.

Acknowledgments

Thanks to project mentor: Prof. Qiaozhu Mei for inspiration and suggestions.