Code for the SI 699 course project: using a hierarchical regression model to predict movie revenue from the IMDb public dataset. Here are the results and the final report (poster).
1. To set up the project on your local machine, install Anaconda with Python >= 3.6.
2. Clone the repository:
git clone https://github.com/blacksingular/SI699.git
3. Install the required packages:
pip install tqdm
pip install -U scikit-learn
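Before running the scripts, a quick sanity check of the environment can save a confusing traceback later. The snippet below is a minimal sketch, not part of the repository: it assumes the only third-party requirements are the two packages installed above (`tqdm` and scikit-learn, which is imported as `sklearn`), and the helper names `python_ok` and `missing_modules` are hypothetical.

```python
import importlib.util
import sys

# Assumed requirements, taken from the setup steps; scikit-learn's import name is "sklearn".
REQUIRED_MODULES = ["tqdm", "sklearn"]


def python_ok(minimum=(3, 6)):
    """Return True if the running interpreter is at least the given (major, minor) version."""
    return sys.version_info[:2] >= minimum


def missing_modules(names):
    """Return the module names that cannot be located by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python >= 3.6:", python_ok())
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, rerun the corresponding `pip install` command inside the Anaconda environment you intend to use.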
We grabbed metadata for 3,259 movies (2008-2018) from IMDb.com using the OMDb API. If you would like to fetch the latest data, run crawl.py:
python crawl.py
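The internals of crawl.py are not described here, so the following is only a minimal sketch of what a single OMDb lookup looks like, using the standard library. It assumes you have an OMDb API key and look movies up by IMDb ID (the `i` query parameter); the function names and the added `BoxOfficeUSD` key are hypothetical, not taken from the repository. OMDb reports box office as a string such as `"$1,234,567"` or `"N/A"`, so it has to be converted before it can serve as a regression target.

```python
import json
import urllib.parse
import urllib.request

OMDB_URL = "https://www.omdbapi.com/"


def build_omdb_url(imdb_id, api_key):
    """Build an OMDb lookup URL for one movie, identified by its IMDb ID."""
    query = urllib.parse.urlencode({"i": imdb_id, "apikey": api_key})
    return f"{OMDB_URL}?{query}"


def parse_box_office(value):
    """Convert an OMDb BoxOffice string like "$1,234,567" to a float; None if unavailable."""
    if not value or value == "N/A":
        return None
    return float(value.replace("$", "").replace(",", ""))


def fetch_movie(imdb_id, api_key, timeout=10):
    """Fetch one movie record (needs network access and a valid API key)."""
    with urllib.request.urlopen(build_omdb_url(imdb_id, api_key), timeout=timeout) as resp:
        record = json.load(resp)
    # OMDb signals lookup failures with Response == "False" and an Error message.
    if record.get("Response") != "True":
        raise ValueError(record.get("Error", "unknown OMDb error"))
    # Hypothetical extra field holding the numeric box-office value in US dollars.
    record["BoxOfficeUSD"] = parse_box_office(record.get("BoxOffice"))
    return record
```

Movies whose box office is `"N/A"` come back with `BoxOfficeUSD = None`, which makes it easy to filter them out before training.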
To train the model and run some tests, run:
python improvement.py
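"Hierarchical regression" can mean different things, and this README does not say which variant improvement.py implements. One common reading is a multilevel (random-intercept) model, where movies are grouped (for example by genre, an assumption made here) and each group's estimate is shrunk toward the overall mean. The sketch below shows that partial-pooling idea under strong simplifying assumptions: a normal model with known within-group variance `sigma2` and between-group variance `tau2`, and the grand mean used as the estimate of the overall mean. The function name is hypothetical and this is not the project's model.

```python
from collections import defaultdict


def partial_pooling_intercepts(groups, y, sigma2, tau2):
    """Estimate per-group intercepts by shrinking group means toward the grand mean.

    Assumed model: y_ij ~ N(alpha_j, sigma2) and alpha_j ~ N(mu, tau2), with
    sigma2 and tau2 known and mu estimated by the grand mean of y.
    """
    mu = sum(y) / len(y)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, value in zip(groups, y):
        sums[g] += value
        counts[g] += 1

    estimates = {}
    for g in sums:
        n = counts[g]
        group_mean = sums[g] / n
        # Precision weights: more data (or less noise) pulls toward the group's own mean,
        # a tighter prior (smaller tau2) pulls toward the grand mean.
        w_data = n / sigma2
        w_prior = 1.0 / tau2
        estimates[g] = (w_data * group_mean + w_prior * mu) / (w_data + w_prior)
    return mu, estimates
```

With a very large `tau2` each group keeps its own mean (no pooling); with a very small `tau2` every group collapses to the grand mean (complete pooling). Real multilevel fits estimate these variances from the data rather than fixing them.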
We evaluated our model on 3,259 movies from 2008 to 2019 using Mean Absolute Error (MAE) and Symmetric Mean Absolute Percentage Error (SMAPE):
| | MAE ($M) | SMAPE |
|---|---|---|
| Train | $7.1M | 0.451 |
| Test | $23.5M | 0.905 |
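For reference, here is a minimal sketch of the two metrics in plain Python. MAE is reported in the same units as the revenue values (millions of dollars in the table). SMAPE has several conventions in the literature; this sketch assumes the form mean(|F - A| / ((|A| + |F|) / 2)), which ranges from 0 to 2, and the README does not state which variant produced the numbers above. The function names are illustrative, not from the repository.

```python
def mae(actual, predicted):
    """Mean absolute error, in the same units as the inputs (e.g. $M)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE: mean of |F - A| / ((|A| + |F|) / 2), ranging from 0 to 2.

    Pairs where both actual and predicted are zero contribute 0 by convention.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        if denom > 0:
            total += abs(p - a) / denom
    return total / len(actual)
```

For example, with actual revenues `[10, 20]` and predictions `[12, 18]` (in $M), `mae` gives 2.0, while `smape` weights each error by the scale of that movie's revenue.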
Team Lucy: Jiazhao Li and Yun Gao (Dept. of EECS, University of Michigan).
Thanks to our project mentor, Prof. Qiaozhu Mei, for inspiration and suggestions.