
TMDB Box Office Revenue (Kaggle)


TMDB Box Office Prediction

: Kaggle competition predicting movie revenues


1. What movies make the most profit in the film industry?

This is my solution for the Kaggle competition TMDB Box Office Prediction. The task is to predict box office revenue from the metadata of over 7,000 past films in the TMDB movie database. The dataset includes the title, original and spoken languages, release year, running time, cast and crew lists, production country and company, keywords, and tagline.


The metric is root-mean-squared error, so lower is better. The first baseline, an Elastic Net model, scored 2.1347. The final submission scored 1.7249 on the public leaderboard, placing in the top 18% (242nd of 1,400). The best single model I built during the competition was a CatBoost model (max_depth = 9, learning_rate = .05). The final prediction was made by stacking 3 layers with a residual weighted boosting technique and ensembling.
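
As a rough illustration of the metric mentioned above, here is a minimal sketch of root-mean-squared error in plain Python. The function names `rmse` and `rmse_log` are hypothetical and not taken from the notebooks. The log-transformed variant is included only as an assumption, since revenue is heavily skewed and errors are often measured on a log scale for such targets.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_log(y_true, y_pred):
    """RMSE on log1p-transformed values (assumed variant for skewed targets).

    Inputs must be greater than -1 so that log1p is defined.
    """
    log_true = [math.log1p(t) for t in y_true]
    log_pred = [math.log1p(p) for p in y_pred]
    return rmse(log_true, log_pred)
```

Either function can be used to compare models on a held-out fold: the model with the lower value fits that fold better.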


  • Project Date: Apr - May, 2019
  • Applied skills: Data Preprocessing and Manipulation, Scraping Data with the TMDB API, Intensive Exploratory Data Analysis, Feature Engineering, Cross Validation, Residual Weighted Boosting, Stacking Models, and Ensemble Learning.
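
Two of the skills listed above, cross validation and ensembling, can be sketched in a few lines of plain Python. This is an illustrative sketch only: `kfold_indices` and `blend` are hypothetical helpers, not the code used in the notebooks, and the weighted average shown here is a simple stand-in for ensembling, not the residual weighted boosting or 3-layer stacking used for the final submission.

```python
import random


def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs that partition range(n_samples)."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size


def blend(predictions, weights):
    """Weighted average of several models' predictions, sample by sample.

    predictions: one list of predicted values per model, all the same length.
    weights: one non-negative weight per model.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * p for w, p in zip(weights, per_sample)) / total
        for per_sample in zip(*predictions)
    ]
```

In a typical workflow, each model is trained on `train_idx` and scored on `valid_idx` for every fold, and the blend weights are chosen to minimize the validation error.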

2. FlowChart

The whole process is shown below.

[Image: flowchart of the whole process]


3. File Details