This repository contains a set of Python web crawler scripts that fetch Hollywood film distribution data from different sources (Box Office Mojo and The Open Movie Database) and combine them into a Weka ARFF file.
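As a rough illustration of the final step, here is a minimal sketch of merging records from two sources and rendering them as ARFF text. It is not the code in this repository: the function `to_arff`, the field names (`title`, `genre`, `domestic_gross`) and the sample values are all hypothetical, and it assumes nominal values contain no spaces or commas.

```python
def to_arff(relation, attributes, rows):
    """Render rows as Weka ARFF text (hypothetical helper, not from this repo).

    attributes: list of (name, type) pairs, where type is "numeric",
    "string", or a list of allowed nominal values.
    rows: list of dicts keyed by attribute name; missing values become "?".
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attributes list their allowed values in braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append("@attribute {} {}".format(name, kind))
    lines += ["", "@data"]
    for row in rows:
        cells = []
        for name, kind in attributes:
            value = row.get(name)
            if value is None:
                cells.append("?")  # ARFF marker for a missing value
            elif kind == "string":
                # Quote strings and escape backslashes and single quotes.
                text = str(value).replace("\\", "\\\\").replace("'", "\\'")
                cells.append("'" + text + "'")
            else:
                cells.append(str(value))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Made-up sample data standing in for the two crawled sources.
box_office = {"Example Film": {"domestic_gross": 1000000}}
omdb = {"Example Film": {"genre": "Drama"}}

# Join the two sources on film title.
merged = []
for title, gross_fields in box_office.items():
    record = {"title": title}
    record.update(gross_fields)
    record.update(omdb.get(title, {}))
    merged.append(record)

schema = [
    ("title", "string"),
    ("genre", ["Drama", "Comedy"]),
    ("domestic_gross", "numeric"),
]
print(to_arff("films", schema, merged))
```

Joining on the title alone is the simplest choice for a sketch; real data would likely need a more robust key, since titles can collide across release years.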
These scripts were written for the final project of the data mining course I took in 2015, during my MSc at the London School of Economics. The project differed slightly from other movie-related data projects in that it focused on how decision variables in film production and distribution relate to a film's commercial performance. Please excuse the very un-Pythonic syntax and silly bits of code, as these were among my early attempts at writing Python.
Unfortunately, as of 23 October 2019, the code here no longer works without major modifications, as Box Office Mojo has completely redesigned its website and moved much of its previously public data behind a paywall.
Read our final project report here.
film_industry is licensed under the BSD-new license. Please check LICENSE.