
Simple Python scripts that fetch Hollywood movie data from different web sources (http://boxofficemojo.com and http://omdbapi.com) and write it out as a WEKA ARFF file (http://weka.wikispaces.com/ARFF).


film_industry

This repository contains a set of Python web crawler scripts that fetch Hollywood film distribution data from different sources (Box Office Mojo and The Open Movie Database) and combine it into a single WEKA ARFF file.
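For readers unfamiliar with the ARFF format, the output step can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: the function name `to_arff`, the attribute names, and the sample rows are all hypothetical placeholders, and only the basic ARFF layout (`@RELATION`, `@ATTRIBUTE`, `@DATA`, `?` for missing values) is assumed.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text (hypothetical helper, not the repository's code).

    attributes: list of (name, kind) pairs, where kind is "NUMERIC", "STRING",
    or a list of allowed nominal values.
    rows: list of lists, one value per attribute; None marks a missing value.
    """
    lines = [f"@RELATION {relation}", ""]

    # Header: one @ATTRIBUTE line per column.
    for name, kind in attributes:
        if isinstance(kind, list):
            declared = "{" + ",".join(kind) + "}"  # nominal attribute
        else:
            declared = kind
        lines.append(f"@ATTRIBUTE {name} {declared}")

    lines += ["", "@DATA"]

    # Data: comma-separated values, one row per line.
    for row in rows:
        cells = []
        for (name, kind), value in zip(attributes, row):
            if value is None:
                cells.append("?")  # ARFF marker for a missing value
            elif kind == "STRING":
                # Quote strings and escape backslashes and single quotes.
                escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
                cells.append(f"'{escaped}'")
            else:
                cells.append(str(value))
        lines.append(",".join(cells))

    return "\n".join(lines) + "\n"


# Placeholder records standing in for merged crawler output.
example = to_arff(
    "films",
    [("title", "STRING"), ("budget", "NUMERIC"), ("rating", ["G", "PG", "R"])],
    [["Example Film", 1000000, "PG"], ["It's Here", None, "R"]],
)
print(example)
```

Building the text in memory before writing it to disk keeps the formatting logic easy to check separately from the crawling itself.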

These scripts were written for the final project of a data mining course I took in 2015, during my MSc at the London School of Economics. The project differed slightly from other movie-related data projects in that it focused on how decision variables in film production and distribution relate to a film's commercial performance. Please excuse the un-Pythonic syntax and silly bits of code; these were among my early attempts at writing Python.

Unfortunately, as of 23 October 2019, the code here no longer works without major modifications, because Box Office Mojo has completely redesigned its website and moved much of its previously public data behind a paywall.

Report

Read our final project report here.

License

film_industry is licensed under the BSD 3-Clause ("New") license. See LICENSE for details.