This project analyzes the potential opportunities for Microsoft to enter the film business. Descriptive analysis of Box Office information from the Internet Movie Database.
Our team has been put in charge of performing market analysis in order to explore what types of movies are doing the best at the box office and provide insights to Microsoft decision makers who want to make a move into the film space.
Our data has been scraped from the Internet Movie Database (IMDb) from the past decade including ratings, online vote counts, budgets, domestic & worldwide gross, and Directors. Any movies that did not record any box office figures were discarded.
In the folder zippedData are the baseline IMDb databases. Unique name and title id's from these files were used by the scrapers found in web_scrapers to compile a more complete dataset which can be found in scraped_data Our cleaned dataset can be found here
This project provides descriptive analysis of over 15,000 movies released in the last decade. It provides insights to stakeholders who are exploring entry into the film industry.
As a result of our analysis we found that the number of votes/reviews mattered most in relation to profits.
When measuring return-on-investment, we identified six primary genres to focus resources on. When outlier movies were removed from the dataset, these genres displayed the highest ROI.
Lastly, we identified markets outside of the United States that may warrant further study. See "Next Steps" below for more information.
Number of reviews can significantly impact profits, put time and energy into producing films that bring in reviews. Our findings indicate this number is more important than the average rating. Similar results have been found in this study. ROI highest among movies with:
- Crime
- Documentary
- Family
- Music
- Romance
- Thriller
The vast majority of movie spending takes place within the US, but there are less expensive places to make films and get a significant ROI. Cultural understanding and research into each region should be considered.
Based on the cumulative ROI figures, additional analysis may yield insights into less competitive markets and more profitable audiences.
To see out full analysis, please refer to our Jupyter Notebook
For additional info, contact brendanfrrs@gmail.com or david.bruce14@gmail.com
├── Countries_sorted_by_cumulative_ROI.png ├── README.md ├── data_cleaning.ipynb ├── images │ ├── Average_ROI_Per_Genre.png │ ├── Correlation_between_profit_and_numvotes.png │ ├── Countries_sorted_by_cumulative_ROI.png │ ├── RE2r0Th.png │ ├── movies_per_genre.png │ └── votes_profit_correlation.png ├── master_table.csv ├── microsoft_movie_analysis.ipynb ├── presentation.pdf ├── scraped_data │ ├── imdb_monetary_data_102849_.csv │ ├── imdb_monetary_data_16590.csv │ ├── imdb_monetary_data_16591_to_25214_.csv │ ├── imdb_monetary_data_29214_.csv │ ├── imdb_monetary_data_33736_.csv │ ├── imdb_monetary_data_49857_.csv │ ├── imdb_monetary_data_65930_.csv │ ├── imdb_monetary_data_67456_.csv │ ├── imdb_monetary_data_77436_.csv │ ├── imdb_name_list_13173_.csv │ ├── imdb_name_list_202_.csv │ ├── imdb_name_list_2268_.csv │ ├── imdb_name_list_3016_.csv │ ├── imdb_name_list_304_.csv │ ├── imdb_name_list_7799_.csv │ └── list_of_directors.csv ├── scripts.py ├── web_scrapers │ ├── financial_data_scraper_imdb.ipynb │ └── name_scraper_imdb.ipynb └── zippedData ├── imdb.name.basics.csv.gz ├── imdb.title.akas.csv.gz ├── imdb.title.basics.csv.gz ├── imdb.title.crew.csv.gz ├── imdb.title.principals.csv.gz └── imdb.title.ratings.csv.gz