This notebook documents an exploratory data wrangling project based on a CSV file of data taken from The Movie Database (TMDb). Unlike the more widely known IMDb, which is owned by Amazon, TMDb is built and maintained by its user community, and its data can be freely accessed and used, subject to TMDb's terms of use.
The CSV file used for this project covers films made from 1960 to 2015. My work consists of:
- Making initial observations about the data, including what information is missing, duplicated, or helpful or unhelpful to the analysis;
- Cleaning the data, including adjusting inaccurate numbers that might bias the results of the analysis;
- Plotting and making observations on the frequency of film genres across the years;
- Plotting and making observations on the relationships among film budget, revenue, and TMDb popularity score.
I hope you find it interesting and informative. Enjoy!