
A study and visual examination of film data taken from The Movie Database (TMDb).

Primary language: Jupyter Notebook

Movie Database Investigation

This notebook documents an exploratory data-wrangling project I carried out on a CSV file built from data taken from The Movie Database (TMDb). Unlike the more widely known IMDb, a subsidiary of Amazon.com, TMDb is community-built, and its data can be freely accessed and reused without running into copyright or censorship issues.

The CSV file used for this project covers films released between 1960 and 2015. My work consists of:

  • Making initial observations of the data, noting what information is missing, duplicated, or helpful/unhelpful for analysis;
  • Cleaning the data, including adjusting inaccurate figures that might bias the results of the analysis;
  • Plotting and making observations on the frequency of film genres across the years;
  • Plotting and making observations on the relationship between film budgets, revenue, and TMDb popularity scores.
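The notebook itself is not reproduced here, but the four steps above can be sketched in plain Python under a few stated assumptions. The column names (`id`, `budget`, `revenue`, `popularity`, `genres`, `release_year`), the pipe-separated `genres` field, and the sample rows are all hypothetical stand-ins rather than the real file; treating a budget or revenue of 0 as "unknown" is one plausible reading of "adjusting inaccurate figures", not necessarily the author's rule. Pearson correlation is used here as one simple way to put a number on the budget/revenue/popularity relationship, and the standard library stands in for whatever tools the notebook actually uses.

```python
import csv
import io
import math
from collections import Counter, defaultdict

# Made-up rows for illustration only. Column names and the pipe-separated
# "genres" field are ASSUMPTIONS about the CSV layout, not the real file.
SAMPLE_CSV = """id,budget,revenue,popularity,genres,release_year
1,1000000,4000000,2.5,Drama|Romance,1960
1,1000000,4000000,2.5,Drama|Romance,1960
2,0,0,0.3,Comedy,1960
3,5000000,12000000,4.0,Action|Drama,2015
4,8000000,30000000,6.5,Action|Adventure,2015
5,2000000,,1.1,Comedy,2015
"""


def to_positive_number(text):
    """Return a float, or None when the value is blank or zero.

    ASSUMPTION: a budget or revenue of 0 means "unknown", so it is
    treated as missing instead of dragging averages toward zero.
    """
    if not text or not text.strip():
        return None
    value = float(text)
    return value if value > 0 else None


def load_and_clean(csv_text):
    """Parse rows, drop duplicate ids, and normalise numeric fields."""
    seen_ids = set()
    movies = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["id"] in seen_ids:  # duplicated record: keep the first copy
            continue
        seen_ids.add(row["id"])
        movies.append({
            "budget": to_positive_number(row["budget"]),
            "revenue": to_positive_number(row["revenue"]),
            "popularity": float(row["popularity"]),
            "genres": [g for g in row["genres"].split("|") if g],
            "year": int(row["release_year"]),
        })
    return movies


def genre_counts_by_year(movies):
    """Count how many films list each genre, per release year."""
    counts = defaultdict(Counter)
    for movie in movies:
        counts[movie["year"]].update(movie["genres"])
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlate(movies, field_a, field_b):
    """Correlate two fields using only films where both values are known."""
    pairs = [(m[field_a], m[field_b]) for m in movies
             if m[field_a] is not None and m[field_b] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


movies = load_and_clean(SAMPLE_CSV)
missing_budgets = sum(1 for m in movies if m["budget"] is None)
print("films after de-duplication:", len(movies))
print("films with unknown budget:", missing_budgets)
print("2015 genres:", genre_counts_by_year(movies)[2015])
print("budget vs revenue r =", round(correlate(movies, "budget", "revenue"), 3))
```

Dropping incomplete pairs (rather than filling them with 0 or a mean) keeps unknown budgets from inflating or flattening the correlation; on a real dataset, the per-year genre counts would then feed a line or stacked-area plot.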

I hope you find it interesting and informative. Enjoy!