
Performance and Revenue Analysis: Investigate TMDb dataset


Investigate_Dataset

Blog Version:

https://prvnirupama.wordpress.com/2017/12/06/directorial-lens-imdb-dataset/

Overview

  • This project is an exploratory data analysis (EDA) of the TMDb dataset (a subset of the IMDb dataset on Kaggle), provided through the Udacity Data Analyst Nanodegree [DAND] resources.
  • The repository contains 2 CSV files with the cleaned data used for the analysis. The data cleaning steps are described in the notebooks.
  • Investigate_a_Dataset and Investigate_a_Dataset_TMDb are the initial EDA passes, used to formulate the questions answered in the detailed investigation.
  • Investigate_a_Dataset_TMDb_NirupamaPV.pynb is the final submission for the third Udacity DAND assignment.
  • Investigate_a_Dataset_TMDb_Directors_NirupamaPV.pynb is my own EDA, which looks at the dataset from the angle of directorial influence on movies, ratings and revenues.
  • The notebooks contain Markdown cells (GitHub-flavored) that explain the steps and rationale of the analysis.

Research Questions

Udacity DAND Assignment: Investigate_a_Dataset_TMDb_NirupamaPV.pynb

  • Research Question 1: How are runtimes, popularity and revenues trending over time?
  • Research Question 2: Which variables are associated with movie revenues over the years, and how?
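
The two questions above could be approached roughly as sketched below. This is a minimal illustration and not the notebook's actual code: the column names (release_year, runtime, popularity, revenue) are assumptions about the cleaned CSV files, the helper names are hypothetical, and the small table is synthetic data made up for the example.

```python
import pandas as pd

# Synthetic sample for illustration only; these are NOT rows from the TMDb data.
# The column names are assumptions about the cleaned CSV files.
movies = pd.DataFrame({
    "release_year": [2000, 2000, 2001, 2001, 2002],
    "runtime": [100, 120, 90, 110, 130],
    "popularity": [1.0, 3.0, 2.0, 4.0, 5.0],
    "revenue": [10.0, 30.0, 20.0, 60.0, 50.0],
})


def yearly_trends(df):
    """Mean runtime, popularity and revenue per release year (Question 1)."""
    return (
        df.groupby("release_year")[["runtime", "popularity", "revenue"]]
        .mean()
        .sort_index()
    )


def revenue_correlations(df):
    """Pearson correlation of every other numeric column with revenue (Question 2)."""
    corr = df.select_dtypes("number").corr()["revenue"]
    return corr.drop("revenue").sort_values(ascending=False)


trends = yearly_trends(movies)
correlations = revenue_correlations(movies)
print(trends)
print(correlations)
```

A correlation only indicates association, so a strong value here would be a starting point for the "how" part of Question 2 rather than an answer to it.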

Self Project: Investigate_a_Dataset_TMDb_Directors_NirupamaPV.pynb

  • Research Question 1: Who are the most popular directors over the years?
  • Research Question 2: What are typical runtimes for directors? Do directors prefer a particular duration?
  • Research Question 3: What are typical revenues for directors? Who are the most successful directors?
  • Research Question 4: Is there a relation between popularity and revenue for directors?
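
A per-director view of these questions might look like the sketch below. It is an illustration under assumptions, not the notebook's method: the column names (director, runtime, popularity, revenue) are assumed, director_summary and min_movies are hypothetical names, and the rows are synthetic placeholders.

```python
import pandas as pd

# Synthetic sample for illustration only; director names and values are made up.
# The column names are assumptions about the cleaned CSV files.
movies = pd.DataFrame({
    "director": ["Director A", "Director A", "Director B", "Director B", "Director C"],
    "runtime": [100, 120, 90, 110, 150],
    "popularity": [2.0, 4.0, 1.0, 1.0, 5.0],
    "revenue": [20.0, 40.0, 10.0, 30.0, 100.0],
})


def director_summary(df, min_movies=1):
    """Aggregate movies per director, keeping directors with at least min_movies films."""
    summary = df.groupby("director").agg(
        movies=("revenue", "size"),
        median_runtime=("runtime", "median"),
        mean_popularity=("popularity", "mean"),
        mean_revenue=("revenue", "mean"),
    )
    kept = summary[summary["movies"] >= min_movies]
    return kept.sort_values("mean_revenue", ascending=False)


summary = director_summary(movies)

# Question 4: correlation between per-director mean popularity and mean revenue.
popularity_revenue_corr = summary["mean_popularity"].corr(summary["mean_revenue"])

print(summary)
print(popularity_revenue_corr)
```

Raising min_movies is one way to keep directors with a single film from dominating the rankings; the right threshold is a judgment call.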