Disney-Movie-Analysis

End-to-end analytics project for top Disney movies.

Primary language: Jupyter Notebook

Important links

Prerequisite packages and software

  • Python
  • pandas, NumPy, bs4 (Beautiful Soup), requests, urllib, SciPy, Matplotlib
  • Tableau Public

All of the above packages and software are free!

Data Collection

  • Scraped Disney movie data from Wikipedia.
  • Collected additional data from the OMDb API for every title and attached it to the scraped data.
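
The OMDb step above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the function names, the choice of fields to copy (`imdbRating`, `Metascore`, `Rated`), and the use of plain dicts for movie records are all assumptions. OMDb's title lookup takes the title in the `t` parameter and your key in `apikey`.

```python
import json
import urllib.parse
import urllib.request

OMDB_URL = "https://www.omdbapi.com/"


def build_omdb_url(title, api_key):
    """Build an OMDb title-lookup URL (`t` = title, `apikey` = your key)."""
    query = urllib.parse.urlencode({"apikey": api_key, "t": title})
    return f"{OMDB_URL}?{query}"


def fetch_omdb(title, api_key):
    """Return OMDb's JSON record for a title as a dict (needs network access)."""
    with urllib.request.urlopen(build_omdb_url(title, api_key), timeout=10) as resp:
        return json.load(resp)


def attach_omdb_fields(movie, omdb, fields=("imdbRating", "Metascore", "Rated")):
    """Copy selected OMDb fields onto a scraped movie dict (None if missing).

    The field list is an assumption; pick whichever OMDb keys you need.
    """
    merged = dict(movie)  # leave the scraped record untouched
    for field in fields:
        merged[field] = omdb.get(field)
    return merged
```

With a valid key, `attach_omdb_fields(movie, fetch_omdb(movie["title"], key))` would return the scraped record with the extra OMDb fields added. Here `movie["title"]` stands in for whatever column holds the title in the scraped data.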

Data Cleaning

  • Cleaned the data using Python and regular expressions.
  • Dropped around 20 columns, each of which was almost 99% null.
  • Parsed the main monetary value out of the differently formatted Budget and Box-Office columns using several regexes together.
  • Cleaned other columns such as Running time and IMDb rating, then converted their data types to numeric.
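
As an illustration of the regex-based parsing described above, here is a small sketch. It is not the project's actual code: the input formats it handles ("$1.5 million", "$3–4 million", "$2,000,000") are assumed to be typical of Wikipedia infoboxes, and taking the lower bound of a range is an arbitrary choice made for this example.

```python
import re

# Multipliers for the scale words assumed to appear in the raw text.
_SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}

# A number with optional thousands separators and decimals, e.g. 2,000,000 or 1.5
_NUMBER = r"\d+(?:,\d{3})*(?:\.\d+)?"

# "$1.5 million", "$3-4 million", "$3–4 million" (hyphen or en dash range)
_WORD_RE = re.compile(
    rf"\$\s*({_NUMBER})(?:\s*[-–]\s*{_NUMBER})?\s*(thousand|million|billion)",
    re.IGNORECASE,
)
# "$2,000,000"
_PLAIN_RE = re.compile(rf"\$\s*({_NUMBER})")


def parse_money(text):
    """Return the first dollar amount in `text` as a float, or None."""
    if not text:
        return None
    match = _WORD_RE.search(text)
    if match:
        # For a range, group 1 is the lower bound.
        value = float(match.group(1).replace(",", ""))
        return value * _SCALE[match.group(2).lower()]
    match = _PLAIN_RE.search(text)
    if match:
        return float(match.group(1).replace(",", ""))
    return None


def parse_minutes(text):
    """Return the first number in a running-time string such as '83 minutes'."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None
```

Trying the scale-word pattern before the plain pattern matters: otherwise "$1.5 million" would be read as 1.5 dollars.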

Exploratory Data Analysis

  • Analyzed trends and distributions in the data.

  • Performed descriptive and basic inferential statistics on the data (such as building 95% confidence intervals).

  • One important finding: the 95% confidence interval for the mean running time of Disney movies is 95.37 to 99.07 minutes.

  • Visuals from Python were reproduced in Tableau as a story for presentation in a business setting. A few of the visuals:

    BudVsBo

    IMDB

    TRENDS
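
For readers who want to see how a 95% confidence interval for a mean such as the running-time one is built, here is a minimal sketch. It uses a normal approximation from the standard library, which is an assumption of this example. The notebook may instead have used a t-based interval (for instance via `scipy.stats.t`), which gives nearly the same result for a few hundred movies. The sample values are made up for illustration.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    # Standard error of the mean, from the sample standard deviation.
    std_err = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Made-up running times in minutes, for illustration only.
sample_minutes = [83, 88, 90, 95, 97, 100, 102, 104, 110, 121]
low, high = mean_confidence_interval(sample_minutes)
print(f"95% CI for the mean: {low:.2f} to {high:.2f} minutes")
```

The interval describes uncertainty about the mean running time, not the range in which individual movies fall.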