End-to-end analytics project analyzing Disney movies.
- Python
- pandas, NumPy, bs4, requests, urllib, SciPy, Matplotlib
- Tableau Public
All of the above packages and software are free.
- Scraped Disney movie data from Wikipedia.
- Collected additional data from the OMDb API for every title and merged it into the dataset.
- Cleaned the data using Python and regex.
- Dropped around 20 columns because each was almost 99% null.
- Parsed the main monetary values, which appear in several formats, out of the Budget and Box-Office columns using a combination of regexes.
- Cleaned other columns such as Running time and IMDb rating, then converted them to numeric data types.
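
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the patterns, the `parse_money` and `parse_minutes` helpers, and the input formats they handle (`$1.5 million`, `$950,000`, ranges such as `$2.5–3 million`) are all assumptions about what the scraped values look like.

```python
import re

# Hypothetical patterns; the regexes actually used in the project are not shown here.
NUMBER = r"\d+(?:,\d{3})*(?:\.\d+)?"
SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}

# "$1.5 million", "$2.5-3 million" (hyphen, en dash, or em dash); keeps the lower bound.
WORD_RE = re.compile(
    rf"\$\s*({NUMBER})(?:\s*[-\u2013\u2014]\s*\$?\s*{NUMBER})?\s*(thousand|million|billion)",
    re.IGNORECASE,
)
# "$950,000" with no scale word.
PLAIN_RE = re.compile(rf"\$\s*({NUMBER})")


def parse_money(text):
    """Return a dollar amount as a float, or None if no amount is found."""
    if not isinstance(text, str):
        return None
    match = WORD_RE.search(text)
    if match:
        value = float(match.group(1).replace(",", ""))
        return value * SCALE[match.group(2).lower()]
    match = PLAIN_RE.search(text)
    if match:
        return float(match.group(1).replace(",", ""))
    return None


def parse_minutes(text):
    """Extract minutes from strings like '102 minutes' or '83 min', else None."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*min", str(text), re.IGNORECASE)
    return float(match.group(1)) if match else None
```

With pandas, helpers like these could be applied column by column, for example `df["Budget"].map(parse_money)`, assuming the DataFrame is named `df`. Trying the scale-word pattern first and falling back to the plain pattern keeps each regex small, which is one way to combine several regexes.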
- Analyzed trends and distributions in the data.
- Computed descriptive and basic inferential statistics on the data (for example, 95% confidence intervals).
- One important finding: the 95% confidence interval for the mean running time of Disney movies is 95.37 to 99.07 minutes.
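
A confidence interval for a mean can be computed along these lines. This sketch uses a normal approximation from the standard library rather than SciPy, which is reasonable for a sample of a few hundred movies; the function name `mean_confidence_interval` and the sample values are made up for illustration and are not the project's data.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Return (low, high) for the mean, using a normal approximation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem


# Made-up running times in minutes, for illustration only.
sample = [83, 88, 92, 95, 97, 99, 101, 104, 110, 121]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: {low:.2f} to {high:.2f} minutes")
```

The interval describes the procedure: across repeated samples, about 95% of intervals built this way would contain the true mean. For small samples, a t-based interval (for example via `scipy.stats.t`) is the more appropriate choice.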
- Reproduced the Python visuals in Tableau as a story suited to a business setting. A few of the visuals: