This project is a requirement for Udacity's Data Analyst Nanodegree course.
The data set being investigated contains information about 10,000 movies collected from The Movie Database (TMDb). It includes cast, user ratings, budget, and revenue. The purpose of this analysis is to investigate key aspects of the movie industry and, in doing so, answer some questions that are of particular interest to us.
To run this project, you need to have the following installed:
The project aims to investigate the following aspects of the movie industry:
- Which directors have directed the most movies? Top 20.
- Which actors/actresses have acted in the most movies? Top 20.
- Which genres have the most releases? Top 20.
- Which production companies have produced the most movies? Top 20.
- Which year has the most movie releases?
- Which movies have the longest and shortest runtimes?
- Which movies have the highest and lowest ratings?
- Which movies have the highest and lowest budgets? Top 20.
- Which movies have the highest and lowest revenues? Top 20.
- Which movies have the highest profits and the biggest losses? Top 20.
- Which properties are associated with high profits and ratings?
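
Several of the questions above reduce to two operations: counting values in a column and ranking movies by a derived figure. The following is a minimal sketch of both in pandas, not the project's actual code. It uses a tiny made-up sample instead of the real data, and the column names (`original_title`, `director`, `genres`, `budget`, `revenue`), the `|` separator for multi-valued fields, and the helper `top_counts` are all assumptions made for illustration.

```python
import pandas as pd

# Made-up sample rows; column names and the "|" separator are assumptions,
# not confirmed details of the TMDb data set used in this project.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C"],
    "director": ["Dir One", "Dir Two", "Dir One"],
    "genres": ["Action|Drama", "Drama", "Comedy|Drama"],
    "budget": [100, 50, 20],
    "revenue": [300, 40, 20],
})


def top_counts(df, column, n=20, sep="|"):
    """Hypothetical helper: count values in a separator-delimited column."""
    # Split each cell into a list, then give every list item its own row.
    values = df[column].dropna().str.split(sep).explode().str.strip()
    # value_counts sorts by frequency, so head(n) yields the top n.
    return values.value_counts().head(n)


top_directors = top_counts(movies, "director")
top_genres = top_counts(movies, "genres")

# Profit is taken here as revenue minus budget; a negative value is a loss.
movies["profit"] = movies["revenue"] - movies["budget"]
most_profitable = movies.nlargest(20, "profit")[["original_title", "profit"]]
biggest_losses = movies.nsmallest(20, "profit")[["original_title", "profit"]]
```

The same `top_counts` call could be pointed at a cast or production company column, provided those fields are stored in the same delimited form.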
Pandas Documentation
NumPy Documentation
Matplotlib Documentation
Seaborn Documentation