Investigate-the-TMDb-Dataset

Overview

This project is a requirement for Udacity's Data Analyst Nanodegree course.

The data set under investigation contains information about 10,000 movies collected from The Movie Database (TMDb), including cast, user ratings, budget, and revenue. The purpose of this analysis is to investigate key aspects of the movie industry and answer the questions listed below.

Prerequisites

To run this project, you need the following installed:

  • Python 3
  • Anaconda
  • Pandas
  • Matplotlib
  • NumPy
  • Seaborn

Questions

The project aims to investigate the following aspects of the movie industry:

1. Which directors have directed the most movies? Top 20.
2. Which actors and actresses have appeared in the most movies? Top 20.
3. Which genres have the most releases? Top 20.
4. Which production companies have produced the most movies? Top 20.
5. Which year had the most movie releases?
6. Which movies have the longest and shortest runtimes?
7. Which movies have the highest and lowest ratings?
8. Which movies have the highest and lowest budgets? Top 20.
9. Which movies have the highest and lowest revenues? Top 20.
10. Which movies have the highest profits and losses? Top 20.
11. Which properties are associated with high profits and ratings?
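
As a rough illustration of how the "Top 20" counting questions and the profit question could be approached with Pandas, here is a minimal sketch. It is not the project's actual notebook code. The column names (`original_title`, `director`, `genres`, `budget`, `revenue`), the `|` separator for multi-valued fields, and the helper names `top_counts` and `with_profit` are all assumptions made for this example, and the tiny inline table stands in for the real data set.

```python
import pandas as pd

# Tiny stand-in table; the real data set's column names may differ (assumption).
movies = pd.DataFrame(
    {
        "original_title": ["A", "B", "C", "D"],
        "director": ["Jane Roe", "John Doe", "Jane Roe", None],
        "genres": ["Action|Drama", "Drama", "Comedy|Drama", "Action"],
        "budget": [100, 50, 80, 20],
        "revenue": [300, 40, 80, 90],
    }
)


def top_counts(df, column, n=20, sep="|"):
    """Count how often each value appears in a delimiter-separated column.

    Hypothetical helper: assumes multi-valued cells are joined with `sep`.
    """
    # Drop missing cells, split each cell into a list, then give every
    # list item its own row so value_counts can tally individual values.
    values = df[column].dropna().str.split(sep).explode().str.strip()
    return values.value_counts().head(n)


def with_profit(df):
    """Return a copy of df with a profit column (revenue minus budget)."""
    out = df.copy()
    out["profit"] = out["revenue"] - out["budget"]
    return out


genre_counts = top_counts(movies, "genres")
director_counts = top_counts(movies, "director")

# Highest profit first; the tail of this frame holds the largest losses.
ranked = with_profit(movies).sort_values("profit", ascending=False)

print(genre_counts)
print(director_counts)
print(ranked[["original_title", "profit"]])
```

With the real data loaded through `pd.read_csv`, the same helper could be pointed at the cast or production-company columns, provided they use the same separator.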

Resources

  • Pandas Documentation
  • NumPy Documentation
  • Matplotlib Documentation
  • Seaborn Documentation