/investigate-dataset

Udacity Data Analyst Nanodegree - Project #2

Project: Investigate a Dataset - TMDB

The TMDB database is provided by Kaggle for those who are interested in data analysis, data science, and machine learning. It contains over 10,000 movie records, including information about genres, casts, audience ratings, and revenues.

This project analyzes this dataset to gain better insight into movies. Specifically, we try to answer the following questions using basic data analysis techniques and the Python libraries pandas, NumPy, and Matplotlib:

  1. Which genres are most popular from year to year?
  2. For how many years did each of the most popular and least popular genres appear in the ranking lists?
  3. What kinds of properties are associated with movies that have high revenues?
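
As an illustration of how the first question could be approached, here is a minimal pandas sketch. It is not taken from the project notebook. It assumes the data has a `release_year` column, a `popularity` column, and a `genres` column holding pipe-separated genre names; those column names, the helper `most_popular_genre_per_year`, and the sample rows are all hypothetical.

```python
import pandas as pd

def most_popular_genre_per_year(df):
    """Return one row per year: the genre with the highest mean popularity."""
    # Assumption: genres are stored as a pipe-separated string, e.g. "Action|Drama".
    # Split the string and give each genre its own row.
    exploded = df.assign(genre=df["genres"].str.split("|")).explode("genre")
    # Average popularity of each genre within each year.
    mean_pop = (
        exploded.groupby(["release_year", "genre"])["popularity"]
        .mean()
        .reset_index()
    )
    # Keep the row with the highest mean popularity for each year.
    top_idx = mean_pop.groupby("release_year")["popularity"].idxmax()
    return mean_pop.loc[top_idx].reset_index(drop=True)

# Made-up sample rows, only to show the shape of the input.
sample = pd.DataFrame({
    "release_year": [2014, 2014, 2015, 2015],
    "genres": ["Action|Adventure", "Adventure|Drama", "Drama|Thriller", "Action|Thriller"],
    "popularity": [8.0, 2.0, 5.0, 3.0],
})
top = most_popular_genre_per_year(sample)
print(top)
```

Counting how often each genre appears in the `genre` column of the result is one way to start on the second question. Using mean popularity as the ranking measure is a choice made for this sketch; a different measure, such as movie counts or ratings, would give different rankings.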

In order to achieve this goal, the dataset first needs to be cleaned. A good understanding of the provided dataset and the issues that come with it, as well as fixing these issues, is essential for a successful data analysis.
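
The kind of cleaning meant here might look like the sketch below. These are common steps for a dataset of this type, not necessarily the ones applied in the notebook. The column names `genres` and `revenue`, the helper `clean_movies`, and the idea that a revenue of 0 stands for a missing value are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def clean_movies(df):
    """Apply a few common cleaning steps to a movie table (illustrative only)."""
    # Remove rows that are exact duplicates of an earlier row.
    cleaned = df.drop_duplicates().copy()
    # Assumption: a revenue of 0 means "unknown" rather than a real value.
    cleaned["revenue"] = cleaned["revenue"].replace(0, np.nan)
    # Rows without genres cannot contribute to the genre questions.
    cleaned = cleaned.dropna(subset=["genres"])
    return cleaned.reset_index(drop=True)

# Made-up rows showing a duplicate, a missing genre, and a zero revenue.
raw = pd.DataFrame({
    "genres": ["Action", "Action", None, "Drama"],
    "revenue": [100.0, 100.0, 50.0, 0.0],
})
cleaned = clean_movies(raw)
print(cleaned)
```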

The implementation details and our findings can be found here. Additionally, an HTML version of the Jupyter notebook can be accessed here.