
An investigation of a dataset containing information about 10,000+ movies collected from The Movie Database (TMDb).


Investigate-a-TMDb-Dataset-Movies

The primary goal of the project is to work through the general data analysis process on this dataset using NumPy, pandas, and Matplotlib. The dataset contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue.

Project Overview

In this project, you will analyze a dataset and then communicate your findings about it. You will use the Python libraries NumPy, pandas, and Matplotlib to make your analysis easier.

What do you need to install?

You will need an installation of Python, plus the following libraries:

  • pandas
  • NumPy
  • Matplotlib
  • csv (part of the Python standard library, so it needs no separate installation)
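
As a quick sanity check before opening the notebook, something like the following can confirm that the libraries listed above are importable. It is only a minimal sketch: the `required` list and `status` dictionary are names chosen for this illustration, not part of the project.

```python
import importlib.util

# Module names as they are imported in Python (illustrative list).
required = ["pandas", "numpy", "matplotlib", "csv"]

# find_spec returns None when a top-level module cannot be found,
# so this check does not raise if a library is missing.
status = {name: importlib.util.find_spec(name) is not None for name in required}

for name, found in status.items():
    print(f"{name}: {'available' if found else 'missing'}")
```

Any library reported as missing can typically be installed with pip or conda.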

Why this Project?

In this project, you'll go through the data analysis process and see how everything fits together. Later Nanodegree projects will focus on individual pieces of the data analysis process.

You'll use the Python libraries NumPy, pandas, and Matplotlib, which make writing data analysis code in Python a lot easier! Not only that, these skills are sought after by employers!

What will you learn?

After completing the project, you will:

  • Know all the steps involved in a typical data analysis process
  • Be comfortable posing questions that can be answered with a given dataset and then answering those questions
  • Know how to investigate problems in a dataset and wrangle the data into a format you can use
  • Have practice communicating the results of your analysis
  • Be able to use vectorized operations in NumPy and pandas to speed up your data analysis code
  • Be familiar with pandas' Series and DataFrame objects, which let you access your data more conveniently
  • Know how to use Matplotlib to produce plots showing your findings
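
To make the points about vectorized operations and pandas objects more concrete, here is a small sketch. The DataFrame is toy data invented for this example, and the column names (`title`, `budget`, `revenue`) are assumptions rather than the dataset's confirmed schema.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
movies = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B", "Movie C"],
        "budget": [100.0, 50.0, 20.0],
        "revenue": [300.0, 40.0, 60.0],
    }
)

# Each DataFrame column is a Series; arithmetic between two Series
# is applied element-wise, with no explicit Python loop.
movies["profit"] = movies["revenue"] - movies["budget"]

# NumPy functions also operate on a whole column at once.
movies["is_profitable"] = np.where(movies["profit"] > 0, "yes", "no")

print(movies)
```

On a dataset with thousands of rows, this style is usually both shorter and faster than iterating row by row.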

Exploratory Data Analysis

Tip: Now that you've trimmed and cleaned your data, you're ready to move on to exploration. Compute statistics and create visualizations with the goal of addressing the research questions that you posed in the Introduction section. It is recommended that you be systematic with your approach. Look at one variable at a time, and then follow it up by looking at relationships between variables.
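
The "one variable first, then relationships" approach might look roughly like this in pandas. The data below is invented, and `budget` and `vote_average` are assumed column names used only for illustration.

```python
import pandas as pd

# Hypothetical toy data; not taken from the TMDb dataset.
movies = pd.DataFrame(
    {
        "budget": [10.0, 20.0, 30.0, 40.0],
        "vote_average": [5.0, 6.0, 5.5, 7.0],
    }
)

# Step 1: look at a single variable (count, mean, spread, quartiles).
rating_summary = movies["vote_average"].describe()
print(rating_summary)

# Step 2: look at the relationship between two variables.
# Series.corr computes the Pearson correlation by default.
budget_rating_corr = movies["budget"].corr(movies["vote_average"])
print(f"budget vs. rating correlation: {budget_rating_corr:.2f}")
```

A histogram for the first step and a scatter plot for the second are the usual visual counterparts of these two calls.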

Research Questions

1. Which year has the highest profit rate?

2. Which movie received the highest rating?

3. Which movies have the highest and the lowest revenues?

4. Do movies with higher budgets receive higher ratings?

5. Which movies have the highest and the lowest budgets?
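
As a rough illustration of how questions 1, 2, 3, and 5 could be approached in pandas, consider the sketch below. The data is invented, the column names are assumptions, and "profit rate" is interpreted here as total profit divided by total budget for a year; the notebook may define it differently. Question 4 concerns the relationship between two columns rather than a single extreme value, so it is left out of this sketch.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
movies = pd.DataFrame(
    {
        "original_title": ["Movie A", "Movie B", "Movie C", "Movie D"],
        "release_year": [2014, 2014, 2015, 2015],
        "budget": [100.0, 50.0, 20.0, 80.0],
        "revenue": [150.0, 50.0, 60.0, 160.0],
        "vote_average": [6.1, 7.4, 5.2, 8.0],
    }
)

# Question 1: profit rate per year (assumed definition: profit / budget).
yearly = movies.groupby("release_year")[["budget", "revenue"]].sum()
yearly["profit_rate"] = (yearly["revenue"] - yearly["budget"]) / yearly["budget"]
best_year = yearly["profit_rate"].idxmax()

# Question 2: title of the row with the highest rating.
top_rated = movies.loc[movies["vote_average"].idxmax(), "original_title"]

# Question 3: titles with the highest and lowest revenue.
highest_revenue = movies.loc[movies["revenue"].idxmax(), "original_title"]
lowest_revenue = movies.loc[movies["revenue"].idxmin(), "original_title"]

# Question 5: titles with the highest and lowest budget.
highest_budget = movies.loc[movies["budget"].idxmax(), "original_title"]
lowest_budget = movies.loc[movies["budget"].idxmin(), "original_title"]

print(best_year, top_rated, highest_revenue, lowest_revenue,
      highest_budget, lowest_budget)
```

Note that `idxmax` and `idxmin` return only the first matching row, so ties need separate handling.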

Conclusions

  • Highest Profit Year: The analysis revealed that 2015 had the highest profit rate in the dataset.
  • Highest Rated Movie: "The Story of Film: An Odyssey" obtained the highest rating, 9.2.
  • Highest and Lowest Revenues: "Avatar" had the highest revenue, while "Wild Card" had the lowest.
  • Budget vs. Ratings: There was only a weak positive correlation (correlation coefficient ≈ 0.08) between movie budgets and ratings, so budget explains very little of the variation in ratings.
  • Highest and Lowest Budget Movies: "The Warrior's Way" had the highest budget, while "Mr. Holmes" had the lowest.
  • Release Trends: 2014 saw the highest number of movie releases, while 1961 had the lowest.
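
A release-trend count like the one in the last conclusion can be sketched with `value_counts`. The years below are invented toy values, and `release_year` is an assumed column name.

```python
import pandas as pd

# Hypothetical toy data: one entry per movie.
release_years = pd.Series(
    [1961, 2014, 2014, 2014, 2015, 2015], name="release_year"
)

# Number of movies released in each year.
releases_per_year = release_years.value_counts()

busiest_year = releases_per_year.idxmax()
quietest_year = releases_per_year.idxmin()

print(f"most releases: {busiest_year}, fewest releases: {quietest_year}")
```

With Matplotlib installed, `releases_per_year.sort_index().plot(kind="bar")` would draw the same counts as a bar chart.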