
Data Exploration and Summary

  • Display basic statistics of the dataset (e.g., mean, median, max, min).
  • Count the number of movies in each genre.
  • Explore the distribution of movie ratings.
  • Check the distribution of movie release years.
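
The exploration steps above can be sketched with pandas. This is a minimal sketch, not the notebook's actual code: the DataFrame is a made-up toy stand-in, and the column names 'Movie Name', 'Genre', and 'Year of Release' are assumptions (only 'Movie Rating' is named in this outline).

```python
import pandas as pd

# Toy stand-in for the real dataset; every value here is made up.
# 'Movie Name', 'Genre' and 'Year of Release' are assumed column names.
df = pd.DataFrame({
    "Movie Name": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Movie Rating": [8.1, 7.4, 6.9, 8.1],
    "Genre": ["Drama", "Action, Drama", "Comedy", "Action"],
    "Year of Release": [1994, 2008, 2015, 2008],
})

# Basic statistics (count, mean, std, min, quartiles, max) for numeric columns.
summary = df.describe()
median_rating = df["Movie Rating"].median()

# A movie may list several comma-separated genres, so split and explode
# into one value per row before counting.
genre_counts = df["Genre"].str.split(",").explode().str.strip().value_counts()

# Distributions of ratings and release years as simple frequency tables.
rating_counts = df["Movie Rating"].value_counts().sort_index()
year_counts = df["Year of Release"].value_counts().sort_index()

print(summary)
print(genre_counts)
```

If the real 'Genre' column holds a single genre per movie, the split/explode step is unnecessary and `value_counts()` can be applied directly.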

Missing Values Handling

  • Handle missing values in 'MetaScore', 'Gross', and 'Certification'.
  • Decide on a strategy to impute or drop rows with missing values.
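
One possible strategy is sketched below on made-up toy data: impute 'MetaScore' with the median, label missing 'Certification' values explicitly, and drop rows without 'Gross'. These choices are assumptions for illustration; the right strategy depends on how much data is missing and on the later analysis.

```python
import numpy as np
import pandas as pd

# Toy data with gaps in the three columns named above; values are made up.
df = pd.DataFrame({
    "MetaScore": [82.0, np.nan, 65.0, 71.0],
    "Gross": [28.3, 534.9, np.nan, np.nan],
    "Certification": ["R", None, "PG-13", "PG"],
})

# Count missing values per column before choosing a strategy.
missing_before = df.isna().sum()

cleaned = df.copy()
# Numeric score: impute with the median, which is less sensitive to skew
# than the mean.
cleaned["MetaScore"] = cleaned["MetaScore"].fillna(cleaned["MetaScore"].median())
# Categorical column: keep the rows and mark the gap explicitly.
cleaned["Certification"] = cleaned["Certification"].fillna("Unknown")
# Earnings: drop rows without a value (an assumed choice; imputing would
# invent revenue figures).
cleaned = cleaned.dropna(subset=["Gross"])

print(missing_before)
print(cleaned)
```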

Data Visualization

  • Create a histogram to visualize the distribution of movie ratings.
  • Plot a bar chart for the count of movies in each genre.
  • Generate a box plot for 'Gross' to identify outliers.
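
The three plots above could be drawn with matplotlib roughly as follows. This is a sketch on made-up toy data; the 'Genre' column name and one-genre-per-row layout are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; values are made up and 'Genre' is an assumed column name.
df = pd.DataFrame({
    "Movie Rating": [8.1, 7.4, 6.9, 8.1, 5.5, 7.8],
    "Genre": ["Drama", "Action", "Comedy", "Drama", "Action", "Drama"],
    "Gross": [28.3, 534.9, 40.1, None, 12.0, 95.5],
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram of ratings.
axes[0].hist(df["Movie Rating"].to_numpy(), bins=10)
axes[0].set_title("Distribution of movie ratings")
axes[0].set_xlabel("Movie Rating")

# Bar chart of movie counts per genre.
genre_counts = df["Genre"].value_counts()
genre_counts.plot(kind="bar", ax=axes[1])
axes[1].set_title("Movies per genre")
axes[1].set_ylabel("Count")

# Box plot of gross earnings; points beyond the whiskers are the outliers.
axes[2].boxplot(df["Gross"].dropna().to_numpy())
axes[2].set_title("Gross earnings")

fig.tight_layout()
```

By default matplotlib draws whiskers at 1.5 times the interquartile range, so the points plotted individually in the third panel are the candidates for outlier review.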

Genre Analysis

  • Identify the most common genre.
  • Analyze the relationship between movie ratings and genres.
  • Explore the distribution of movie run times for each genre.
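
A minimal sketch of the genre analysis, assuming comma-separated genres in a 'Genre' column and runtimes in minutes in a 'Run Time' column (both column names are assumptions; the data is made up):

```python
import pandas as pd

# Toy data; values are made up.
df = pd.DataFrame({
    "Movie Rating": [8.0, 7.0, 6.0, 9.0],
    "Genre": ["Drama", "Action, Drama", "Comedy, Drama", "Action"],
    "Run Time": [142, 152, 95, 110],
})

# Reshape to one row per (movie, genre) pair so multi-genre movies
# contribute to every genre they list.
long = df.assign(Genre=df["Genre"].str.split(",")).explode("Genre")
long["Genre"] = long["Genre"].str.strip()

# Most common genre.
most_common_genre = long["Genre"].value_counts().idxmax()

# Ratings per genre: mean plus the count, so small groups are visible.
rating_by_genre = long.groupby("Genre")["Movie Rating"].agg(["mean", "count"])

# Runtime distribution per genre (count, mean, std, min, quartiles, max).
runtime_by_genre = long.groupby("Genre")["Run Time"].describe()

print(most_common_genre)
print(rating_by_genre)
print(runtime_by_genre)
```

Reporting the group size next to each mean helps avoid over-reading genres that contain only a handful of movies.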

Director and Star Analysis

  • Find the most prolific directors.
  • Identify popular stars based on the number of movies.
  • Analyze the relationship between director and movie ratings.
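
The director and star counts can be sketched as below. The column names 'Director' and 'Stars', the comma-separated star lists, and the minimum of two movies per director are all assumptions; the data is made up.

```python
import pandas as pd

# Toy data; names and values are placeholders.
df = pd.DataFrame({
    "Director": ["Director X", "Director X", "Director Y", "Director X"],
    "Stars": ["Star A, Star B", "Star A, Star C", "Star B", "Star A"],
    "Movie Rating": [8.0, 7.0, 6.5, 9.0],
})

# Most prolific directors by number of movies.
top_directors = df["Director"].value_counts().head(10)

# Stars are assumed to be comma-separated; count appearances per star.
star_counts = df["Stars"].str.split(",").explode().str.strip().value_counts()

# Average rating per director, restricted to directors with at least two
# movies so that a single highly rated film does not dominate the ranking.
director_ratings = (
    df.groupby("Director")["Movie Rating"]
    .agg(mean_rating="mean", n_movies="count")
    .query("n_movies >= 2")
    .sort_values("mean_rating", ascending=False)
)

print(top_directors)
print(star_counts)
print(director_ratings)
```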

Year of Release Analysis

  • Explore the trend of movie releases over the years.
  • Compare the average ratings of movies in different decades.
  • Investigate the relationship between the year of release and gross earnings.
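
A sketch of the release-year analysis on made-up toy data, assuming an integer 'Year of Release' column. The rank correlation here is Spearman's coefficient computed as a Pearson correlation on ranks, chosen because gross earnings are typically skewed; it is one reasonable option, not the only one.

```python
import pandas as pd

# Toy data; values are made up.
df = pd.DataFrame({
    "Year of Release": [1994, 1999, 2008, 2015],
    "Movie Rating": [8.0, 9.0, 7.0, 6.0],
    "Gross": [28.0, 37.0, 535.0, 154.0],
})

# Trend of releases: number of movies per year, in chronological order.
releases_per_year = df["Year of Release"].value_counts().sort_index()

# Decade label: 1994 -> 1990, 2008 -> 2000, and so on.
decade = ((df["Year of Release"] // 10) * 10).rename("Decade")
avg_rating_by_decade = df.groupby(decade)["Movie Rating"].mean()

# Spearman rank correlation between release year and gross earnings,
# computed as the Pearson correlation of the ranks.
year_gross_corr = df["Year of Release"].rank().corr(df["Gross"].rank())

print(releases_per_year)
print(avg_rating_by_decade)
print(year_gross_corr)
```

Gross figures from different decades are not directly comparable unless they are adjusted for inflation, which this sketch does not do.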

Certification Analysis

  • Analyze the distribution of certifications.
  • Compare movie ratings based on certification.
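
The certification comparison reduces to a frequency table and a grouped summary. A minimal sketch on made-up toy data:

```python
import pandas as pd

# Toy data; values are made up. One certification is deliberately missing.
df = pd.DataFrame({
    "Certification": ["R", "PG-13", "R", "PG", None],
    "Movie Rating": [8.0, 7.0, 9.0, 6.5, 7.5],
})

# Share of each certification, keeping missing values as their own category.
cert_share = df["Certification"].value_counts(normalize=True, dropna=False)

# Rating summary per certification; groupby skips rows with a missing key.
rating_by_cert = (
    df.groupby("Certification")["Movie Rating"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)

print(cert_share)
print(rating_by_cert)
```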

Correlation Analysis

  • Investigate correlations between 'Movie Rating', 'MetaScore', 'Gross', and other numeric columns.
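
A correlation matrix over the numeric columns can be sketched as follows, using made-up toy data. `DataFrame.corr()` computes Pearson correlation by default.

```python
import pandas as pd

# Toy data; values are made up. The text column is included to show that
# non-numeric columns are filtered out first.
df = pd.DataFrame({
    "Movie Rating": [8.0, 7.0, 6.0, 9.0],
    "MetaScore": [80.0, 70.0, 60.0, 90.0],
    "Gross": [100.0, 300.0, 200.0, 50.0],
    "Certification": ["R", "PG-13", "PG", "R"],
})

# Pairwise Pearson correlations between numeric columns only.
corr = df.select_dtypes("number").corr()

# Correlation of every other numeric column with the rating, strongest first.
rating_corr = (
    corr["Movie Rating"].drop("Movie Rating").sort_values(ascending=False)
)

print(corr)
print(rating_corr)
```

Passing `method="spearman"` to `corr()` gives a rank-based alternative that is less affected by skewed columns such as 'Gross'.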

Description Analysis

  • Perform sentiment analysis on movie descriptions.
  • Visualize the most common words used in movie descriptions.
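
The sketch below counts the most common words and computes a toy sentiment score using only the standard library. The descriptions, stopword list, and word lists are all made up for illustration; a real analysis would use an established sentiment lexicon or a trained model rather than this hand-written word list.

```python
import re
from collections import Counter

# Made-up descriptions standing in for the dataset's description column.
descriptions = [
    "A brave hero fights a brutal war.",
    "A happy family finds love and hope.",
    "A lonely detective hunts a brutal killer.",
]

# Tiny illustrative word lists (assumptions, not a real lexicon).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to"}
POSITIVE = {"brave", "happy", "love", "hope"}
NEGATIVE = {"brutal", "war", "lonely", "killer"}


def tokenize(text):
    """Lowercase the text, extract words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def sentiment_score(text):
    """Return (positive - negative) word count divided by token count."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


# Word frequencies across all descriptions.
word_counts = Counter(w for d in descriptions for w in tokenize(d))
scores = [sentiment_score(d) for d in descriptions]

print(word_counts.most_common(5))
print(scores)
```

The `word_counts` mapping can be passed to a bar chart or a word-cloud library to visualize the most frequent terms.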

Advanced Analysis

  • Build a predictive model to estimate movie ratings based on available features.
  • Create a recommendation system based on movie ratings and genres.
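
As a starting point, both tasks can be sketched with NumPy and pandas alone. The predictor is an ordinary least-squares baseline using only 'MetaScore', and the recommender ranks movies by genre cosine similarity weighted by rating. Both designs, the 'Movie Name' and 'Genre' column names, the helper `recommend`, and the toy data are assumptions for illustration; a real model should use more features and be evaluated on held-out data.

```python
import numpy as np
import pandas as pd

# Toy data; values are made up.
df = pd.DataFrame({
    "Movie Name": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Genre": ["Drama", "Action, Drama", "Comedy", "Action"],
    "MetaScore": [80.0, 70.0, 60.0, 90.0],
    "Movie Rating": [8.0, 7.0, 6.0, 9.0],
})

# Baseline predictor: least squares fit of rating = b0 + b1 * MetaScore.
X = np.column_stack([np.ones(len(df)), df["MetaScore"].to_numpy()])
y = df["Movie Rating"].to_numpy()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ coef

# Recommender: one-hot encode genres, then rank by cosine similarity
# multiplied by the rating scaled to the range 0..1.
genre_matrix = df["Genre"].str.get_dummies(sep=", ")


def recommend(title, top_n=2):
    """Return up to top_n movie names most similar to `title` (hypothetical helper)."""
    pos = int(np.flatnonzero(df["Movie Name"].to_numpy() == title)[0])
    G = genre_matrix.to_numpy(dtype=float)
    norms = np.linalg.norm(G, axis=1)  # assumes every movie has at least one genre
    sims = (G @ G[pos]) / (norms * norms[pos])
    scores = sims * df["Movie Rating"].to_numpy() / 10.0
    order = [int(i) for i in np.argsort(-scores, kind="stable") if i != pos]
    return df["Movie Name"].iloc[order[:top_n]].tolist()


print(coef)
print(recommend("Movie A", top_n=1))
```

The fit here is evaluated on the same rows it was trained on, which only confirms the code runs; any real accuracy claim needs a separate test set.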

Data Exploration and Summary: