IMDB-Data-Scraping-And-Analysis

Given a list of 4000 movies, my task was to collect data about these movies and then analyse it.

Primary language: Jupyter Notebook. License: MIT.

Tasks Performed:

Given a list of 4000 Hindi movies (Bollywood Movies Dataset.xlsx), write a Python script that performs the tasks below.

Task 1:

Search for and fetch the IMDB URLs of as many of the movies as possible.
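
A minimal sketch of the URL handling behind this task, using only the standard library. The search URL layout, the helper names (`build_search_url`, `extract_imdb_id`, `title_url`) and the placeholder ID `tt0000000` are assumptions made for illustration; the actual HTTP request and result parsing are left out because they depend on the scraping approach or library chosen.

```python
import re
from urllib.parse import quote_plus

# IMDB title pages live under /title/<id>/, where the id is "tt" plus digits.
IMDB_ID_PATTERN = re.compile(r"/title/(tt\d+)")


def build_search_url(title):
    """Build a title-search URL for a movie name (URL layout is an assumption)."""
    return "https://www.imdb.com/find/?q=" + quote_plus(title) + "&s=tt"


def extract_imdb_id(href):
    """Pull the IMDB ID out of a link found on a results page, or return None."""
    match = IMDB_ID_PATTERN.search(href)
    return match.group(1) if match else None


def title_url(imdb_id):
    """Turn an IMDB ID into the canonical title page URL."""
    return f"https://www.imdb.com/title/{imdb_id}/"


# Hypothetical link as it might appear in a search results page.
example_href = "/title/tt0000000/?ref_=fn_tt_1"
example_id = extract_imdb_id(example_href)
print(build_search_url("Example Movie"))
print(title_url(example_id))
```

Movies for which `extract_imdb_id` returns `None` can be logged and retried later, for example with the release year added to the query.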

Task 2:

Fetch the content metadata for each of the above movies from the IMDB website, either by web scraping or by using an available open API or Python library, and store it in a CSV file. The details required are:

  • Title
  • IMDB ID
  • Date of release
  • Genre
  • Cast
  • Crew
  • Plot summary
  • Plot keywords
  • IMDB Rating
  • IMDB Votes
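
One way the fields listed above could be laid out in a CSV file is sketched below. The column names, the `|` separator for multi-valued fields and the sample record are all assumptions for illustration, not the layout used in the notebook.

```python
import csv
import io

# Hypothetical column names, one per required detail.
FIELDNAMES = [
    "title", "imdb_id", "release_date", "genre", "cast", "crew",
    "plot_summary", "plot_keywords", "imdb_rating", "imdb_votes",
]


def to_row(record):
    """Flatten one movie record; list values are joined with '|'."""
    row = {}
    for field in FIELDNAMES:
        value = record.get(field, "")
        if isinstance(value, (list, tuple)):
            value = "|".join(str(item) for item in value)
        row[field] = value
    return row


def write_movies(records, handle):
    """Write a header line followed by one CSV row per movie record."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for record in records:
        writer.writerow(to_row(record))


# Made-up record, written to an in-memory buffer instead of a file.
sample = {
    "title": "Example Movie",
    "imdb_id": "tt0000000",
    "genre": ["Drama", "Comedy"],
    "cast": ["Actor A", "Actor B"],
    "imdb_rating": 7.5,
    "imdb_votes": 1200,
}
buffer = io.StringIO()
write_movies([sample], buffer)
print(buffer.getvalue())
```

To write to disk, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` in place of the buffer. Fields missing from a record are written as empty cells.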

Task 3:

Perform additional data processing to come up with more derived fields:

  • Age of content - the time elapsed since the content was released.
  • Popularity of content - a score that combines IMDB rating and votes, or a new definition of your own.
  • Cast popularity score - a single score summarising the popularity of all cast members.
  • Crew popularity score - a single score summarising the popularity of all crew members.
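
The derived fields above leave their exact definitions open, so the sketch below picks one possible set. The formulas are assumptions, not the notebook's definitions: age is measured in years, popularity is the rating weighted by the base-10 logarithm of the vote count, and a person's score is the mean popularity of the movies they appear in. All sample data is made up.

```python
import math
from collections import defaultdict
from datetime import date


def age_in_years(release_date, today):
    """Approximate age of a title in years."""
    return (today - release_date).days / 365.25


def popularity_score(rating, votes):
    """Assumed definition: rating scaled by log10 of the vote count."""
    return rating * math.log10(votes + 1)


def people_popularity(movies, key="cast"):
    """Score each movie by the mean score of the people listed under `key`.

    A person's score is the mean popularity of every movie they appear in.
    """
    per_person = defaultdict(list)
    for movie in movies:
        for person in movie[key]:
            per_person[person].append(movie["popularity"])
    person_score = {p: sum(s) / len(s) for p, s in per_person.items()}
    return {
        movie["title"]: sum(person_score[p] for p in movie[key]) / len(movie[key])
        for movie in movies
        if movie[key]
    }


# Made-up movies with precomputed popularity values.
movies = [
    {"title": "A", "cast": ["x", "y"], "popularity": 10.0},
    {"title": "B", "cast": ["x"], "popularity": 20.0},
]
cast_scores = people_popularity(movies, key="cast")
print(cast_scores)
```

The same function covers the crew popularity score when each record carries a `crew` list and it is called with `key="crew"`. The logarithm keeps a handful of heavily voted titles from dominating the popularity score.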

Task 4:

Perform exploratory data analysis to come up with some of the below insights:

  • Genre distribution of titles.
  • Top 10 actors by number of movies acted in, etc.
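
Both insights above reduce to frequency counts, which `collections.Counter` handles directly. This is a sketch under the assumption that each movie record holds `genre` and `cast` as lists; the records shown are made up.

```python
from collections import Counter


def genre_distribution(movies):
    """Count how many titles fall under each genre."""
    return Counter(genre for movie in movies for genre in movie["genre"])


def top_actors(movies, n=10):
    """Return the n actors with the most appearances as (name, count) pairs."""
    appearances = Counter(actor for movie in movies for actor in movie["cast"])
    return appearances.most_common(n)


# Made-up records for illustration.
movies = [
    {"genre": ["Drama"], "cast": ["Actor A", "Actor B"]},
    {"genre": ["Drama", "Comedy"], "cast": ["Actor A"]},
    {"genre": ["Action"], "cast": ["Actor A", "Actor C"]},
]
genres = genre_distribution(movies)
leaders = top_actors(movies, n=10)
print(genres)
print(leaders)
```

A movie with several genres is counted once per genre, so the genre totals can exceed the number of titles.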