Given a list of 4000 movies, my task was to collect the data about these movies and hence, analyse the data.
Given a list of 4000 Hindi movies - Bollywood Movies Dataset.xlsx
, you need to write a python script that can perform the below expected functionality
Search and fetch the IMDB URLs of as many movies as possible
Fetch the content metadata details from the IMDB website either by web scraping or by using any of the available open APIs or python libraries for each of the above movies and store them in a csv file. Details required are:
- Title
- IMDB ID
- Date of release
- Genre
- Cast
- Crew
- Plot summary
- Plot keywords
- IMDB Rating
- IMDB Votes
Perform additional data processing to come up with more derived fields:
- Age of content - time since release of the content.
- Popularity of content - a score which can be a combination of IMDB rating and votes OR try to innovatively come up with a new definition.
- Cast popularity score - score of the popularity of all of the cast.
- Crew popularity score - score of the popularity of all of the crew.
Perform exploratory data analysis to come up with some of the below insights:
- Genre distribution of titles.
- Top 10 most acted actors etc.