This project is one of the Capstone Projects of the Algoritma Academy Data Analytics Specialization. In it, I learn how to do web scraping, a method for collecting data directly from a website, using the BeautifulSoup library. I retrieve movie information from IMDB, a website designed for exploring the world of movies and shows, with millions of movies, TV, and entertainment programs in its database. The goal is to extract the information directly from the site, analyze the most popular movies of 2021, and create a simple Flask dashboard to present the results. This project is made for educational purposes.
The packages required for this project are listed in the `requirements.txt` file.
- Create a function to obtain `Title`, `IMDB Rating`, `Metascore`, and `Votes` from a single IMDB web page.
- Use iteration to obtain data for 1,000 movies from multiple pages.
- Transform the obtained data into a Pandas DataFrame and clean it.
- Perform analysis and visualization on the data.
- Create a simple Flask dashboard based on the analysis result.
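
The iteration and cleaning steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the page size of 50, the helper names (`page_offsets`, `clean_votes`, `clean_score`, `clean_row`), the dictionary keys, and the sample row are all assumptions made for the example. The BeautifulSoup parsing itself is left out because the selectors depend on IMDB's current page markup.

```python
from typing import Iterator, Optional

PAGE_SIZE = 50        # assumption: number of movies shown per listing page
TOTAL_MOVIES = 1000   # target number of movies from the steps above


def page_offsets(total: int = TOTAL_MOVIES,
                 page_size: int = PAGE_SIZE) -> Iterator[int]:
    """Yield the 1-based index of the first movie on each listing page."""
    yield from range(1, total + 1, page_size)


def clean_votes(raw: str) -> int:
    """Turn a vote count such as '1,234,567' into an integer."""
    return int(raw.replace(",", "").strip())


def clean_score(raw: Optional[str]) -> Optional[float]:
    """Turn a rating or Metascore string into a float, or None if missing."""
    if raw is None or not raw.strip():
        return None
    return float(raw.strip())


def clean_row(row: dict) -> dict:
    """Clean one scraped record (hypothetical keys) into typed values."""
    return {
        "title": row["title"].strip(),
        "imdb_rating": clean_score(row.get("imdb_rating")),
        "metascore": clean_score(row.get("metascore")),
        "votes": clean_votes(row["votes"]),
    }


# Made-up sample record, shaped like the raw strings a scraper might return.
raw_row = {
    "title": "  Example Movie ",
    "imdb_rating": "7.5",
    "metascore": "",          # not every movie has a Metascore
    "votes": "123,456",
}

offsets = list(page_offsets())
cleaned = clean_row(raw_row)
print(len(offsets), offsets[:3])
print(cleaned)
```

A list of dictionaries cleaned this way can be passed directly to `pandas.DataFrame(...)` for the analysis step. Keeping a missing Metascore as `None` rather than `0` avoids skewing averages later on.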