This Python script scrapes reviews from IMDb for a given list of IMDb title IDs.
This script requires the following Python libraries:
- pandas
- numpy
- beautifulsoup4
- selenium
- webdriver_manager (downloads the ChromeDriver version that matches the Chrome browser installed on your system)
You can install these libraries using the following command:
pip install -r requirements.txt
- Clone this repository.
- Install the required libraries (see above).
- Have your IMDb title IDs ready as a comma-separated list (for example, tt0111161,tt0068646).
- Run the script using the following command:
python scraper.py
Note: You will be prompted to enter the following:
- IMDb title IDs (comma-separated)
- Output file name (without extension)
The script will scrape reviews for each IMDb title ID and save the results to a CSV file.
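The input step can be sketched as follows. This is a minimal sketch, not the code in scraper.py: the helper names `parse_title_ids` and `output_path` are hypothetical, and the "tt plus seven or more digits" pattern is an assumption about what counts as a valid title ID.

```python
import re

# Assumption: a valid IMDb title ID is "tt" followed by at least seven digits.
TITLE_ID_PATTERN = re.compile(r"tt\d{7,}")


def parse_title_ids(raw: str) -> list[str]:
    """Hypothetical helper: split comma-separated input into validated, unique IDs."""
    ids: list[str] = []
    for part in raw.split(","):
        title_id = part.strip()
        if not title_id:
            continue  # ignore blanks, e.g. from a trailing comma
        if not TITLE_ID_PATTERN.fullmatch(title_id):
            raise ValueError(f"Not an IMDb title ID: {title_id!r}")
        if title_id not in ids:
            ids.append(title_id)  # drop duplicates, keep first-seen order
    return ids


def output_path(name: str) -> str:
    """Hypothetical helper: turn the extension-less name the user typed into a CSV path."""
    return f"{name.strip()}.csv"
```

Both values could come straight from `input()` calls. Validating the IDs before any browser is launched means a typo fails immediately rather than partway through a run.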
- scraper.py: Contains all the scraping logic for IMDb reviews.
- requirements.txt: Lists the required Python libraries.
- The script takes a list of IMDb title IDs as input.
- For each title ID, it constructs the URL for the reviews page.
- It uses Selenium to open the reviews page in a headless Chrome browser.
- It uses BeautifulSoup to parse the HTML content of the reviews page.
- It iterates through each review and extracts the title, date, content, and user rating (if available).
- It stores the extracted data in a Pandas DataFrame.
- It saves the DataFrame to a CSV file.
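The steps above can be sketched for a single, already-loaded page. In the real script the HTML would come from Selenium (for example, `driver.page_source` after `driver.get(...)`); here it is passed in as a string so the parsing and saving logic can be read on its own. The URL follows IMDb's public `/title/<id>/reviews` path, but every CSS selector in `SELECTORS` is a placeholder, since IMDb's actual markup differs and changes over time. All function names are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

# IMDb's reviews page for a title lives under /title/<id>/reviews.
REVIEWS_URL = "https://www.imdb.com/title/{title_id}/reviews"

# Placeholder selectors (assumptions): inspect the live page and replace these,
# because IMDb's real class names differ and change over time.
SELECTORS = {
    "review": "div.review",
    "title": ".review-title",
    "date": ".review-date",
    "content": ".review-content",
    "rating": ".review-rating",
}

COLUMNS = ["title_id", "title", "date", "content", "rating"]


def reviews_url(title_id: str) -> str:
    """Build the reviews page URL for one title ID."""
    return REVIEWS_URL.format(title_id=title_id)


def _text(node, selector: str):
    """Return the stripped text of the first match, or None if nothing matches."""
    found = node.select_one(selector)
    return found.get_text(strip=True) if found else None


def parse_reviews(html: str, title_id: str) -> list[dict]:
    """Extract title, date, content, and optional rating from each review block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for review in soup.select(SELECTORS["review"]):
        rating = _text(review, SELECTORS["rating"])
        rows.append({
            "title_id": title_id,
            "title": _text(review, SELECTORS["title"]),
            "date": _text(review, SELECTORS["date"]),
            "content": _text(review, SELECTORS["content"]),
            # Assumes the rating element holds a bare number such as "9".
            "rating": int(rating) if rating and rating.isdigit() else None,
        })
    return rows


def save_reviews(rows: list[dict], name: str) -> pd.DataFrame:
    """Collect rows into a DataFrame and write <name>.csv without the index column."""
    df = pd.DataFrame(rows, columns=COLUMNS)
    df.to_csv(f"{name}.csv", index=False)
    return df
```

Keeping `rating` as `None` when a review has no score lets pandas store it as a missing value (NaN), which is written as an empty cell in the CSV rather than a misleading zero.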
This script is for educational purposes only. Scraping data from websites without permission can be a violation of their terms of service. Please use this script responsibly and ethically.