This project is a web scraper built using Selenium and OpenPyXL in Python. It extracts text, images, and links from a list of websites provided in an Excel file and writes the scraped data into a new Excel file.
- Python 3.x
- Selenium
- WebDriver Manager for Chrome
- OpenPyXL
- Clone the repository or download the script files.
- Install the required Python packages: `pip install selenium webdriver-manager openpyxl`
- Prepare the Input Excel File:
  - Create an Excel file named `LinksSeleniumWEBSC.xlsx` and place it in the directory `C:\Users\ACER\Desktop\Web Scraper selenium\`.
  - The Excel file should have a worksheet named "Sheet1".
  - List the website URLs starting from the second row (A2, A3, etc.).
- Run the Script:
  - Open a command prompt or terminal.
  - Navigate to the directory where the script is located.
  - Run the script using Python: `python main.py`
  - The script will read the URLs from the input Excel file and scrape each website for text, images, and links.
  - The scraped data will be written to a new worksheet named "Scraped_Data" in the `link2.xlsx` file located in `C:\Users\ACER\Desktop\Web Scraper selenium\`.
The per-page scraping function:
- Loads the webpage using the provided URL.
- Waits until the page is fully loaded.
- Extracts text, images, and links from the webpage.
- Returns a dictionary containing the scraped data.
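
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the names `scrape_page` and `wait_until_loaded`, the dictionary keys, and the timeout values are all assumptions. The wait is written as a plain polling loop on `document.readyState` so the sketch depends only on the driver's own methods; Selenium's `WebDriverWait` is the more common way to express the same wait.

```python
import time


def wait_until_loaded(driver, timeout=10.0, poll=0.25):
    """Poll document.readyState until it is 'complete' or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script("return document.readyState") == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def scrape_page(driver, url, timeout=10.0):
    """Load `url` and return its visible text, image URLs, and link URLs.

    Hypothetical helper: the key names below are illustrative only.
    """
    driver.get(url)
    wait_until_loaded(driver, timeout)

    # "tag name" is the locator string behind Selenium's By.TAG_NAME.
    text = driver.find_element("tag name", "body").text
    images = [img.get_attribute("src") for img in driver.find_elements("tag name", "img")]
    links = [a.get_attribute("href") for a in driver.find_elements("tag name", "a")]

    return {
        "text": text,
        # Drop elements that have no src/href attribute.
        "images": [src for src in images if src],
        "links": [href for href in links if href],
    }
```

Because the function takes the driver as a parameter, the same browser session can be reused across all URLs instead of being restarted for each page.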
The main workflow function:
- Configures the Selenium WebDriver to run in headless mode.
- Reads website URLs from the input Excel file.
- Iterates over the URLs, scraping each one and storing the results.
- Writes the scraped data to a new worksheet in the output Excel file.
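
The first two steps (headless configuration and reading the URLs) might look something like the sketch below. The function names `build_headless_driver` and `read_urls` are hypothetical, and the exact Chrome options used by the script are not stated here, so treat the `--headless` argument as an assumption. The Selenium imports sit inside the function only so that the Excel helper can be used on a machine without a browser installed.

```python
from openpyxl import load_workbook


def build_headless_driver():
    """Create a Chrome WebDriver that runs without a visible window."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    options.add_argument("--headless")
    # webdriver-manager downloads a matching chromedriver and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)


def read_urls(path, sheet_name="Sheet1"):
    """Return the non-empty values of column A, starting at row 2."""
    workbook = load_workbook(path)
    sheet = workbook[sheet_name]
    urls = []
    for (value,) in sheet.iter_rows(min_row=2, max_col=1, values_only=True):
        if value:  # skip blank cells
            urls.append(value)
    return urls
```

Starting at `min_row=2` skips the header cell in A1, matching the input layout described in the usage steps.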
- The script handles missing or unreadable Excel files gracefully, printing appropriate error messages.
- If scraping a particular URL fails, the error is caught, and the script continues with the next URL.
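
The per-URL error handling described above could be sketched as a loop that catches any exception, reports it, and moves on. The name `scrape_all` and the message wording are assumptions; `scrape` stands for whatever callable performs the scraping of a single URL.

```python
def scrape_all(urls, scrape):
    """Scrape every URL, skipping the ones that raise an error."""
    results = []
    for url in urls:
        try:
            results.append(scrape(url))
        except Exception as exc:
            # Report the failure and continue with the next URL.
            print(f"Failed to scrape {url}: {exc}")
    return results
```

Catching the broad `Exception` class is deliberate in this sketch: a timeout, a navigation error, or a parsing problem on one site should not abort the whole run.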
Ensure your `LinksSeleniumWEBSC.xlsx` contains:
| URL |
|-----------------------|
| https://example.com |
| https://another.com |
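
If it is more convenient to generate the input file than to type it by hand, a short OpenPyXL script along these lines should produce the layout shown above. The function name `create_input_workbook` is hypothetical; the sheet name and the header in A1 follow the description in this README.

```python
from openpyxl import Workbook


def create_input_workbook(path, urls):
    """Write a header in A1 and one URL per row from A2 downward."""
    workbook = Workbook()
    sheet = workbook.active
    sheet.title = "Sheet1"  # the worksheet name the script expects
    sheet["A1"] = "URL"
    for row, url in enumerate(urls, start=2):
        sheet.cell(row=row, column=1, value=url)
    workbook.save(path)
```

For example, `create_input_workbook("LinksSeleniumWEBSC.xlsx", ["https://example.com", "https://another.com"])` writes the two sample URLs into A2 and A3 of a file in the current directory.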
After running the script, the output in `link2.xlsx` will look like:
| Text | Images | Links |
|----------------------------|------------------------------|--------------------------------|
| Example Domain | https://example.com/image1 | https://example.com/link1 |
| Another Example | https://another.com/image1 | https://another.com/link1 |
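
A table like the one above could be produced with a helper along the following lines. This is a sketch under assumptions rather than the script's actual code: the name `write_results` and the result keys are hypothetical, and joining several image or link URLs with newlines inside one cell is just one possible choice. An existing "Scraped_Data" sheet is replaced so that repeated runs do not accumulate stale rows.

```python
import os

from openpyxl import Workbook, load_workbook


def write_results(results, path, sheet_name="Scraped_Data"):
    """Write one row per scraped page under a Text / Images / Links header."""
    if os.path.exists(path):
        workbook = load_workbook(path)
        if sheet_name in workbook.sheetnames:
            # Replace the sheet left over from a previous run.
            workbook.remove(workbook[sheet_name])
        sheet = workbook.create_sheet(sheet_name)
    else:
        workbook = Workbook()
        sheet = workbook.active
        sheet.title = sheet_name

    sheet.append(["Text", "Images", "Links"])
    for item in results:
        sheet.append([
            item["text"],
            "\n".join(item["images"]),  # assumption: one URL per line in the cell
            "\n".join(item["links"]),
        ])
    workbook.save(path)
```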