This project is a web scraper that extracts information from the Delta.nl website, which offers Internet, Mobile, and Television solutions. The scraper is built with Playwright for Python, which drives a real browser so it can extract content from JavaScript-rendered pages.
To get started, clone the repository, change into its directory, and install the required dependencies:
git clone https://github.com/naitridoshi/Dutch-Web-Scraper-Using-Playwright.git
cd Dutch-Web-Scraper-Using-Playwright
pip install -r requirements.txt
Configure the scraper:
- Open config.py and set the URL and other parameters as needed.
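The README does not list the settings in config.py. As a hypothetical illustration only, a config.py for this kind of scraper might look like the sketch below; every variable name and value here is an assumption, not taken from the repository.

```python
# config.py: a hypothetical sketch; the repository's actual setting names may differ.

# Start page to scrape (assumed to be the Delta.nl home page).
BASE_URL = "https://www.delta.nl"

# Directory where the scraped text files are written (hypothetical name).
OUTPUT_DIR = "output"

# Run the browser without a visible window.
HEADLESS = True

# Maximum time to wait for a page to load, in milliseconds (Playwright's timeout unit).
PAGE_TIMEOUT_MS = 30_000
```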
Run the scraper:
- Execute the main script to start scraping the data.
python scraper.py
Output:
- The scraped data is saved to a directory of text files, one file per page, each named after the page title.
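Page titles can contain characters that are not allowed in file names (for example / or :). A minimal sketch, assuming the scraper writes one UTF-8 text file per page; title_to_filename and save_page are hypothetical helper names, and the replacement rules are an assumption rather than the project's actual behavior.

```python
import re
from pathlib import Path


def title_to_filename(title: str, max_length: int = 100) -> str:
    """Turn a page title into a safe .txt file name (hypothetical helper)."""
    # Replace characters that are invalid in file names on Windows, macOS or Linux.
    safe = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", title)
    # Collapse runs of whitespace into single spaces.
    safe = re.sub(r"\s+", " ", safe)
    # Limit the length, then trim leading/trailing spaces and dots.
    safe = safe[:max_length].strip(" .")
    # Fall back to a default name if nothing usable is left.
    return (safe or "untitled") + ".txt"


def save_page(output_dir: str, title: str, text: str) -> Path:
    """Write the page text to <output_dir>/<sanitized title>.txt and return the path."""
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / title_to_filename(title)
    path.write_text(text, encoding="utf-8")
    return path
```

For example, title_to_filename("Internet | Delta") returns "Internet _ Delta.txt". Note that two pages with the same title would overwrite each other under this scheme.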
Features:
- Scrapes information on Internet, Mobile, and Television solutions.
- Uses Playwright for browser automation.
- Saves the scraped data as text files, one per page, named after the page title.
- Handles dynamic content and JavaScript-rendered pages.
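A minimal sketch of how Playwright can load a JavaScript-rendered page and extract its title and visible text. The function name fetch_page and its parameters are hypothetical, not the repository's code; the Playwright calls used (sync_playwright, chromium.launch, page.goto with wait_until="networkidle", page.title, page.inner_text) are part of Playwright's synchronous Python API.

```python
from typing import Tuple


def fetch_page(url: str, timeout_ms: int = 30_000) -> Tuple[str, str]:
    """Load a page in headless Chromium and return (title, visible text).

    Hypothetical helper. Requires `pip install playwright` and `playwright install`.
    """
    # Imported inside the function so this module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until there have been no network connections
            # for at least 500 ms, giving JavaScript-rendered content time to appear.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            title = page.title()
            # inner_text returns the rendered, human-visible text of the element.
            text = page.inner_text("body")
        finally:
            browser.close()
    return title, text
```

Called as fetch_page("https://www.delta.nl"), it would return the home page's title and text. Waiting for "networkidle" is a simple catch-all; when a specific element is known to appear once the content has rendered, page.wait_for_selector(...) is often a more reliable signal.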
Requirements:
- Python 3.7+
- Playwright
- Pandas
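To confirm the requirements above before running the scraper, a small standard-library check such as this hypothetical script can report what is missing. It only checks that the packages are importable, not their versions or whether the Playwright browsers have been downloaded.

```python
import importlib.util
import sys


def check_requirements() -> dict:
    """Report whether each listed requirement appears to be satisfied (hypothetical helper)."""
    return {
        "python>=3.7": sys.version_info >= (3, 7),
        # find_spec returns None when a top-level package cannot be found.
        "playwright": importlib.util.find_spec("playwright") is not None,
        "pandas": importlib.util.find_spec("pandas") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```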
Install the required dependencies using:
pip install -r requirements.txt
You also need to install the Playwright browsers:
playwright install
Contributions are welcome! If you have any suggestions or improvements, please open an issue or create a pull request.
- Fork the repository
- Create a new branch (git checkout -b feature-branch)
- Make your changes
- Commit your changes (git commit -m 'Add some feature')
- Push to the branch (git push origin feature-branch)
- Open a pull request
This project is licensed under the MIT License - see the LICENSE file for details.