Dutch Website Web Scraper

This project is a web scraper that extracts information from the Delta.nl website, which provides Internet, Mobile, and Television solutions. The scraper is built using Playwright in Python, allowing for robust and efficient data extraction.

Table of Contents

Installation

To get started with this project, clone the repository and install the required dependencies.

git clone https://github.com/naitridoshi/Dutch-Web-Scraper-Using-Playwright.git
pip install -r requirements.txt

Usage

  1. Configure the scraper:

    • Open config.py and set the URL and other parameters as needed.
  2. Run the scraper:

    • Execute the main script to start scraping the data.
python scraper.py
  1. Output:

    • The scraped data will be saved into a directory having text files and the file names as the page titles.

Features

  • Scrapes information on Internet, Mobile, and Television solutions.
  • Utilizes Playwright for browser automation.
  • Saves the scraped data into a text file.
  • Handles dynamic content and JavaScript-rendered pages.

Dependencies

  • Python 3.7+
  • Playwright
  • Pandas

Install the required dependencies using:

pip install -r requirements.txt

You also need to install the Playwright browsers:

playwright install

Contributing

Contributions are welcome! If you have any suggestions or improvements, please open an issue or create a pull request.

  1. Fork the repository
  2. Create a new branch (git checkout -b feature-branch)
  3. Make your changes
  4. Commit your changes (git commit -m 'Add some feature')
  5. Push to the branch (git push origin feature-branch)
  6. Open a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.