
A customizable web scraper

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0)

Scraper Bot


This bot is designed to periodically scrape ads from commercial websites.

When it finds a new ad, the bot sends it to you through Apprise channels.

Deploy

PyPI

The package is available on PyPI

pip install scraper-bot

The package relies heavily on the playwright package, so before you start using the bot you have to install a playwright browser

playwright install --with-deps firefox

You can find further information in the playwright documentation (n.b. the bot is not limited to Firefox)

The scraper-bot package provides the following command to run the bot

scraper-bot

Container

The CI builds a container image for each version and publishes it to the public GitHub registry

ghcr.io/robertobochet/scraper-bot

docker compose

  1. Create a Telegram bot and retrieve its token
  2. Download config.example.yaml and rename it to config.yaml
  3. Edit the configuration following the guidelines
  4. Download docker-compose.yaml
  5. Start the scraper with docker-compose
    docker-compose up
  6. Wait for the bot to do its work!

Kubernetes (Helm chart)

A Helm chart is also available for deploying the Scraper Bot

You can find the source code in the scraper-bot-chart repo

The Helm chart package is available in the GitHub OCI registry

oci://ghcr.io/robertobochet/scraper-bot-chart

You can use it to deploy directly to your Kubernetes cluster

  1. Retrieve the default values file
    helm show values oci://ghcr.io/robertobochet/scraper-bot-chart > values.yaml
  2. Customize the values.yaml
  3. Install the scraper bot (helm install takes the release name first, then the chart)
    helm install scraper-bot oci://ghcr.io/robertobochet/scraper-bot-chart -f values.yaml

Configuration

By default, the bot looks for a configuration file at the following paths: ./config.y(a)ml and /etc/scraper-bot/config.y(a)ml. You can override this behavior by passing the --config argument on the command line, followed by the path to the config file

scraper-bot --config /path/to/scraper-bot-config.yaml
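The lookup described above can be sketched roughly as follows. This is only an illustration and not the bot's actual implementation: the function name `find_config` is hypothetical, and both the order in which directories are tried and the preference for `.yaml` over `.yml` are assumptions.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_config(search_dirs: Iterable[Path], explicit: Optional[Path] = None) -> Optional[Path]:
    """Return the first config file found, or None (hypothetical helper)."""
    # An explicit path (as given with --config) bypasses the default lookup.
    if explicit is not None:
        return explicit if explicit.is_file() else None
    for directory in search_dirs:
        # "config.y(a)ml" means both spellings are accepted; trying
        # .yaml first is an assumption made for this sketch.
        for name in ("config.yaml", "config.yml"):
            candidate = Path(directory) / name
            if candidate.is_file():
                return candidate
    return None
```

Calling it with the current directory followed by the system-wide directory would mirror the default behavior, while passing `explicit` mirrors the `--config` argument.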

The configuration file must satisfy the pydantic model defined in scraper_bot.settings. You can also get the configuration JSON schema from the command line with the --config-schema argument

scraper-bot --config-schema
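If you want to use the schema from a script (for example to feed an editor or a validator), something like the following may work. It is a minimal sketch that assumes the command prints the schema to standard output as a single JSON document; the helper name `load_schema` is hypothetical.

```python
import json
import subprocess
from typing import Sequence


def load_schema(command: Sequence[str] = ("scraper-bot", "--config-schema")) -> dict:
    """Run the command and parse its standard output as JSON (hypothetical helper)."""
    # Assumption: the schema is written to stdout as one JSON document.
    # check=True raises CalledProcessError if the command exits with an error.
    result = subprocess.run(list(command), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

With the package installed, `load_schema()` would return the schema as a Python dictionary that you can inspect or save to a file with `json.dump`.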

You can also find a configuration example in config.example.yaml.