crawling-scale-up

Installation

You will need Redis and python3 installed. After that, install all the necessary libraries by running pip install.

pip install install requests beautifulsoup4 playwright "celery[redis]"
npx playwright install

Configure the Redis connection on the repo file and Celery on the tasks file.

You need to start Celery and the run the main script that will start queueing pages to crawl.

celery -A tasks worker

python3 main.py

Pull requests are welcome. For significant changes, please open an issue first to discuss what you would like to change.