Sitemap Enumerator

Based on

This project requires Python 3 to be available as python on the PATH. (Activating a venv is one way to accomplish this.)
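To check the requirement above before running anything, a small sketch like the following may help. It is not part of this project; the function name python_on_path_major_version is made up for illustration. It looks up python on the PATH and asks that interpreter for its major version.

```python
import shutil
import subprocess


def python_on_path_major_version(executable="python"):
    """Return the major version of `executable` as found on PATH.

    Returns None if no such executable is found or it fails to run.
    (Hypothetical helper, not shipped with this project.)
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # The one-liner below is valid in both Python 2 and Python 3.
    result = subprocess.run(
        [path, "-c", "import sys; print(sys.version_info[0])"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    try:
        return int(result.stdout.strip())
    except ValueError:
        return None


if __name__ == "__main__":
    major = python_on_path_major_version()
    if major is None:
        print("No working 'python' on PATH; activate a venv first.")
    elif major < 3:
        print("'python' resolves to Python %d; Python 3 is required." % major)
    else:
        print("'python' resolves to Python %d." % major)
```

Run it from the same shell (and with the same venv activated) that you intend to use for the project scripts.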

How to use

First, set up a venv with the dependencies and have Docker and docker-compose ready. Adjust the scale in the docker-compose.yml file as desired, then run docker-compose up to spin up the workers.

Next, run ./ <url> to enqueue a site, then wait until all workers have finished processing.

Optionally, requeue failed items using ./ or in-progress items (in case a worker crashed at some point) with ./

Finally, extract the info using ./ You'll find the output files under dump/. The urls.txt file contains the URLs that were found, and the valid_sitemaps.txt file contains the URLs of the sitemaps themselves.
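Once the dump exists, the two files can be post-processed with a few lines of Python. The sketch below is not part of this project and assumes that both files contain one URL per line, which is not confirmed here. The helper names read_url_list and count_by_host are hypothetical.

```python
from collections import Counter
from pathlib import Path
from urllib.parse import urlsplit


def read_url_list(path):
    """Read one URL per line, skipping blank lines.

    Returns an empty list if the file does not exist.
    Assumption: the dump files are plain text with one URL per line.
    """
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def count_by_host(urls):
    """Count URLs per hostname (urlsplit lower-cases the hostname)."""
    return Counter((urlsplit(url).hostname or "") for url in urls)


if __name__ == "__main__":
    dump = Path("dump")
    urls = read_url_list(dump / "urls.txt")
    sitemaps = read_url_list(dump / "valid_sitemaps.txt")
    print(f"{len(urls)} URLs found across {len(sitemaps)} sitemaps")
    # Show the ten hosts with the most URLs.
    for host, count in count_by_host(urls).most_common(10):
        print(f"{count:8d}  {host}")
```

Grouping by hostname is only one possible summary; the same read_url_list helper could feed deduplication or filtering instead.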