This is a simple Cloudflare Worker that crawls sitemaps and saves the links it finds.
The gathered links are saved to KV storage and returned in the response.
That's simple!
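As a rough illustration of the flow described above (read each sitemap, collect its links, save them, and return them), here is a minimal sketch. It is not the worker's actual source: `extractLinks`, `crawlSitemaps`, `LinkStore`, and `fetchText` are hypothetical names, the regex only handles plain `<loc>` entries (no CDATA, no entity decoding), and storing one KV entry per sitemap URL is an assumption.

```typescript
// Hypothetical store interface; a Workers KV namespace binding has a
// compatible put(key, value) method.
interface LinkStore {
  put(key: string, value: string): Promise<void>;
}

// Shape of the response: sitemap URL -> links found in that sitemap.
type LinkTree = Record<string, string[]>;

// Pull the text of every <loc> element out of a sitemap XML string.
function extractLinks(sitemapXml: string): string[] {
  const links: string[] = [];
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Crawl each sitemap, save its links to the store, and build the link tree.
// fetchText is injected so the sketch can run outside the Workers runtime.
async function crawlSitemaps(
  sitemapUrls: string[],
  fetchText: (url: string) => Promise<string>,
  store: LinkStore,
): Promise<LinkTree> {
  const tree: LinkTree = {};
  for (const url of sitemapUrls) {
    const links = extractLinks(await fetchText(url));
    tree[url] = links;
    // Assumption: one KV entry per sitemap, keyed by the sitemap URL.
    await store.put(url, JSON.stringify(links));
  }
  return tree;
}
```

Inside a worker, `fetchText` could be something like `(url) => fetch(url).then((res) => res.text())`, and `store` could be the KV namespace bound in wrangler.toml.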
Just set your account info in wrangler.toml
and deploy to Cloudflare Workers.
Add the URL of each site you'd like to crawl to the sitesToCrawl.json file.
Run yarn dev
to start the worker in a preview environment.
Visit the preview URL. The response should contain a link tree in the following format:
{
  "sitemapUrl1": [
    "link1",
    "link2",
    "link3"
  ],
  "sitemapUrl2": [
    "link1",
    "link2",
    "link3"
  ],
  {...}
}
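If you consume the response from a script, the format above maps onto a simple type. The sketch below is only an illustration under that assumption: `LinkTree` and `isLinkTree` are hypothetical names, not part of the worker, and the check covers nothing beyond "an object whose values are arrays of strings".

```typescript
// Assumed shape of the response: sitemap URL -> list of links.
type LinkTree = Record<string, string[]>;

// Type guard that checks a parsed JSON value against the shape above.
function isLinkTree(value: unknown): value is LinkTree {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return Object.keys(record).every((key) => {
    const links = record[key];
    return (
      Array.isArray(links) &&
      links.every((link) => typeof link === "string")
    );
  });
}

// Usage: parse a response body and count the links per sitemap.
const body = '{"sitemapUrl1": ["link1", "link2", "link3"]}';
const parsed: unknown = JSON.parse(body);
if (isLinkTree(parsed)) {
  for (const sitemapUrl of Object.keys(parsed)) {
    console.log(sitemapUrl, parsed[sitemapUrl].length);
  }
}
```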