Add support for outputting to JSON all crawled URLs (including 200 ones)
cipriancraciun opened this issue · 0 comments
cipriancraciun commented
In addition to #38, where one wants to save the failed URLs as JSON, also listing all successfully resolved resources (i.e. those that return 3xx or 200) could be useful.
For example, one could use muffet to crawl a site in order to extract a list of all dependent resources (CSS, JS, images, etc.) and other linked-to pages. One could then use these URLs for other analytical purposes, or even to warm up a cache after a redeploy.
With the current format, the `links` JSON list could be expanded to include all encountered URLs, and the `error` field could be replaced with a `status` field, to easily differentiate between errors and successful crawls.
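To illustrate the use case, here is a minimal sketch of how such a report could be consumed, e.g. to collect URLs for cache warm-up. The JSON shape and field names (`url`, `links`, `status`) are purely illustrative assumptions about the proposed format, not muffet's actual output schema:

```python
import json

# Hypothetical sample of the proposed format: every crawled URL is listed,
# with a "status" field instead of the current "error" field, so successes
# (200, 3xx) and failures can be told apart. NOTE: this schema is an
# assumption for illustration, not muffet's real JSON output.
report = json.loads("""
[
  {
    "url": "https://example.com/",
    "links": [
      {"url": "https://example.com/style.css", "status": 200},
      {"url": "https://example.com/old-page", "status": 301},
      {"url": "https://example.com/missing", "status": 404}
    ]
  }
]
""")

# Split successfully crawled URLs (candidates for cache warm-up after a
# redeploy) from URLs that still need fixing.
ok, failed = [], []
for page in report:
    for link in page["links"]:
        (ok if link["status"] < 400 else failed).append(link["url"])

print(ok)      # URLs worth re-fetching to warm the cache
print(failed)  # URLs that are actual errors
```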