raviqqe/muffet

Add support for outputting all crawled URLs to JSON (including 200 ones)

cipriancraciun opened this issue · 0 comments

In addition to #38, which asks for saving the failed URLs as JSON, it would be useful to also list all other crawled resources (i.e. those that return 30x or 200).

For example one could use muffet to crawl a site in order to extract a list of all dependent resources (CSS, JS, images, etc.) and other linked-to pages.

Then one could use these URLs for other analytical purposes, or even to warm up a cache after a redeploy.

With the current format, the links JSON list could be expanded to include all encountered URLs, with error replaced by status, so that one can easily tell which crawls failed and which succeeded.
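
To make the request concrete, here is a rough sketch of how such output might be consumed, for instance to build a cache warm-up list. The JSON shape and the field names (url, links, status), as well as the numeric status value, are only my assumptions about what the proposed format could look like; they are not something muffet produces today.

```python
import json

# Hypothetical report: the field names "url", "links" and "status" are
# assumptions based on this proposal, not an existing muffet output format.
SAMPLE = """
[
  {
    "url": "https://example.com/",
    "links": [
      {"url": "https://example.com/style.css", "status": 200},
      {"url": "https://example.com/old", "status": 301},
      {"url": "https://example.com/missing", "status": 404}
    ]
  }
]
"""


def successful_urls(report_json):
    """Return the sorted, de-duplicated URLs whose status is 2xx or 3xx."""
    pages = json.loads(report_json)
    urls = set()
    for page in pages:
        for link in page.get("links", []):
            status = link.get("status")
            # Skip entries without a numeric status (e.g. connection errors).
            if isinstance(status, int) and 200 <= status < 400:
                urls.add(link["url"])
    return sorted(urls)


if __name__ == "__main__":
    for url in successful_urls(SAMPLE):
        print(url)
```

The resulting list could then be fed to whatever tool does the cache warm-up, while entries with a 4xx or 5xx status would still cover the use case from #38.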