
crawlr

A web spider, written in Go, that scans your site for pages that are missing particular string patterns.
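At its core, the check performed on each fetched page is straightforward: download the body and test it for every required pattern. Below is a minimal sketch of that idea in Go; this is not the project's actual code, and the function name, URL, and patterns are purely illustrative:

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "strings"
    )

    // missingPatterns fetches url and returns every required pattern
    // that does not appear anywhere in the response body.
    func missingPatterns(url string, required []string) ([]string, error) {
        resp, err := http.Get(url)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            return nil, err
        }

        var missing []string
        for _, p := range required {
            if !strings.Contains(string(body), p) {
                missing = append(missing, p)
            }
        }
        return missing, nil
    }

    func main() {
        // Hypothetical page and patterns, for illustration only.
        missing, err := missingPatterns("https://example.com/", []string{"analytics.js", "</footer>"})
        if err != nil {
            fmt.Println("fetch failed:", err)
            return
        }
        for _, p := range missing {
            fmt.Println("missing pattern:", p)
        }
    }

The spider itself additionally extracts links from each page and recurses; the sketch covers only the per-page pattern check.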

Configuration

The configuration file is passed to the spider as its first command-line parameter. The file should contain a JSON object with the following keys:

  • startUrl
  • dropHttps
  • allowedDomains
  • rewriteDomains
  • filteredUrls
  • droppedParameters
  • requiredPatterns

An example JSON configuration file is included in the repo.
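For reference, a configuration along these lines might look like the sketch below. The key names come from the list above, but the values, and the meaning guessed for each key, are illustrative assumptions; consult the example file in the repo for the authoritative format:

    {
      "startUrl": "https://example.com/",
      "dropHttps": false,
      "allowedDomains": ["example.com", "www.example.com"],
      "rewriteDomains": {"old.example.com": "example.com"},
      "filteredUrls": ["/logout", "/admin"],
      "droppedParameters": ["utm_source", "utm_medium"],
      "requiredPatterns": ["analytics.js", "</footer>"]
    }

In particular, whether rewriteDomains takes an object or a list of pairs, and whether filteredUrls holds substrings or regular expressions, is not specified here and is guessed above.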

Options

You can also adjust certain operational details of the spider with command-line flags:

  • -h : print the list of accepted flags
  • -d : how deep to spider the site (default: 2)
  • -n : maximum number of concurrent requests (default: 1)
  • -s : show live stats (default: false)
  • -v : verbose output (default: false)
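Putting the flags and the config file together, an invocation might look like the following (the binary name, and the flags-before-argument ordering conventional for Go's standard flag package, are assumed):

    crawlr -d 3 -n 4 -s config.json

This would crawl the configured site three levels deep with up to four concurrent requests, showing live stats along the way.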

Output

As the spider progresses, it prints the URLs of pages that did not contain the required patterns, along with the list of patterns that were missing from each.