Primary language: JavaScript. License: MIT.

siteweb

siteweb is a tool that quickly gathers stats about every page on your website. Give it one or more start URLs and it fetches all of the linked pages and records information about each one. This is useful for testing a website, for example to make sure nothing broke after a deploy, and for identifying the slowest (or fastest) pages.
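As a rough illustration of the "slowest pages" use case, the sketch below ranks page records by response time. The record shape is an assumption made for this example: the field names `url` and `responseTime` and the helper `slowestPages` are hypothetical, and the actual siteweb output may be structured differently.

```javascript
// Hypothetical result shape: the field names `url` and `responseTime`
// are assumptions for illustration, not siteweb's documented output.
const pages = [
  { url: 'http://example.com/', responseTime: 120 },
  { url: 'http://example.com/about', responseTime: 480 },
  { url: 'http://example.com/blog', responseTime: 95 }
]

// Return the `count` slowest pages, slowest first, without mutating the input.
function slowestPages (records, count) {
  return [...records]
    .sort((a, b) => b.responseTime - a.responseTime)
    .slice(0, count)
}

console.log(slowestPages(pages, 2).map(page => page.url))
```

Reversing the comparison (`a.responseTime - b.responseTime`) would list the fastest pages instead.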

  • easy to use, just start with a URL
  • runs quickly using isomorphic-fetch and cheeriojs
  • runs on the client and the server
  • concurrency control
  • returns a promise
  • CLI args parsed with yargs
  • option to add a delay between requests

Demo

Note: The online demo is still limited by the same-origin policy, so it may not work with many websites (unless they send the Access-Control-Allow-Origin: * header). The Node version does not have this limitation and should work with any website. The demo also has a file input for visualizing the JSON structure that the CLI generates.

Try it online

Getting Started

# Install globally to get the siteweb command
npm install -g siteweb
# Or add it to a project as a dev dependency
npm install --save-dev siteweb

Usage

Use it via the CLI:

# With a global install
siteweb http://blog.timscanlin.net
# Or run the cli script directly from the project directory
node ./cli http://blog.timscanlin.net

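Or use it with the JS API:

const siteweb = require('siteweb') // assumes a local install of the package
const options = { startUrls: ['http://blog.timscanlin.net'] }

siteweb.run(options, (err, data) => {
  if (err) {
    // Keep an existing Error intact instead of wrapping it a second time.
    throw err instanceof Error ? err : new Error(err)
  }
  // Print the collected page data as JSON.
  process.stdout.write(JSON.stringify(data))
})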

The API currently exposes a single method, run.

Default Options

module.exports = {
  // URLs to start from.
  startUrls: [
    'http://blog.timscanlin.net'
  ],
  // Limit the number of concurrent requests.
  concurrency: 6,
  // Max queue size.
  maxQueue: 500,
  // Whether to include any external URLs in output.
  includeExternal: true,
  // Whether to fetch the external pages (depends on `includeExternal`)
  fetchExternal: false,
  // Limit of pages to fetch.
  maxPages: 500,
  // Delay between requests in ms.
  delay: 0,
  // Pre fetch callback.
  preFetchCallback: () => {},
  // Post fetch callback.
  postFetchCallback: () => {},
}
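
One way to override only a few of these values is to shallow-merge your own settings onto a copy of the defaults before passing the result to `run`. This is a sketch under assumptions: `buildOptions` is a hypothetical helper, and whether siteweb performs a similar merge internally is not stated here.

```javascript
// A copy of the defaults listed above (callbacks omitted for brevity).
const defaults = {
  startUrls: ['http://blog.timscanlin.net'],
  concurrency: 6,
  maxQueue: 500,
  includeExternal: true,
  fetchExternal: false,
  maxPages: 500,
  delay: 0
}

// Hypothetical helper: shallow-merge overrides onto the defaults.
function buildOptions (overrides) {
  const options = Object.assign({}, defaults, overrides)
  // fetchExternal depends on includeExternal, so switch it off when
  // external URLs are excluded from the output.
  if (!options.includeExternal) {
    options.fetchExternal = false
  }
  return options
}

console.log(buildOptions({ startUrls: ['http://example.com'], maxPages: 50 }))
```

Because the merge starts from an empty object, the `defaults` object itself is never modified.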

Warning

Be careful! This tool recursively fetches all the links on a website. By default, maxPages is 500 and concurrency is 6. Both are configurable, as is the boolean fetchExternal option, which also fetches external pages (without following their links). Raising these limits can make siteweb consume a lot of resources, both on your computer and on the websites it crawls, so please use it with care.
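
The bounding behaviour described in this warning can be sketched as a breadth-first crawl that stops at a page limit and never follows links found on external pages. This is an illustration, not siteweb's code: `crawl` and `getLinks` are hypothetical names, and the crawl runs synchronously over an in-memory link map instead of real HTTP requests.

```javascript
// Sketch of a bounded breadth-first crawl. `crawl` and `getLinks` are
// hypothetical names; `getLinks(url)` returns the URLs linked from a page.
function crawl (startUrl, getLinks, { maxPages = 500, fetchExternal = false } = {}) {
  const origin = new URL(startUrl).origin
  const visited = new Set()
  const queue = [startUrl]

  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift()
    if (visited.has(url)) continue

    const internal = new URL(url).origin === origin
    if (!internal && !fetchExternal) continue

    visited.add(url)
    // Only internal pages have their links followed, so external pages
    // are checked once but never crawled recursively.
    if (internal) queue.push(...getLinks(url))
  }
  return [...visited]
}

// In-memory link map standing in for a real website.
const site = {
  'http://a.test/': ['http://a.test/one', 'http://b.test/'],
  'http://a.test/one': ['http://a.test/', 'http://a.test/two'],
  'http://a.test/two': [],
  'http://b.test/': ['http://b.test/deep']
}
const links = url => site[url] || []

console.log(crawl('http://a.test/', links, { maxPages: 2 }))
```

With `maxPages: 2` the crawl stops after the start page and the first internal link, which is why lowering the limit is the simplest way to keep a run small.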

TODO

  • demo page with visualization (more detail)
  • more output options / data?
  • make a similar project using nightmare that can run js