Chai is a simple web crawler that scrapes relevant SEO data from each page it visits.
npm install @dschnare/chai -g
chai http://mywebsite.com > crawl.json
Chai scrapes the following data from each page it visits:
- Page title
- All H1 values
- All H2 values
The scraped data written to stdout is a JSON array of objects with the following shape:
{
  title,
  url,
  headings: {
    h1: [],
    h2: []
  }
}
For URLs that respond with a 404, the scrape object has this shape:
{
  url,
  notFound: true
}
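As a rough illustration of how the two output shapes might be consumed, here is a minimal sketch in Node.js. The `summarizeCrawl` helper and the sample data are hypothetical and not part of Chai; the sketch only assumes the JSON shapes described above.

```javascript
// Hypothetical helper, not part of Chai: splits crawl results into
// URLs that returned a 404 and pages that have no H1 heading.
// Assumes `results` is the parsed JSON array Chai wrote to stdout.
function summarizeCrawl(results) {
  const notFound = [];
  const missingH1 = [];
  for (const entry of results) {
    if (entry.notFound === true) {
      // 404 entries only carry `url` and `notFound`.
      notFound.push(entry.url);
    } else if ((entry.headings?.h1 ?? []).length === 0) {
      // Regular entries carry `title`, `url`, and `headings`.
      missingH1.push(entry.url);
    }
  }
  return { notFound, missingH1 };
}

// Sample data in the documented shape (made up for illustration).
const sample = [
  { title: 'Home', url: 'http://mywebsite.com/', headings: { h1: ['Welcome'], h2: [] } },
  { title: 'About', url: 'http://mywebsite.com/about', headings: { h1: [], h2: ['Team'] } },
  { url: 'http://mywebsite.com/old', notFound: true }
];

const summary = summarizeCrawl(sample);
console.log(summary);
```

To run this against a real crawl, the sample array could be replaced with `JSON.parse(require('fs').readFileSync('crawl.json', 'utf8'))`, assuming the output was redirected to `crawl.json` as in the command above.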
Planned improvements:
- Expose a way to filter out URLs to be crawled
- Expose a way to customize the scraper
- Make it easier to identify 404 URLs
- Add an option to control verbosity