Crawl through a domain and check every page for errors
Named after the Airborne Warning and Control System, this program provides early warnings for potential problems in your website. It doesn't provide a complete overview, but detects signals that you may want to investigate further.
This tool is a Ruby application that you run from the command line. You can customize the installation depending on your needs, but the following process is usually the fastest way to get started:
- Install homebrew if you are not using it already.
- Use homebrew to install a recent version of Ruby and wget by running `brew install ruby` and `brew install wget`.
- Install bundler (the Ruby package manager) by running `gem install bundler`.
- Clone this repository to a convenient location on your PC. I use `/Users/{name}/tools/awacs` on my PC.
- Add the location to your PATH.
- Install the dependencies by running `bundle install`.
- Verify your installation by running `awacs -v` or `awacs -h`.
In some cases, Ruby might not find your gems when you run awacs on the command line. You can run awacs in the context of your bundle by running `bundle exec awacs https://in10.nl`, or configure bundler binstubs.
awacs [url] [options]
`url` should be the base URL of the website you're going to test, e.g. http://www.website.com. You must specify the full URL, including the protocol (http://, https://). Also note that if your website is configured to redirect all traffic to www.website.com, or to website.com, you must use the form it redirects to as the scope, or you'll get zero results.
This URL acts as a scope: AWACS uses it to determine whether or not a page is on your website. You can also use it to check only a subset of the website. For example, if you use http://www.website.com/planning/ as the base URL, http://www.website.com/planning/phase-one would be included, but http://www.website.com/about-us and http://instagram.com would not.
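The scope rule described above can be sketched in a few lines of Ruby. This is only an illustration, not the code AWACS runs: the helper name `in_scope?` is hypothetical, and it assumes that "in scope" means the same protocol, the same host, and a path that starts with the base path.

```ruby
require "uri"

# Hypothetical helper (not part of AWACS): returns true when candidate_url
# falls inside the scope defined by base_url.
# Assumption: scope means same scheme, same host, and a path that starts
# with the path of the base URL.
def in_scope?(base_url, candidate_url)
  base = URI.parse(base_url)
  candidate = URI.parse(candidate_url)

  return false unless candidate.scheme == base.scheme
  return false unless candidate.host == base.host

  # An empty base path (e.g. "http://www.website.com") matches every path.
  candidate.path.start_with?(base.path)
end

base = "http://www.website.com/planning/"
puts in_scope?(base, "http://www.website.com/planning/phase-one")
puts in_scope?(base, "http://www.website.com/about-us")
puts in_scope?(base, "http://instagram.com")
```

Under these assumptions, www.website.com and website.com are different hosts, which is consistent with the note that using the wrong form yields zero results.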
Valid options are:
| Short | Long | Effect |
|---|---|---|
| -e | --errors | Only show pages with errors in the final output. Can be combined with --warnings. |
| -f | --fast | Skip all checks marked as slow |
| | --folder | Dump all tested pages to this folder, allowing for manual inspection |
| -h | --help | Print usage instructions |
| | --output | Set the output format of results. Default is "human" |
| | --password | HTTP Basic Authentication password |
| | --username | HTTP Basic Authentication username |
| -v | --version | Print version number |
| -w | --warnings | Only show pages with warnings in the final output. Can be combined with --errors. |
The `--output` parameter determines what the program prints to stdout. You can choose from:
- human: displays a progress spinner, and formats a nice, readable ASCII table of results
- csv: outputs comma-separated results, ready for import in Excel, etc. Note that the data will be printed to stdout. You probably want to redirect it to a file like so: `$ awacs http://www.jakobbuis.nl --output-csv > results.csv`.
- debug: outputs verbose statements about what AWACS is doing
- silent: outputs nothing (not even errors). The exit codes can be used to determine the result of the analysis.
`--errors` and `--warnings` (or `-e` and `-w`) can be combined. If neither option is passed, the default is to list every page. Passing these options does not affect the exit codes.
Settings | Pages with no problems | Pages with errors | Pages with warnings |
---|---|---|---|
none given | ✔ | ✔ | ✔ |
--errors --warnings | | ✔ | ✔ |
--errors | | ✔ | |
--warnings | | | ✔ |
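The filtering rules in this table can be expressed as a small predicate. The sketch below is an illustration only: `show_page?` and its keyword arguments are hypothetical names, and it assumes a page carrying both errors and warnings is listed when either flag is set.

```ruby
# Hypothetical predicate (not taken from the AWACS source): decides whether
# a page appears in the final output, following the table above.
#   has_errors / has_warnings      what the checks found on the page
#   errors_flag / warnings_flag    whether --errors / --warnings was passed
def show_page?(has_errors:, has_warnings:, errors_flag: false, warnings_flag: false)
  # With neither flag, every page is listed, including clean ones.
  return true unless errors_flag || warnings_flag

  (errors_flag && has_errors) || (warnings_flag && has_warnings)
end

# A clean page is hidden as soon as one of the flags is passed.
puts show_page?(has_errors: false, has_warnings: false)
puts show_page?(has_errors: false, has_warnings: false, errors_flag: true)
```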
The program returns an appropriate exit code based on its results:
Code | Situation |
---|---|
0 | No errors or warnings |
1 | Invalid parameters given |
2 | Website triggered errors and/or warnings |
3 | Website triggered warnings |
4 | Selected folder (--folder) does not exist, or is not writeable |
5 | Selected folder (--folder) is not empty |
6 | Downloading the website using wget failed (wget returned a non-zero exit code) |
7 | wget could not be found (in PATH). wget is required for awacs to operate |
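If you call awacs from another script, for example in a scheduled job, the exit codes above can be translated back into messages. The following is a minimal sketch: the constant and method names are made up for this example, and it assumes `awacs` is available on your PATH.

```ruby
# Exit codes copied from the table above.
AWACS_EXIT_CODES = {
  0 => "No errors or warnings",
  1 => "Invalid parameters given",
  2 => "Website triggered errors and/or warnings",
  3 => "Website triggered warnings",
  4 => "Selected folder (--folder) does not exist, or is not writeable",
  5 => "Selected folder (--folder) is not empty",
  6 => "Downloading the website using wget failed",
  7 => "wget could not be found (in PATH)"
}.freeze

# Hypothetical helper: turns an exit code into a readable message.
def describe_exit_code(code)
  AWACS_EXIT_CODES.fetch(code, "Unknown exit code #{code}")
end

# Hypothetical wrapper: runs awacs with --fast and reports the outcome.
# Kernel#system returns nil when the command could not be executed at all.
def run_awacs(url)
  started = system("awacs", url, "--fast")
  return "awacs could not be started" if started.nil?

  describe_exit_code($?.exitstatus)
end

puts describe_exit_code(3)
```

Calling `run_awacs("https://in10.nl")` would start a real crawl, so it is left out of the snippet itself.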
awacs executes the following tests on every run. Slow tests are skipped when awacs is run with the --fast option.
Checker | Speed | Purpose |
---|---|---|
Images | slow | Check <img> tags for broken src attributes |
Invalid HTML | fast | Check for parsing errors in HTML |
Links | slow | Check <a> tags for broken href attributes. |
Page failures | fast | Check the log of downloaded pages for failures. |
Scripts | fast | Check <script> tags with src attributes for broken references |
Stylesheets | fast | Check <link> tags with href attributes for broken references |
Trigger words | fast | Check the page for keywords such as "error" and "exception" that often indicate server-side problems |
Dummy content | fast | Check the page for keywords that indicate dummy text such as "lorem ipsum" |
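To give an idea of what a keyword-based check such as "Trigger words" or "Dummy content" might look like, here is a rough sketch. It is not the actual checker code: the keyword lists contain only the examples named in the table, and case-insensitive substring matching is an assumption.

```ruby
# Keyword lists limited to the examples mentioned in the table above;
# the real checkers may use different or longer lists.
TRIGGER_WORDS = ["error", "exception"].freeze
DUMMY_PHRASES = ["lorem ipsum"].freeze

# Hypothetical helper: returns the keywords that occur in the page text.
# Assumption: matching is a case-insensitive substring search.
def find_keywords(page_text, keywords)
  lowered = page_text.downcase
  keywords.select { |keyword| lowered.include?(keyword) }
end

puts find_keywords("Fatal Error: uncaught Exception in app", TRIGGER_WORDS).inspect
puts find_keywords("Lorem ipsum dolor sit amet", DUMMY_PHRASES).inspect
```

Plain substring matching also flags harmless words that contain a keyword (for example "terror"), which fits the idea of producing signals to investigate rather than definitive verdicts.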
Linking to any LinkedIn profile generally results in a 999 HTTP status code due to automated bot detection. This program ignores the robots.txt file on your own website, but makes no attempt to appear as a legitimate user.
Written in Ruby, this program uses wget to download all pages in scope. Files are parsed as strings, and then fed to various Checker classes, each of which has its own dependencies. For example, we use nokogiri to parse all <a> tags in a page and OpenURI to check their responses, in order to find all broken links on every page.
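The link check described above can be approximated with the standard library alone. Note that this sketch deliberately swaps in other techniques to stay dependency-free: a regular expression instead of nokogiri, and Net::HTTP instead of OpenURI. The method names, the injectable `fetch_status` parameter, and the "status 400 or higher means broken" rule are all assumptions made for this example.

```ruby
require "net/http"
require "uri"

# Simplified stand-in for an HTML parser: captures the href value of <a> tags.
HREF_PATTERN = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/i

# Hypothetical helper: returns every href found in <a> tags of the page.
def extract_links(html)
  html.scan(HREF_PATTERN).flatten
end

# Hypothetical helper: returns the absolute links that look broken.
# fetch_status maps a URL to an HTTP status code; it is injectable so the
# logic can be exercised without network access.
# Assumption: a status of 400 or higher, or any exception, counts as broken.
def broken_links(html, fetch_status: ->(url) { Net::HTTP.get_response(URI.parse(url)).code.to_i })
  extract_links(html).select do |url|
    # Skip anchors, relative paths, mailto: links, and so on.
    next false unless url.start_with?("http://", "https://")

    begin
      fetch_status.call(url) >= 400
    rescue StandardError
      true
    end
  end
end

# Usage with a stubbed status lookup instead of real HTTP requests.
page = '<a href="http://a.test/ok">ok</a> <a href="http://a.test/gone">gone</a>'
statuses = { "http://a.test/ok" => 200, "http://a.test/gone" => 404 }
puts broken_links(page, fetch_status: ->(url) { statuses.fetch(url) }).inspect
```

With this rule, the 999 status that LinkedIn returns would also be reported as broken.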
Copyright 2017 IN10. This project is distributed under the MIT license. It is subject to the license terms in the LICENSE file found in the top-level directory of this distribution and at https://opensource.org/licenses/MIT. No part of AWACS, including this file, may be copied, modified, propagated, or distributed except according to the terms contained in the LICENSE file.