Snapcrawl is a command-line utility for crawling a website and saving screenshots.
- Crawls a website to any given depth and saves screenshots
- Can capture the full length of the page
- Can use a specific resolution for screenshots
- Skips capturing if the screenshot was already saved recently
- Uses local caching to avoid repeating expensive crawl operations unnecessarily
- Reports broken links
Snapcrawl requires PhantomJS and ImageMagick.
You can run Snapcrawl using its Docker image, which contains all the necessary prerequisites:
$ docker pull dannyben/snapcrawl
Then you can use it like this:
$ docker run --rm -it dannyben/snapcrawl --help
For more information, refer to the docker-snapcrawl repository.
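Note that screenshots taken inside the container are written to the container's filesystem, so you will usually want to mount a local folder. A minimal sketch, assuming the image keeps its snaps folder under /app (the container-side path is an assumption; see the docker-snapcrawl repository for the exact mount point):

$ docker run --rm -it -v "$PWD/snaps":/app/snaps dannyben/snapcrawl go example.com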
To install Snapcrawl without Docker, install the gem:

$ gem install snapcrawl
Then run it with --help to see the full usage message:

$ snapcrawl --help
Snapcrawl
Usage:
snapcrawl go URL [options]
snapcrawl -h | --help
snapcrawl -v | --version
Options:
-f, --folder PATH
Where to save screenshots [default: snaps]
-n, --name TEMPLATE
Filename template. Include the string '%{url}' anywhere in the name to
use the captured URL in the filename [default: %{url}]
-a, --age SECONDS
Number of seconds to consider screenshots fresh [default: 86400]
-d, --depth LEVELS
Number of levels to crawl [default: 1]
-W, --width PIXELS
Screen width in pixels [default: 1280]
-H, --height PIXELS
Screen height in pixels. Use 0 to capture the full page [default: 0]
-s, --selector SELECTOR
CSS selector to capture
-o, --only REGEX
Include only URLs that match REGEX
-h, --help
Show this screen
-v, --version
Show version number
Examples:
snapcrawl go example.com
snapcrawl go example.com -d2 -fscreens
snapcrawl go example.com -d2 > out.txt 2> err.txt &
snapcrawl go example.com -W360 -H480
snapcrawl go example.com --selector "#main-content"
snapcrawl go example.com --only "products|collections"
snapcrawl go example.com --name "screenshot-%{url}"
snapcrawl go example.com --name "`date +%Y%m%d`_%{url}"
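Since screenshots younger than --age are considered fresh and skipped, you can re-run the same command on a schedule and only stale pages will be re-captured. A sketch of a daily cron entry under that assumption (the schedule and working directory are illustrative):

# illustrative crontab entry: re-crawl every night at 03:00
0 3 * * * cd /path/to/project && snapcrawl go example.com -d2 -a 86400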