/url-spider

Primary LanguageTypeScript

URL Spider

A simple Script to spider all the URLs for a given domain and its subdomains.

Example:

yarn start:run 'https://iansinnott.com'

Will spider all the URLs at my site as well as all URLs at blog.iansinnott.com, lab.iansinnott.com, etc. URLs to external sites will be skipped.

Once the script runs it will dup all the information to a temp file. The location on your system will depend on the built-in mktemp util.

Usage

yarn start:run <url>

Will spider the <url> and all its subdomains.

FIXME

This script does no stream processing. In other words, it will quite happily eat up all the JS heap memory if the site you’re spidering has many URLs.