Couldn't keep to the domain

Question

mgifford opened this issue 6 months ago · 3 comments

I added -a, but I should be able to use either form:

-a, --additional

With this command:

node --max-old-space-size=6000 --no-deprecation purple-a11y/cli.js -u https://www.whitehouse.gov -c 2 -s same-domain -p 50 -a none --blacklistedPatternsFilename ./pa-gTracker-exclude-medicare.csv -k "Random Example:random@example.com"

It ran fine, but I found sub-domains in the returned results.

Answer 1 · 2024-04-16T01:46:58.000Z

Hi @mgifford,

-s same-domain includes scan results from any sub-domain of the parent domain. For example, scanning with -s same-domain -u tom.example.tld makes it possible to also scan jerry.example.tld and example.tld.

If you want to scan only tom.example.tld, specify -s same-hostname -u tom.example.tld.
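
To make the difference between the two strategies concrete, here is a minimal sketch of how such a link filter could behave. It is not Purple A11y's actual implementation: the function names (`hostname`, `naive_parent_domain`, `in_scope`) are hypothetical, and the parent domain is approximated as the last two labels of the hostname, which is wrong for suffixes such as co.uk (a real crawler would likely consult the Public Suffix List).

```python
from urllib.parse import urlparse


def hostname(url: str) -> str:
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def naive_parent_domain(host: str) -> str:
    """Approximate the parent domain as the last two labels of the hostname.

    Assumption for illustration only: this ignores multi-label public
    suffixes such as co.uk.
    """
    return ".".join(host.split(".")[-2:])


def in_scope(start_url: str, link_url: str, strategy: str = "same-domain") -> bool:
    """Decide whether link_url would be crawled when starting from start_url."""
    start, link = hostname(start_url), hostname(link_url)
    if not start or not link:
        return False
    if strategy == "same-hostname":
        # Only the exact host that the scan started on.
        return start == link
    if strategy == "same-domain":
        # Any host sharing the parent domain, so sub-domains are included.
        return naive_parent_domain(start) == naive_parent_domain(link)
    raise ValueError(f"unknown strategy: {strategy}")


start = "https://tom.example.tld"
for link in ("https://jerry.example.tld/page", "https://example.tld/"):
    print(link, in_scope(start, link, "same-domain"), in_scope(start, link, "same-hostname"))
```

Under these assumptions, both links are in scope for same-domain and out of scope for same-hostname, which matches the sub-domains showing up in the original report.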

Marking this issue as closed. Let me know if you encounter otherwise. :)

Answer 2 · 2024-04-16T15:23:17.000Z

I read this completely wrong:

-s, --strategy Strategy to choose which links to crawl in
a website scan. Defaults to "same-domain".
[choices: "same-domain", "same-hostname"]

It's not useful to go into a discussion of domain vs hostname, but maybe it is possible to change the help text.

Maybe something like:

-s, --strategy Crawl specific hostname or more general domain
Defaults to "same-domain", which includes sub-domains.
[choices: "same-domain", "same-hostname"]

Maybe it's just me though @younglim

Answer 3 · 2024-04-17T02:45:50.000Z

Thanks @mgifford! I have reworded it as follows so it's clearer:

  -s, --strategy                     Crawls up to general (same parent) domains,
                                     or only specific hostname. Defaults to
                                     "same-domain".
                                       [choices: "same-domain", "same-hostname"]