GovTechSG/purple-a11y

Respect Robots.txt Files

mgifford opened this issue · 1 comment

Scanners should respect the robots.txt files that sites use to manage crawler traffic.

It would be great if, by default, the scanner respected the wishes of the site owner.
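
For context, a minimal sketch of the kind of check a crawler could perform before visiting a page, using the third-party robots-parser npm package. The canCrawl helper and the user-agent string are illustrative assumptions, not purple-a11y's actual implementation:

  // Minimal sketch, not purple-a11y's actual code: fetch a site's
  // robots.txt and consult it before visiting a page. Assumes Node 18+
  // (global fetch) and the npm package robots-parser.
  const robotsParser = require('robots-parser');

  async function canCrawl(pageUrl, userAgent = 'purple-a11y') {
    const robotsUrl = new URL('/robots.txt', pageUrl).href;
    const res = await fetch(robotsUrl);
    // No robots.txt (or an error response): treat the page as allowed.
    if (!res.ok) return true;
    const robots = robotsParser(robotsUrl, await res.text());
    // isAllowed returns undefined for URLs outside the robots.txt's origin;
    // only an explicit disallow blocks the crawl.
    return robots.isAllowed(pageUrl, userAgent) !== false;
  }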

We have developed a feature to follow robots.txt rules via the -r flag when running the Node CLI:

  -r, --followRobots                 Option for crawler to adhere to robots.txt
                                     rules if it exists
                                 [string] [choices: "yes", "no"] [default: "no"]
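
For reference, a sketch of how the flag might be passed in a scan invocation, assuming cli.js is the CLI entry point and -u supplies the start URL; any other required options are omitted here:

  node cli.js -u https://example.com -r yes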