Different behavior on Google Webmaster Tools robots.txt checker and robotstxt-go
uforic opened this issue · 3 comments
I noticed that with the Google Webmaster Tools robots.txt checker, the following robots.txt:

```
User-agent: *
Allow: /
Allow: /blog/*
Disallow: /*/*
```

allows both website.com/blog/article and website.com/blog/article/.
However, when tested against robotstxt-go, only website.com/blog/article is allowed through, not website.com/blog/article/. I have to add an extra line for robotstxt-go to allow the second URL, so my robots.txt ends up looking like:

```
User-agent: *
Allow: /
Allow: /blog/*
Allow: /blog/*/
Disallow: /*/*
```
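For reference, here is a minimal, self-contained sketch of how Google's published matching rules treat these patterns. This is not the robotstxt-go implementation; `patternToRegexp` is a hypothetical helper that translates a robots.txt path pattern (`*` wildcards, optional trailing `$` anchor) into an anchored regular expression. Under those rules `*` matches any characters, including `/`, so `/blog/*` should cover both URLs:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// patternToRegexp converts a robots.txt path pattern into an anchored
// regular expression, following Google's documented matching rules:
// '*' matches any sequence of characters (including '/'), and a
// trailing '$' anchors the pattern to the end of the URL path.
func patternToRegexp(pattern string) *regexp.Regexp {
	anchored := strings.HasSuffix(pattern, "$")
	if anchored {
		pattern = strings.TrimSuffix(pattern, "$")
	}
	// Escape every literal segment, then rejoin with ".*" for each '*'.
	parts := strings.Split(pattern, "*")
	for i, p := range parts {
		parts[i] = regexp.QuoteMeta(p)
	}
	expr := "^" + strings.Join(parts, ".*")
	if anchored {
		expr += "$"
	}
	return regexp.MustCompile(expr)
}

func main() {
	re := patternToRegexp("/blog/*")
	for _, path := range []string{"/blog/article", "/blog/article/"} {
		fmt.Printf("/blog/* matches %q: %v\n", path, re.MatchString(path))
	}
}
```

By this reading, `Allow: /blog/*` already matches `/blog/article/`, which is consistent with the Webmaster Tools result; the extra `Allow: /blog/*/` line should be redundant.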
I'm running robotstxt-go with the GoogleBot user-agent. Any thoughts on whether this is expected behavior, or why it might be happening?
Thanks!
This seems like a bug in the parser; please wait.
@uforic please see the attached commit: it adds a new test for a wildcard suffix, but it passes without any code changes. Maybe the robots.txt where it fails for you is a bit more complicated?
Apologies, I realized it was caused by some conflicting rules in my robots.txt. Sorry!