temoto/robotstxt

Different behavior on Google Webmaster Tools robots.txt checker and robotstxt-go

uforic opened this issue · 3 comments

I noticed that on Google Webmaster Tools robots.txt checker, the following robots.txt:

User-agent: *
Allow: /
Allow: /blog/*
Disallow: /*/*

will allow website.com/blog/article, as well as website.com/blog/article/.

However, when tested against robotstxt-go, only website.com/blog/article is allowed through; website.com/blog/article/ is blocked. To get robotstxt-go to allow the second URL, I have to add an extra rule, so my robots.txt ends up looking more like:

User-agent: *
Allow: /
Allow: /blog/*
Allow: /blog/*/
Disallow: /*/*

I'm querying robotstxt-go with the GoogleBot user-agent. Any thoughts on whether this is expected behavior, or why it might be happening?

Thanks!

This seems like a bug in the parser; please wait.

@uforic please see the attached commit: it adds a new test for the wildcard-suffix case, but the test passes without any code changes. Maybe the robots.txt where this fails for you is a bit more complicated?

Apologies, I realized the problem was some conflicting rules in my own robots.txt. Sorry for the noise!