Crawler: Ignore Avatars via robots.txt
Sapd opened this issue · 1 comment
Sapd commented
While using a scalable web crawler (Apache Nutch) to scan my server list and its outgoing links, I noticed that you don't forbid crawlers from fetching the avatars.
I would suggest doing so in robots.txt to avoid unnecessary traffic from other crawlers. The problem is that the avatar URLs don't use query syntax like ?avatar=, so a crawler treats each avatar as an individual page.
e.g.
User-agent: *
Disallow: /avatars/*
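A rule like this can be sanity-checked locally before deploying it. The following is only a rough sketch using Python's standard-library `urllib.robotparser`; the host `example.org`, the sample paths, and the helper name `is_allowed` are made up for illustration. It uses the prefix form `Disallow: /avatars/` because, as far as I know, that parser does plain prefix matching and does not expand `*`; for crawlers that do support wildcards, the two forms should block the same URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, using the prefix form of the rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /avatars/
"""

# Parse the rules from memory instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if the parsed rules let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.org and both paths are placeholders, not real site URLs.
    for url in (
        "https://example.org/avatars/abc.png",
        "https://example.org/servers/1",
    ):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

Note that this only checks how one particular parser reads the file; crawlers are free to interpret (or ignore) robots.txt differently.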
LukeHandle commented
I guess the toss-up is: do we see any benefit from being indexed at the image level? That's probably (?) a no, but I would be open to opinions on that.