Crawler: Ignore Avatars via robots.txt

Question


Sapd opened this issue 9 years ago · 1 comment

While using a scalable web crawler (Apache Nutch) to scan my server list and its outgoing links, I noticed that you don't forbid crawlers from scanning the avatars.

I would suggest doing so in robots.txt to avoid unnecessary traffic from other crawlers. The problem is that the avatar URLs don't use query syntax such as ?avatar=, so a crawler treats each avatar as an individual page. For example:

User-agent: *
Disallow: /avatars/*
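A minimal sketch of how a well-behaved crawler would evaluate the proposed rule, using Python's standard-library urllib.robotparser as a stand-in (Nutch has its own robots.txt handling, which this does not reproduce). The host example.com, the avatar and server paths, and the "Nutch" user-agent string are placeholders, not the site's real layout. The sketch writes the rule as "Disallow: /avatars/": robots.txt rules match by path prefix, so the trailing * is redundant for crawlers that understand wildcards, and parsers that only do plain prefix matching may treat the * as a literal character and miss the avatars.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the proposed rule in plain prefix form.
ROBOTS_TXT = """\
User-agent: *
Disallow: /avatars/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    # parse() records a timestamp, so can_fetch() works without calling read().
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Placeholder URLs: one avatar image and one ordinary server page.
    for url in ("https://example.com/avatars/12345.png",
                "https://example.com/server/12345"):
        verdict = "allowed" if rp.can_fetch("Nutch", url) else "blocked"
        print(url, "->", verdict)
```

Running it should report the avatar URL as blocked and the server page as allowed, since only the first path starts with the disallowed /avatars/ prefix.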

Answer 1 · 2016-02-15T00:42:12.000Z

I guess the toss-up is whether we see any benefit from being indexed at the image level. That's probably (?) a no, but I'm open to opinions on that.