minotar/minotar.net

Crawler: Ignore Avatars via robots.txt

Sapd opened this issue · 1 comment

Sapd commented

While working with a scalable web crawler (Apache Nutch) to scan my server list and its outgoing links, I noticed that you don't forbid crawlers from scanning the avatars.

I would suggest doing so in robots.txt to avoid unnecessary traffic from other crawlers. The problem is that you don't use query syntax like ?avatar=, so a crawler treats every avatar as an individual page.
For example:

User-agent: *
Disallow: /avatars/*
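
A rule like this can be sanity-checked before deploying it. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the bot name `ExampleBot` and the avatar URL are made up for illustration. Note one assumption: the sketch uses the plain prefix form `Disallow: /avatars/` rather than `/avatars/*`, because Python's parser does simple prefix matching and does not interpret `*` as a wildcard. The prefix form should cover the same paths and is also understood by crawlers that lack wildcard support.

```python
from urllib.robotparser import RobotFileParser

# Prefix form of the proposed rule (assumption: the trailing "*" is dropped,
# since Disallow rules already match by path prefix).
ROBOTS_TXT = """\
User-agent: *
Disallow: /avatars/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs, used only to exercise the rule.
avatar_allowed = parser.can_fetch("ExampleBot", "https://minotar.net/avatars/Sapd")
home_allowed = parser.can_fetch("ExampleBot", "https://minotar.net/")

print("avatar URL allowed:", avatar_allowed)
print("home page allowed:", home_allowed)
```

With this rule, a compliant crawler should skip anything under /avatars/ while the rest of the site stays crawlable.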

I guess the toss-up is whether we see any benefit from being indexed at the image level. That's probably (?) a no, but I would be open to opinions on that.