[Feature Request] Block crawler bots by default with robots.txt
hashFactory opened this issue · 2 comments
Hi all, I have a public-facing instance of miniserve running at home, and I've noticed that Googlebot often floods it with requests for random files in my directories (including requests for zipped downloads of entire directories).
I would love it if miniserve, either by default or through a switch, served a static robots.txt that disallows crawling by bots.
Google Developers has an example of how to formulate a robots.txt that disallows crawling here.
If there's some interest, I don't mind trying to implement it myself, but it would have to wait a few days.
Open to suggestions!
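For reference, a robots.txt that asks all well-behaved crawlers to stay away from everything is only two lines — I'd expect the built-in default to look roughly like this:

```
User-agent: *
Disallow: /
```

Note this only stops crawlers that choose to honor robots.txt; it's not an access control mechanism.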
Hm, I guess that'd make sense. How about a switch --allow-crawlers that disables the automatic robots.txt?
Hmm, this sounds like a reasonable, easy-to-implement thing! If you have a robots.txt yourself, miniserve would just always serve that; if you don't, it would simply serve a static robots.txt that disallows crawlers (this needs to be implemented to work with random path generation). But then I think I'd rename the flag to something like --no-robots-txt.
The only concern I have is that running miniserve as a permanent web server is not exactly its intended purpose and would be better left to something like Nginx, right? But even so, it seems like a very reasonable, small feature to add!
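To make the fallback behavior concrete, here's a minimal sketch of the "serve the user's robots.txt if one exists, otherwise the built-in disallow-all body" logic. The function name `robots_txt_body` and its shape are my own invention for illustration, not miniserve's actual code, and the real implementation would hook into the HTTP handler rather than print to stdout:

```rust
use std::fs;
use std::path::Path;

/// Built-in fallback body that asks all crawlers to stay out.
const DISALLOW_ALL: &str = "User-agent: *\nDisallow: /\n";

/// Hypothetical helper: pick the robots.txt body to serve for the
/// directory being shared. A user-provided robots.txt in the served
/// root wins; otherwise fall back to the disallow-all default.
fn robots_txt_body(root: &Path) -> String {
    match fs::read_to_string(root.join("robots.txt")) {
        Ok(user_provided) => user_provided,
        Err(_) => DISALLOW_ALL.to_string(),
    }
}

fn main() {
    // No robots.txt in this (nonexistent) directory, so the
    // built-in disallow-all body is returned.
    let body = robots_txt_body(Path::new("/nonexistent-dir"));
    print!("{body}");
}
```

An --allow-crawlers (or inverted --no-robots-txt) flag would then simply skip registering the route for this handler.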