Food for Thinking

Question

Opened this issue 17 days ago · 3 comments

This is not a suggestion or a request. Also, I have no idea if or how it could be implemented.

Today I saw several hundred hits from an AWS scraper, all from the same IP but with different user agents:

User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) Gecko/20100101 Firefox/71.5
User agent: Mozilla/5.0 (Linux; U; Android 4.4; Nexus_S_4G Build/GRJ22) AppleWebKit/534.41 (KHTML, like Gecko) Chrome/50.0.1711.222 Mobile Safari/603.1
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 9_2_3) AppleWebKit/603.49 (KHTML, like Gecko) Chrome/53.0.1776.150 Safari/601
User agent: Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; en-US Trident/4.0)

etc.

So I was thinking that a bad sign would be the same IP making multiple hits with different user agents.
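For illustration only, here is a minimal Python sketch of that heuristic, run over a batch of already-parsed log entries. The function name `flag_rotating_agents`, the `(ip, user_agent)` input shape, and the threshold of 3 are all assumptions made for the example; none of this is CIDRAM code.

```python
from collections import defaultdict


def flag_rotating_agents(hits, max_agents=3):
    """Return the set of IPs seen with more than max_agents distinct user agents.

    hits is an iterable of (ip, user_agent) pairs (a hypothetical input shape).
    max_agents is an arbitrary threshold and would need tuning on real traffic.
    """
    agents_by_ip = defaultdict(set)
    for ip, user_agent in hits:
        # A set keeps only distinct user agents per IP.
        agents_by_ip[ip].add(user_agent)
    return {ip for ip, agents in agents_by_ip.items() if len(agents) > max_agents}
```

A shared NAT or proxy can legitimately show several user agents behind one IP, so the threshold probably should not be too low.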

Answer 1 · 2024-06-10T12:29:59.000Z

There was something similar suggested at #193, though I never quite got around to figuring out exactly how it would work, either. Still would be a good thing to implement, if I could figure it out eventually. But yeah; definitely a bad sign IMO, too.

Answer 2 · 2024-06-10T12:59:33.000Z

Maybe in the Bobuam module.

But that would need some kind of DB for each IP and its user agents. Something like an IP reputation score.

And all of that may be beyond CIDRAM's scope.
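As a rough sketch of what such a per-IP store might look like, the Python below keeps an in-memory dict in place of a real DB and uses the number of distinct user agents seen within a time window as a very crude score. The class name `AgentTracker`, the one-hour window, and the threshold are hypothetical choices for illustration, not anything CIDRAM provides.

```python
import time


class AgentTracker:
    """Track recently seen user agents per IP (in-memory stand-in for a DB)."""

    def __init__(self, window_seconds=3600, max_agents=3):
        self.window_seconds = window_seconds  # how long a sighting counts
        self.max_agents = max_agents          # arbitrary threshold
        self._seen = {}                       # ip -> {user_agent: last_seen}

    def record(self, ip, user_agent, now=None):
        """Record a hit and return the number of distinct recent user agents."""
        now = time.time() if now is None else now
        agents = self._seen.setdefault(ip, {})
        agents[user_agent] = now
        # Forget user agents not seen within the window.
        cutoff = now - self.window_seconds
        for stale in [ua for ua, seen_at in agents.items() if seen_at < cutoff]:
            del agents[stale]
        return len(agents)

    def is_suspicious(self, ip):
        """True if the IP currently has more than max_agents recent user agents."""
        return len(self._seen.get(ip, {})) > self.max_agents
```

Expiring old sightings keeps the store from growing without bound and lets an IP recover once the rotating traffic stops.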

Answer 3 · 2024-06-10T16:22:41.000Z

I'd suggest using a cache keyed by IP and storing the UA. I have been thinking along similar lines recently. If a page is retrieved, the client should always download the associated images as well. A page without images indicates a potential problem, and likewise images without pages. You just need to allow for search engines. Fortunately, my site uses PHP to deliver images, so the concept should work.
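The page/image idea could be sketched roughly as below, in Python for illustration only. It assumes requests are available as `(ip, path)` pairs, that images can be recognised by file suffix, and that search engines are handled through a simple IP allowlist; `unbalanced_clients`, `IMAGE_SUFFIXES`, and the `min_requests` cutoff are made-up names and values.

```python
from collections import defaultdict

# Assumed image suffixes; a real site would use its own rules.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".webp")


def classify(path):
    """Label a request path as 'image' or 'page', ignoring any query string."""
    bare_path = path.lower().split("?", 1)[0]
    return "image" if bare_path.endswith(IMAGE_SUFFIXES) else "page"


def unbalanced_clients(requests, allowlist=frozenset(), min_requests=5):
    """Return {ip: reason} for IPs that fetched only pages or only images.

    allowlist holds IPs to skip (for example, verified search engines).
    IPs with fewer than min_requests requests are ignored as too small a sample.
    """
    counts = defaultdict(lambda: {"page": 0, "image": 0})
    for ip, path in requests:
        counts[ip][classify(path)] += 1

    flagged = {}
    for ip, seen in counts.items():
        if ip in allowlist or seen["page"] + seen["image"] < min_requests:
            continue
        if seen["image"] == 0:
            flagged[ip] = "pages without images"
        elif seen["page"] == 0:
            flagged[ip] = "images without pages"
    return flagged
```

Browser caching and pages that contain no images would produce false positives, so this is probably better used as one signal among several than as a block rule on its own.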