JayBizzle/Crawler-Detect

Bots with love from Russia

antonydevanchi opened this issue · 3 comments

Lines extracted from a few high-traffic websites in Russia.
Each suggested line was checked and returned the «User agent is NOT a bot» result.

Sample sources.

@JayBizzle please approve the list; after your «OK» I'll send a PR.
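Here is a minimal Python sketch of the check described above. It assumes the detector is exposed as a plain `is_crawler(user_agent) -> bool` predicate. Crawler-Detect itself is a PHP library, so `toy_is_crawler`, `not_detected` and the tiny pattern list are hypothetical stand-ins, not its real pattern list or API. The Googlebot string is only a well-known crawler UA used for contrast.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical stand-in detector with a deliberately tiny pattern list.
# It is NOT Crawler-Detect's list or API; it only illustrates the workflow.
TOY_CRAWLER_RE = re.compile(r"bot|crawl|spider", re.IGNORECASE)


def toy_is_crawler(user_agent: str) -> bool:
    """Return True if the toy pattern list matches anywhere in the UA."""
    return TOY_CRAWLER_RE.search(user_agent) is not None


def not_detected(candidates: Iterable[str], is_crawler: Callable[[str], bool]) -> List[str]:
    """Keep only the user agents the detector reports as NOT a bot."""
    return [ua for ua in candidates if not is_crawler(ua)]


candidates = [
    "Atlassian Webhook HTTP Client",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]
print(not_detected(candidates, toy_is_crawler))  # ['Atlassian Webhook HTTP Client']
```

Anything left in the output is a candidate for the list, because the detector currently lets it through.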

Confirmed

Suspicious

  • AppleCoreMedia/1.0.0.15E216 (iPhone; U; CPU OS 11_3 like Mac OS X; ru_ru)

  • U7IA53O8WRQLP5HTN36O 520895322210219123 178783837217128874 128152866426197781 170646164760074160 897891597049094717 10798953746823634 865980663587406199 83960315279034295 50879782656773070 425389085694464584 520832197366222982 566888253833316094 497618096526395537 425659151411788951 494971665505918016 870530337619880183 349139421128410017 627636913921908195 757052309832051516 923961866520554027 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36

  • P5IFYN4 466155777352526040 187582494976457628 205329424338575162 290476900380796124 777992341037289367 321980071372304533 644559670148391318 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36

  • Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; MRSPUTNIK 2, 4, 0, 386; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; BRI/2; AskTbFXTV5/5.14.1.20007)

  • Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Pivim Multibar; MRSPUTNIK 2, 4, 0, 171; MRA 5.7 (build 03796))

Happy with the ones I have ticked 👍

Any more information on the AppleCoreMedia bot?

The suspicious ones definitely look like they could be added; I'm just wondering how easy it will be to detect them. The first 2 look like they use random strings, or are those strings always the same?
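If the prefixes really are random, a fixed-string pattern won't catch them, but their shape might. Below is a hypothetical Python heuristic based only on the two samples above. It looks for an all-caps alphanumeric token, then one or more long digit runs, then an ordinary `Mozilla/` string. The name `has_random_prefix`, the 6-character minimum and the 15-digit minimum are guesses, not rules taken from Crawler-Detect.

```python
import re

# Hypothetical heuristic derived only from the two sample user agents:
# an all-caps alphanumeric token, then one or more space-separated runs of
# 15+ digits, then an otherwise ordinary "Mozilla/..." browser string.
RANDOM_PREFIX_RE = re.compile(r"^[A-Z0-9]{6,}(?: \d{15,})+ Mozilla/")


def has_random_prefix(user_agent: str) -> bool:
    """True if the UA matches the random-prefix shape seen in the samples."""
    return RANDOM_PREFIX_RE.match(user_agent) is not None


CHROME_71 = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36")

# Second suspicious sample from the list above.
SAMPLE = ("P5IFYN4 466155777352526040 187582494976457628 205329424338575162 "
          "290476900380796124 777992341037289367 321980071372304533 "
          "644559670148391318 " + CHROME_71)

print(has_random_prefix(SAMPLE))     # True
print(has_random_prefix(CHROME_71))  # False
```

Matching the shape rather than the exact tokens would keep working even if the strings change on every request. The trade-off is that thresholds picked from two samples may need tuning against real traffic.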

The 3rd one in the suspicious list we already detect

Thanks

Added:
  • Atlassian Webhook HTTP Client
  • Mozilla/5.0 (compatible; Domains Project/1.1.0; +https://domainsproject.org)
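The two additions could be expressed as regex fragments combined into one alternation, assuming a case-insensitive search against the full UA. The Python sketch below shows this. The fragments and `matches_new_patterns` are hypothetical, and the entries actually committed to Crawler-Detect may be written differently.

```python
import re

# Hypothetical fragments for the two agreed additions; the real entries
# in Crawler-Detect's crawler list may be written differently.
NEW_PATTERNS = [
    r"Atlassian Webhook HTTP Client",
    r"Domains Project/",
]

# Combine into one case-insensitive alternation, searched anywhere in the UA.
COMBINED = re.compile("|".join(f"(?:{p})" for p in NEW_PATTERNS), re.IGNORECASE)


def matches_new_patterns(user_agent: str) -> bool:
    """Return True if the user agent contains one of the new fragments."""
    return COMBINED.search(user_agent) is not None


samples = [
    "Atlassian Webhook HTTP Client",
    "Mozilla/5.0 (compatible; Domains Project/1.1.0; +https://domainsproject.org)",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
]
for ua in samples:
    print(matches_new_patterns(ua), ua)
```

The first two samples match and the plain Chrome UA does not. This is the kind of quick check worth running before a PR, so that a new fragment doesn't also catch ordinary browsers.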

Any more information on the AppleCoreMedia bot?

0.0.0.0 - - [15/Jun/2018:10:43:00 +0300] "GET /audio/chat_new_message.ogg HTTP/1.1" 206 2 "https://domain.tld/chat" "AppleCoreMedia/1.0.0.15E216 (iPhone; U; CPU OS 11_3 like Mac OS X; ru_ru)"

0.0.0.0 - - [15/Jun/2018:10:43:00 +0300] "GET /audio/chat_new_message.ogg HTTP/1.1" 206 69157 "https://domain.tld/chat" "AppleCoreMedia/1.0.0.15E216 (iPhone; U; CPU OS 11_3 like Mac OS X; ru_ru)"

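Hmm... After a deeper dive, I'm not sure this is clearly a bot User-Agent. Both requests fetch the same audio file (/audio/chat_new_message.ogg), are referred from the /chat page, and get 206 Partial Content responses, which looks more like media playback than crawling. It seems like a WebKit subprocess.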
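To see what these two requests have in common, here is a Python sketch that parses the log lines above. It assumes they are in the Apache/nginx combined log format. `parse` and `looks_like_media_fetch` are hypothetical helpers. In the sample lines, the second line is written with a single space after the IP address, like the first.

```python
import re

# Combined log format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

UA = "AppleCoreMedia/1.0.0.15E216 (iPhone; U; CPU OS 11_3 like Mac OS X; ru_ru)"
LINES = [
    f'0.0.0.0 - - [15/Jun/2018:10:43:00 +0300] "GET /audio/chat_new_message.ogg HTTP/1.1" 206 2 "https://domain.tld/chat" "{UA}"',
    f'0.0.0.0 - - [15/Jun/2018:10:43:00 +0300] "GET /audio/chat_new_message.ogg HTTP/1.1" 206 69157 "https://domain.tld/chat" "{UA}"',
]


def parse(line: str) -> dict:
    """Split one combined-format log line into named fields."""
    match = LOG_RE.match(line)
    if match is None:
        raise ValueError(f"not a combined-format line: {line!r}")
    return match.groupdict()


def looks_like_media_fetch(entry: dict) -> bool:
    """Hypothetical heuristic: a 206 Partial Content response to a CoreMedia client."""
    return entry["status"] == "206" and entry["agent"].startswith("AppleCoreMedia/")


for entry in map(parse, LINES):
    print(entry["status"], entry["bytes"], entry["request"], looks_like_media_fetch(entry))
```

The first response is 2 bytes and the second is 69157 bytes. That pattern is consistent with a player probing with a tiny range request before fetching the rest of the file, rather than a crawler walking the site.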

The 3rd one in the suspicious list we already detect

Yep, you're right, my mistake. Removed.

The suspicious ones definitely look like they could be added; I'm just wondering how easy it will be to detect them. The first 2 look like they use random strings, or are those strings always the same?

Let's skip these for now; I'll get back to you a bit later with more information.

Updated first post.

Look forward to your PR 👍