matomo-org/device-detector

Samsung Messages

creadone opened this issue · 3 comments

FYI

One of the clients of our link-shortening service told us about an inconsistency in our statistics. By his account, we at least double the click counts. The client sends SMS mailings, so accurate statistics are important to him.

We pulled the Nginx logs from the archive and found strange behavior from an ordinary-looking user agent. Please take a look:

Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0

It looks like just a desktop Ubuntu user with an old Firefox, released on September 17, 2013. However, this user follows a unique link three times within one second. Every time, from different IPs, a unique link is followed three times within one second.
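The pattern described above could be searched for in parsed logs with something like the following. This is only a minimal sketch: the `find_bursts` name, the `(timestamp, link, user_agent)` tuple layout, and the threshold of three are assumptions for illustration, not part of our service or of device-detector.

```python
from collections import Counter
from datetime import datetime


def find_bursts(hits, threshold=3):
    """Return {(link, user_agent, second): count} for suspicious bursts.

    `hits` is an iterable of (timestamp: datetime, link: str, user_agent: str)
    tuples, assumed to be already parsed from the access log.
    """
    counts = Counter()
    for ts, link, user_agent in hits:
        # Bucket by calendar second; IPs are ignored because they differ.
        counts[(link, user_agent, ts.replace(microsecond=0))] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

Because hits are bucketed by calendar second, a burst that straddles a second boundary is split across two buckets and may be missed; a sliding window would be stricter.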

This is not a bot and not a young hacker: this is the SMS Link Rich Preview by Samsung. Okay, we know that everyone lies, but this just looks like poor-quality software.

  1. Discussion on Stackoverflow: https://stackoverflow.com/q/48068227/1597964
  2. Unanswered question from Samsung Dev: https://developer.samsung.com/forum/thread/sms-link-rich-preview/201/346325

Sorry for the late response.
As the user agent actually contains only valid details, there is nothing we can do in this library.
If you want to sort these requests out, that would need to be done based on the detection results.

@sgiehl, not a problem. We added this UA to our exception rules.
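An exception rule of this kind can be as simple as an exact-match exclusion list applied before counting. The sketch below is an assumption about the general shape of such a rule, not our actual code; `EXCLUDED_USER_AGENTS`, `is_countable`, and `count_clicks` are hypothetical names.

```python
# Exact user-agent strings excluded from click statistics (hypothetical list).
EXCLUDED_USER_AGENTS = frozenset({
    "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0",
})


def is_countable(user_agent: str) -> bool:
    """True if a hit with this user agent should be counted as a click."""
    return user_agent.strip() not in EXCLUDED_USER_AGENTS


def count_clicks(user_agents) -> int:
    """Count hits, skipping the excluded user agents."""
    return sum(1 for ua in user_agents if is_countable(ua))
```

The trade-off is that a genuine Firefox 24 user on 32-bit Ubuntu would be excluded as well, since the string itself is valid.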

I have one question and would be glad if you could take the time to answer it. I maintain the device_detector shard, which is based on your regexes. It's not very convenient, because any new UA may go unrecognised, and I then have to reprocess the raw statistics data after updating the regexes.

A slightly easier solution to maintain is a grammar-based parser (using something like BNF). It is a more flexible tool because you don't need to describe each known UA in a regex rule; you describe the types of UA and then extract the data. For example, I found a grammar for the ANTLR parser generator. It worked, and I think it requires less effort to support.
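To illustrate the idea, here is a minimal hand-written sketch that follows the User-Agent grammar from RFC 9110 (`product *( RWS ( product / comment ) )`). It is not the ANTLR grammar mentioned above, and the `parse_user_agent` name and output shape are assumptions made for this example.

```python
import re

# Token characters ("tchar") as defined in RFC 9110.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def parse_user_agent(ua: str):
    """Split a UA into ("product", name, version) and ("comment", text) items."""
    items = []
    pos, n = 0, len(ua)
    while pos < n:
        ch = ua[pos]
        if ch in " \t":
            pos += 1
        elif ch == "(":
            # Comments may nest, so track parenthesis depth.
            start, depth = pos, 0
            while pos < n:
                if ua[pos] == "\\":  # quoted-pair: skip the escaped character
                    pos += 2
                    continue
                if ua[pos] == "(":
                    depth += 1
                elif ua[pos] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                pos += 1
            if depth != 0:
                raise ValueError("unterminated comment")
            items.append(("comment", ua[start + 1:pos]))
            pos += 1
        else:
            match = TOKEN.match(ua, pos)
            if match is None:
                raise ValueError(f"unexpected character at offset {pos}")
            name, pos, version = match.group(), match.end(), ""
            if pos < n and ua[pos] == "/":
                match = TOKEN.match(ua, pos + 1)
                if match is None:
                    raise ValueError(f"missing version at offset {pos}")
                version, pos = match.group(), match.end()
            items.append(("product", name, version))
    return items
```

For the UA above this yields the products `Mozilla/5.0`, `Gecko/20100101`, and `Firefox/24.0` plus one comment. The parse is purely structural: mapping those pieces to a device, OS, or browser still needs rules on top, and real-world UAs that violate the grammar raise `ValueError` here.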

Have you considered using a grammar-based parser?

No, I haven't considered anything like this yet. I will try to have a look when I have some time, but I'm not sure whether it would make the detection faster or slower.