woothee/woothee-java

Unrecognized IE 11

qbolec opened this issue · 1 comments

About 1.6% of our incoming traffic is identified as Internet Explorer with version UNKNOWN.

The UA strings look like these:

select p.useragent, count(*) as cnt, count(distinct user_id)
from php_logs p
join woothee_useragent_dim w
   on p.useragent = w.useragent
where dt >= '20150401' and dt <= '20150430'
   and name = 'Internet Explorer' and version = 'UNKNOWN'
group by p.useragent
order by cnt desc
limit 30;

Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; LCJB; rv:11.0) like Gecko      12691436        13036
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MASMJS; rv:11.0) like Gecko    5470585 4994
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; ASU2JS; rv:11.0) like Gecko    5052566 4996
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MAARJS; rv:11.0) like Gecko    4481126 4056
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MALNJS; rv:11.0) like Gecko    4194491 4110
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; NP06; rv:11.0) like Gecko      3969474 4023
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; MAAU; rv:11.0) like Gecko      3438683 3266
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; MATM; rv:11.0) like Gecko      3085392 3057
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko     2188720 2821
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; LCJB; rv:11.0) like Gecko 1695819 1555
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MASEJS; rv:11.0) like Gecko    1460236 1615
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MATMJS; rv:11.0) like Gecko    1447890 1167
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MDDCJS; rv:11.0) like Gecko    963311  620
Mozilla/5.0 (Windows NT 6.3; Trident/7.0; Touch; rv:11.0) like Gecko    927900  1599
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; ASU2JS; rv:11.0) like Gecko       772250  821
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; Touch; LCJB; rv:11.0) like Gecko       751545  855
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; MAARJS; rv:11.0) like Gecko       746599  617
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; Touch; rv:11.0) like Gecko        677468  714
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MATBJS; rv:11.0) like Gecko    522275  515
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; MALNJS; rv:11.0) like Gecko       463573  458
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; MASMJS; rv:11.0) like Gecko       462232  571
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; Touch; ASU2JS; rv:11.0) like Gecko     457776  848
Mozilla/5.0 (Windows NT 6.1; Trident/7.0; NP07; NP07; rv:11.0) like Gecko       385464  544
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MALCJS; rv:11.0) like Gecko    369572  293
Mozilla/5.0 (Windows NT 6.1; Trident/7.0; MAMD; rv:11.0) like Gecko     352423  239
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; TNJB; rv:11.0) like Gecko      293674  342
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; MANM; rv:11.0) like Gecko      291762  218
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MAPBJS; rv:11.0) like Gecko    286097  234
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; MDDRJS; rv:11.0) like Gecko    245578  299
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; Touch; LCJB; rv:11.0) like Gecko  226963  277

The problem seems to be that the regular expression used in
https://github.com/woothee/woothee-java/blob/5a7de46936f5f6e4b0e76620bc4b991e344ff53d/src/main/java/is/tagomor/woothee/browser/MSIE.java does not allow for tokens between "Trident/7.0;" and "rv:11.0", such as "Touch;" or "MASMJS;".
The latter seem to be manufacturer codes: http://www.whatismybrowser.com/developers/unknown-user-agent-fragments
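
To illustrate the kind of change I have in mind, here is a minimal sketch of a pattern that tolerates extra tokens between "Trident/7.0;" and "rv:11.0". The class name `TridentVersionSketch`, the method `extractVersion`, and the pattern itself are made up for this example; this is not the code in MSIE.java and not necessarily how a fix should be written.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of woothee-java.
class TridentVersionSketch {
    // After "Trident/<n>;" accept any run of characters other than ")"
    // (for example " Touch; LCJB; ") before the "rv:<version>" token.
    private static final Pattern TRIDENT_RV =
        Pattern.compile("Trident/[.0-9]+;[^)]*?\\brv:([.0-9]+)");

    // Returns the version after "rv:", or "UNKNOWN" when the pattern does not match.
    static String extractVersion(String ua) {
        Matcher m = TRIDENT_RV.matcher(ua);
        return m.find() ? m.group(1) : "UNKNOWN";
    }

    public static void main(String[] args) {
        // Two of the UA strings from the list above.
        System.out.println(extractVersion(
            "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MASMJS; rv:11.0) like Gecko"));
        System.out.println(extractVersion(
            "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; Touch; LCJB; rv:11.0) like Gecko"));
    }
}
```

Excluding ")" keeps the match inside the parenthesized part of the UA string, so an "rv:" appearing later in the string is not picked up by accident.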

Release 1.2.0 has a fix for this problem.
Thank you for reporting, and sorry for the late reply.