oalders/http-browserdetect

Link opened from Microsoft Office doc incorrectly identifies OS as Unix

evi1m0nkey opened this issue · 19 comments

When opening a URL from within a Microsoft Office document, the user agent is given as "Microsoft Office Existence Discovery" (see http://blogs.msdn.com/b/vsofficedeveloper/archive/2008/03/11/office-existence-discovery-protocol.aspx for details). This causes $browser->os_string() to incorrectly report Unix. I believe that's because the test for 'sco' on line 1005 matches the "sco" inside "Discovery".
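To see why "Discovery" trips the check, here is a minimal sketch. It assumes the OS test is a plain case-insensitive substring match for 'sco'; the actual code at line 1005 of the module isn't reproduced in this thread. The word-boundary variant is just one possible way to avoid the false positive, not necessarily what the eventual patch does.

```perl
use strict;
use warnings;

my $ua = 'Microsoft Office Existence Discovery';

# Plain substring test: "sco" occurs inside "Di(sco)very", so this matches.
my $naive = index( lc $ua, 'sco' ) != -1 ? 1 : 0;

# Requiring word boundaries means "sco" must stand alone as a token.
my $bounded = $ua =~ /\bsco\b/i ? 1 : 0;

print "naive:   $naive\n";      # naive:   1
print "bounded: $bounded\n";    # bounded: 0
```

The substring version reports a match, which is the kind of false positive that would make os_string() fall through to Unix. The bounded version does not match.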

Nice catch. We should fix this for sure. Would be great if you wanted to provide a patch. :)

Cool, I'd certainly be willing to make a patch, but I'll admit I don't know the first thing about how to go about it. If I just sent an updated version with the change, would that be helpful?

If that's what you're comfortable with, then by all means. :) It would be easiest if you use the latest version in the repository and then I can create a patch based on your work. If you're able to run the test suite to check your changes haven't broken anything else, that's helpful too. If you're feeling adventurous you can also add a test case to t/useragents.json with this particular useragent.

I will give it a go. When I'm done, what do I do with it, try to put it back into git? I'm more of a "Perl script enthusiast" than a real developer, so I'm in pretty new territory here :)

Great! If you want to do it the git way, then you'd basically:

  1. fork this repository
  2. clone it locally
  3. make your changes and "git commit" them

GitHub has some GUIs to make this easier:

https://mac.github.com/
https://windows.github.com/

You might find one of these to be a good starting point. (I haven't really used them, but they'd save you a lot of fiddling around at the command line, which can be frustrating when you're just starting with git.)

If none of that suits you, you can just post your changes right to this ticket or use gist.github.com. If you're interested in contributing to stuff in general, it's worthwhile to learn the workflow. If it's more of a one-off thing, then it's fine to just post the changed files. :)

Thanks for all the info. I'll start with the "proper" git approach, then fall back to posting changes directly if I can't figure it out. Might take me a couple days. Honestly, this probably is a one-off for me. It's been a long time since I did much active coding, but I do write scripts from time to time to help me automate tasks for work. In this case, I stumbled upon the bug because I turned up a weird result in some web logs, and hey, I like a good puzzle!

Again, thank you for your help walking me through this. You have definitely committed me to submitting a fix :)

Sounds good! Good luck getting started. It's time well spent to learn some git. You may find it helpful when you're automating your tasks to keep them under version control.

> Again, thank you for your help walking me through this. You have definitely committed me to submitting a fix :)

Mission accomplished then. ;)

I should have asked this before, but what would you classify this user agent as: a robot? While writing a fix, I discovered another funky agent in my logs, from Windows WebDAV clients (https://social.technet.microsoft.com/Forums/office/en-US/25118bb3-025b-43a4-9d51-760913a0cab4/microsoftwebdavminiredir-making-thousands-of-calls-to-a-couple-sharepoint-sites?forum=sharepointadminprevious).
Is that also a robot? Can robots have version numbers and OSes? From my scan of the code, it doesn't look like it...

Good question. Bots can have version numbers and really whatever metadata the bot creator decides to add to the useragent string. Basically, if some automated process is fetching a page and not displaying it to a human, you can call that a bot.
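As an illustration of a bot that still carries a version number, here is a hypothetical sketch. The helper `classify_agent`, the bot names, and both patterns are made up for this example and are not HTTP::BrowserDetect's API or rules. The WebDAV string is assumed to look like `Microsoft-WebDAV-MiniRedir/<version>`.

```perl
use strict;
use warnings;

# Hypothetical patterns for the two automated agents discussed in this thread.
# Illustrative only; these are not HTTP::BrowserDetect's actual detection rules.
my @bot_patterns = (
    [ 'office-existence-discovery' => qr/^Microsoft Office Existence Discovery/ ],
    [ 'webdav-miniredir'           => qr{^Microsoft-WebDAV-MiniRedir(?:/(?<version>[\d.]+))?} ],
);

# Hypothetical helper: returns (bot_name, version) for a matching agent,
# or an empty list if the agent isn't one of the patterns above.
sub classify_agent {
    my ($ua) = @_;
    for my $entry (@bot_patterns) {
        my ( $name, $re ) = @$entry;
        if ( $ua =~ $re ) {
            # %+ holds named captures from this match; version is undef if absent.
            return ( $name, $+{version} );
        }
    }
    return;
}

my ( $bot, $version ) = classify_agent('Microsoft-WebDAV-MiniRedir/6.1.7601');
print "$bot $version\n";    # webdav-miniredir 6.1.7601
```

Being a robot and having a version are independent here: the agent is flagged as automated by its name, and any version is just extra metadata its author chose to include.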

Hi Olaf,

It's taken me a while to find time to finish this up, but I managed to add a fix/detection for the initial agent as well as for the WebDAV one I discovered incidentally. I've attached the updated files just so you have them, and in case you beat me to figuring out Git and making a patch there. All tests passed, so I feel pretty confident things are in order.

Thanks again for the guidance and encouragement to get this far!

--Matt

Hi @dafinder,

Thanks! I don't actually see any patches in this ticket. If you sent them via email, they probably got stripped. You could paste them using https://gist.github.com/ and then link to the gist from this ticket. Would that work?

Thanks for your efforts. :)

I did try this via email; I guess it ate them. I took your suggestion and made a gist for each file:

BrowserDetect.pm : https://gist.github.com/anonymous/d06c42e5da55e5cfadde
useragents.json : https://gist.github.com/anonymous/578135748a40b5030f24

Thanks. Which commit was your branch at when you made these changes?

I'm not sure; I didn't pull this from Git. I worked against the latest version from CPAN (which is probably wrong). This was done against version 1.77, released 2015-03-08 12:06:17.

If I'm making this too complicated, I'd still like to get my feet wet with Git and can start over directly from there.

It's not too complicated, but the issue at this point is just that some really invasive (but good) changes have been made in the meantime. If you'd be able to work your changes into the latest release (or latest master) that would speed up the time it takes to merge your code in. That's if you have the time to spend on it. I don't want to make your life miserable when you're trying to help. :)

Haha, not miserable, a good learning experience. I'm the one feeling bad that it took me 3 weeks to get this far. What I did was pretty straightforward and I can quickly replicate it, so I'll get the latest release (whatever is latest when I next have time), merge my changes in, and give it another go.

Excellent! 3 weeks is actually pretty good. I proposed a PR to someone else's repo last December and I still haven't gotten to it. No judgement here. ;)

Hi @oalders and @dafinder, I took a look at @dafinder's suggestions regarding the user agent "Microsoft Office Existence Discovery".

I compared the changes in the gist against the earlier base version 1.77 and the latest master. I think the relevant changes have been incorporated correctly (and dzil test doesn't seem to complain), so I have gone ahead and made a pull request (#114).

Let me know what you guys think.

Thanks @dafinder and @ramananbalakrishnan :)