microsoft/Office-Online-Test-Tools-and-Documentation

wdl* parameters getting appended in the middle of the response Location header and causing an unauthorized error

raysaik opened this issue · 8 comments

I have a tool that talks to Office Online via WOPI, and it has been behaving oddly recently. The problem occurs when the user who is trying to open a document in Office is being authenticated to O365 for the first time. It does not occur on subsequent calls, which work perfectly.

What we have observed is that, during the first request, one of the GET requests is as follows:

https://office.live.com/start/Word.aspx?h4b={companyName}&c4b={int}&eurl=https://{{our server}}/#/microsoft-office/{{docid}}&hp={{base64encodedstring}}

As we can see here, the "eurl" is "https://{{our server}}/#/microsoft-office/{{docid}}&hp={{base64encodedstring}}", but in the response Location header it is "https://{{our server}}/?wdlcs={{string}}&wdlcsexp={{int}}#/microsoft-office/{{docid}}". The wdl* parameters get inserted into the middle of the URL.

image

The two wdl query parameters are injected into the middle of the callback URL rather than at the end of it. I can see a trailing "#" character after the wdl parameters (both in the request and in the response); I am not sure whether that is the problem here.

Howdy @raysaik,

I don't think # is generally considered a valid URL character. Quoting from RFC 1738 - Uniform Resource Locators: "The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it."

Query parameters on a URI should appear before the fragment. Can you retry your repro after encoding the # using its URI safe representation of %23?
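
As a minimal sketch of the point above, Python's standard `urllib.parse` module shows how a generic URL parser splits a URL containing a raw `#`, and what the percent-encoded form looks like. The host and document id are the placeholder values used elsewhere in this thread, not real endpoints, and this only illustrates generic URL parsing, not what Office Online does internally.

```python
from urllib.parse import urlsplit, quote

# Placeholder callback URL taken from this thread (not a real endpoint).
raw = "https://abcd.pqrs.com/#/microsoft-office/000000"

parts = urlsplit(raw)
print(parts.path)      # "/"  (everything after '#' is NOT part of the path)
print(parts.query)     # ""   (no query string yet)
print(parts.fragment)  # "/microsoft-office/000000"

# Percent-encoding every reserved character turns '#' into '%23',
# so the value can be carried safely inside another URL's query string.
encoded = quote(raw, safe="")
print(encoded)
# https%3A%2F%2Fabcd.pqrs.com%2F%23%2Fmicrosoft-office%2F000000
```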

Best regards,
Rob Rolnick

Thanks Rob for responding to this.
Sorry I didn't mention it: the characters are actually encoded; I shared them decoded for ease of understanding. To reiterate, the problem (as shown in the picture below from the browser developer tools) is that the Location header of the 302 redirect response has the wdl query string embedded in the middle of the URL rather than at the end. This is happening on the WOPI side, so we want to know why the parameters are being inserted in the middle.

image

Hi @raysaik ,

Is it single or double encoded? I'm certainly not 100% confident, but I wonder if they need to be double encoded. Once to represent a # in a URL, and again to represent that URL as a query string parameter.
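
To make the single versus double encoding distinction concrete, here is a small sketch using Python's `urllib.parse`. It assumes the receiving page decodes the `eurl` value exactly once, which is a guess about the behavior and not something confirmed in this thread; the URL is the placeholder from this issue.

```python
from urllib.parse import quote, unquote

# Placeholder callback URL from this thread.
callback = "https://abcd.pqrs.com/#/microsoft-office/000000"

# Single encoding: the whole URL is escaped once as a query parameter value.
single = quote(callback, safe="")

# Double encoding: first escape '#' so it is a literal character of the
# callback URL, then escape the whole URL as a query parameter value.
inner = callback.replace("#", "%23")
double = quote(inner, safe="")

# Simulate one round of decoding by whoever reads the eurl parameter.
after_single = unquote(single)
after_double = unquote(double)

print(after_single)  # https://abcd.pqrs.com/#/microsoft-office/000000
print(after_double)  # https://abcd.pqrs.com/%23/microsoft-office/000000
```

With single encoding the raw `#` reappears after one decode and starts a fragment; with double encoding it survives as `%23`.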

Best,
Rob Rolnick

Hi @RobRol

I do not think these are double encoded.

In the first URL there is a query parameter called "eurl" which has the following value:

eurl=https://abcd.pqrs.com/#/microsoft-office/000000&hp=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

which is supposed to be the callback URL. However, when we call "office.live.com" with this particular callback URL, the response Location header has some wdl* parameters right in the middle of the URL.

Actual

https://abcd.pqrs.com/?wdlcs=XXXXXXXXXXXXXXXXXXXXX%3D&wdlcsexp=xxxxxxxxxxxxxxxxxxx#/microsoft-office/00000000000

Expected

https://abcd.pqrs.com/microsoft-office/00000000000?wdlcs=XXXXXXXXXXXXXXXXXXXXX%3D&wdlcsexp=xxxxxxxxxxxxxxxxxxx#

Hi @raysaik ,

I'm speculating that there is something going wrong in the way the eurl parameter is being written to the URL. Either the initial input is malformed and didn't encode the #, or a secondary encoding should be happening that was missed after the URL was assigned to the eurl argument. I'd further hypothesize that this would likely be a much more widespread issue if the problem occurred post assignment, so most likely the input value is originally malformed.

Both of those theories COULD be wrong though. The easiest way to test it is to ensure that the URL assigned to eurl is correctly double encoded. (This can be done using tools like Fiddler that let you intercept and edit the web traffic from your local box. It is not the only such tool, just one I happen to be familiar with.) If manually forcing the double encoding fixes the issue, then we can work backwards to determine where the problem originally manifested. For example, if double encoding does fix the problem, then one possible next step is to figure out where the eurl is initially coming from and work from there. If it doesn't fix the issue, that's great too. We can explore other speculative avenues. But this theory seems very promising to me.

The reason I think the issue is with the double encoding is because that would exactly match the issue you are seeing. In order to 302, the eurl parameter first needs to be read (and decoded) off the Word.aspx page's URL. In doing that, %23 becomes #. Anything that follows the # would be a URL fragment per RFC 1738. Any query string parameters must be added before the fragment. So adding them in the spot you described would be the expected behavior in that case. It might not be desirable if you want to preserve the # in your URL string, but I believe ensuring that it remains encoded as %23 even after decoding the eurl parameter would account for that.
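
The behavior described above can be reproduced with any standards-following URL builder. The sketch below uses Python's `urllib.parse`; `append_query` is a hypothetical helper written for this illustration, not the code Office Online actually runs, and the parameter values are dummies.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode, parse_qsl

def append_query(url, extra):
    """Add query parameters to a URL, keeping any fragment at the end."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True) + list(extra.items())
    return urlunsplit(parts._replace(query=urlencode(query)))

# Dummy stand-ins for the wdlcs / wdlcsexp values seen in this thread.
extra = {"wdlcs": "XXXX", "wdlcsexp": "1234"}

# Case 1: the decoded eurl contains a raw '#', so the route is a fragment.
with_hash = append_query("https://abcd.pqrs.com/#/microsoft-office/000000", extra)
print(with_hash)
# https://abcd.pqrs.com/?wdlcs=XXXX&wdlcsexp=1234#/microsoft-office/000000

# Case 2: the '#' is still encoded as %23 after decoding, so it stays in the path.
with_pct = append_query("https://abcd.pqrs.com/%23/microsoft-office/000000", extra)
print(with_pct)
# https://abcd.pqrs.com/%23/microsoft-office/000000?wdlcs=XXXX&wdlcsexp=1234
```

Case 1 has the same shape as the Location header reported in this issue: the parameters land before the `#` because that is the only valid place for a query string.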

Best regards,
Rob Rolnick

Thanks @RobRol. That sounds reasonable. I will try it out and report back.

Thanks @RobRol
Using developer tools, I tried sending the requests without the "#" character and I no longer see the error, so I am also fairly convinced that the "#" character is causing it. While digging into this, I found that it is not an encoding issue; rather, since we have an Angular-based front end, the "#" is part of the routing mechanism, something called "hash-based routing" in Angular. I will try out some options (although I am not much of a front-end developer myself), but if you have come across any similar WOPI issues with Angular before, please let me know.

Thanks for confirming @raysaik ,

I am not familiar with Angular, unfortunately. However, I did some digging and it seems there are conflicting perspectives on the topic. On the one hand, RFC 3986 - Uniform Resource Identifier Generic Syntax makes the structure of a URI clear, including the ordering of the URI's segments and which characters are valid in each one. This StackOverflow post summarizes it well. The important piece is that a # is not a valid character for the URI's path.

On the other hand, I found a post titled Issue:36688 - Querystring removed when useHash is true on the Angular GitHub. It's from April of last year, and seems to cover this very topic. In that issue the same RFC is mentioned saying that the path needs to appear before the query string. If I'm reading the conversation correctly, it seems to ignore the fact that # is not valid in the path per the same RFC. Maybe poke them again to see if I'm misunderstanding the claim? They do offer a suggestion in that thread, but I'm not sure how much work it would be.

Sorry I can't be more help. Angular is outside my wheelhouse.

Best regards,
Rob Rolnick