dotnet/runtime

Uri.IsWellFormedOriginalString false negatives

MihaZupan opened this issue · 13 comments

IsWellFormedOriginalString (and therefore also IsWellFormedUriString) will report false negatives in the following cases:

  • Case 1
    • Occurs if the Uri contains both
    • This happens because EscapeUnescapeIri removes the escaping from these characters, but CheckCanonical later reports that they should have been escaped (which they were, but we unescaped them)
    • Example: http://a/%41%22 (%41 is A (unreserved) and %22 is ")
  • Case 2
    • Occurs if the Uri contains both
      • An (escaped or not) non-ASCII character (except those outside the IRI range)
      • An escaped character from the [0, 1F] range, the ;/?:@&=+$,#[]!'()*%\ set (reserved + %, \), the " <>`^ set or outside the IRI range
    • This happens because EscapeUnescapeIri removes some escaping and CheckCanonical then sees a mismatch of escaped, unescaped and non-ASCII characters
    • Example: http://a/ű%20 (ü is non-ASCII and %20 is )

Case 1. occurred in #70929.
Case 2. occurred in #21626, #34031, #37634, #64249, VS Feedback.

My suggestion as a workaround is to not use IsWellFormedOriginalString at all. Odds are that in most cases you don't care about the sort of validation that it does anyway.
Consider using something like this instead:

static bool IsValidHttpUri(string uriString, out Uri uri) =>
    Uri.TryCreate(uriString, UriKind.Absolute, out uri) &&
    (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

IsWellFormedOriginalString will report false negatives in the following cases:

  • Case 1
    • Occurs if the Uri contains both
    • This happens because EscapeUnescapeIri removes the escaping from these characters, but CheckCanonical later reports that they should have been escaped (which they were, but we unescaped them)
    • Example: http://a/%41%22 (%41 is A (unreserved) and %22 is ")
  • Case 2
    • Occurs if the Uri contains both
      • A (not escaped) non-ASCII character (except those outside the IRI range)
      • An escaped character from the [0, 1F] range, the ;/?:@&=+$,#[]!'()*%\ set (reserved + %, \), the " <>`^ set or outside the IRI range
    • This happens because EscapeUnescapeIri removes some escaping and CheckCanonical then sees a mismatch of escaped, unescaped and non-ASCII characters
    • Example: http://a/ű%20 (ü is non-ASCII and %20 is )

Case 1. occurred in #70929.
Case 2. occurred in #21626, #34031, #37634, #64249.

Author: MihaZupan
Assignees: MihaZupan
Labels:

bug, area-System.Net, tenet-compatibility

Milestone: -

Triage:

  • The method should NOT be called at all in 99.9% cases. Uri.TryCreate will be sufficient for all of these 99.9% cases.
    • We have hard time to imagine almost ANY scenario where IsWellFormedOriginalString is needed to be called by the user explicitly. We believe most of the cases use it just because it looks like a good thing to call.
    • One of the few values of IsWellFormedOriginalString is the fact it rejects implicit file path (e.g. c:\\mydirectory)
    • The other potential value might be checking for RFCs (as documented) and then pass the original (readable-form string) to database or other applications.
  • Given there are valid use cases (though sparse), we should not Obsolete the API
    • It would be nice to learn from the 25-something upvotes on various duplicates - why they tried to use the API and if they fall int the 99.9% category above - @MihaZupan
    • Overall, we believe that the real IMPACT of having this method doing the right thing is MEDIUM as there are few users who really need it and the workaround (Uri.TryCreate) is not sufficient for them. However, users who use it for no good reason may be getting wrong answers sometimes, causing them pain and wasted time in investigations.
  • COMPATABILITY: .NET Framework behaves at least slightly better in some cases than .NET Core. So, it may be a .NET Framework -> .NET Core migration complication for customers.
    • It is possible that .NET Framework does not behave 100% correctly either. In .NET Core we should shoot for aligning with RFCs as described in documentation of this API.
    • RISK (compat break): SMALL - if the method does what it describes it should, it will be much better than the randomly broken/unpredictable behavior now. We have hard time imagine that anyone could depend on the current state.
  • RISK (of implementation): MEDIUM ... fixing the method means touching core Uri parsing code, which is always risky (based on our past experience).
  • COST: approx. 1 week (the implementation will be a bit tricky).

Given all the above, it is not worth it to fix it this late of 7.0 release (non-trivial risk, non-trivial cost, medium-benefit).
We should address it early in 8.0 to get bake time due to risk of the change.

@karelz @MihaZupan All 4 of those issues closed "in favor of" this one are referring to Uri.IsWellFormedUriString, not Uri.IsWellFormedOriginalString, so just to be clear they are not duplicates. I'm not sure if the rationale in the triage comments above applies to both or not:

We have hard time to imagine almost ANY scenario where IsWellFormedOriginalString is needed to be called by the user explicitly. We believe most of the cases use it just because it looks like a good thing to call.

I'm less familiar with IsWellFormedOriginalString, but IsWellFormedUriString is static and the use case is fairly obvious: validation. This is the one people are complaining about, and I would suggest you re-open one of the 4 that were closed, or at least provide different rationale for closing them. I'd suggest reopening #21626 as it has the most activity and upvotes. Thanks.

The two methods share the same issues. IsWellFormedUriString is implemented as

Uri.TryCreate(uriString, uriKind, out result) && result.IsWellFormedOriginalString()

IsWellFormedUriString is static and the use case is fairly obvious: validation.

Right, but was the intended validation exactly what IsWellFormedOriginalString is doing, or whether a Uri seems valid and can be created via Uri.TryCreate?

Thanks for the clarification. Karel's comment that "it looks like a good thing to call" is certainly true in the case of IsWellFormedUriString. Without that knowledge of the implementation details, it seems appropriate when you just want to make sure a user-entered URI string is valid and you don't need a Uri object in the moment. (Perhaps someone is entering their web site URL in a form and you just want validate and save it, for example.)

I just think the general tone that "we can't imagine why anyone would ever call this" is likely much more applicable to IsWellFormedOriginalString than it is to IsWellFormedUriString....again, without knowledge of the implementation. :)

Just to give some context on how people could end up experiencing this problem:

Microsofts OData.NET library uses Uri.IsWellFormedUriString (matching issue)
In my case it is used to talk to Microsofts Business Central webservices which contain tenant names in their URL that can contain non ASCII characters, spaces etc.. This issue is specific to the library and could be fixed with a workaround but has a big impact on a fraction of users in the meantime.

Example URL:
"http://192.168.0.1:1234/Instance/ODataV4/Company('123-Customer Place Süd-Ost')/"

Thanks for linking the issue. From the linked code, it does seem like this is one of those usages that fall into the We believe most of the cases use it just because it looks like a good thing to call. category.

The OData library should consider dropping this check entirely. As-is, it is checking Uri.IsWellFormedUriString(baseUri.AbsoluteUri, UriKind.Absolute)
which should just always be true, as Uri.AbsoluteUri is always generating a well-formed string.
That is, this check is doing nothing in this case, except for potentially hitting this false-negative bug.

@karelz

I am not sure if I should open a new issue (this one looks similar enough), but Uri.TryCreate seems fundamentally broken to me.

Minimal repro (tested on .Net Framework 4.8):

Uri.TryCreate(@"<li>No certified downloads were found for this configuration. To include beta downloads in your search, click <a href='Find.aspx?lang=en-us'>here</a>.</li>", UriKind.RelativeOrAbsolute, out Uri Result);

To my surprise that call returns true, and the result is... well, see for yourself:
image

This issue is only talking about IsValidOriginalString, so it's unrelated to your problem.

If you try to create a relative Uri, we'll accept most everything as valid input. If you don't want that, you should likely specify UriKind.Absolute instead.

This issue is only talking about IsValidOriginalString, so it's unrelated to your problem.

Thanks for clarification, I will create a new issue.

If you try to create a relative Uri, we'll accept most everything as valid input. If you don't want that, you should likely specify UriKind.Absolute instead.

What should I do if I want both absolute or relative? Roll my own check? I mean, why is there even an option for relative Uri TryCreate when there's absolutely no validation for it? I'd expect at least to check against allowed character list and return false for those that aren't (taking also invalid UTF-8 encodings into account).

Just for reference:
Uri.IsWellFormedUriString(@"<li>No certified downloads were found for this configuration. To include beta downloads in your search, click <a href='Find.aspx?lang=en-us'>here</a>.</li>", UriKind.RelativeOrAbsolute);
Returns False, as it should.

@julian94 I understand.

However, the documentation for Uri.TryCreate does not say what kind of validation (if any) is performed when constructing an Uri. At the minimum, this should be clarified. I also believe that Uri.TryCreate should not succeed in creating an invalid Uri (relative or absolute).

I did create an issue just in case someone would like to close it with wontfix with extreme prejudice :-)
#78381

karelz commented

Sadly, we won't make it in 8.0 due to circumstances. Moving to Future for now.