dart-lang/path

Path and Dart SDK disagree about Windows behavior of absolute file URIs without drive names

Closed this issue · 18 comments

The following script illustrates the problem:

import 'package:path/path.dart' as path;

void check(Uri uri) {
  print('For uri: $uri');
  print('  uri.toFilePath(windows: true) => "${uri.toFilePath(windows: true)}"');
  print('  path.windows.fromUri(uri) => "${path.windows.fromUri(uri)}"');
}

main() {
  check(Uri.parse('file:///foo'));
  check(Uri.parse('/foo'));
}

This produces the following output:

For uri: file:///foo
  uri.toFilePath(windows: true) => "\foo"
  path.windows.fromUri(uri) => "foo"
For uri: /foo
  uri.toFilePath(windows: true) => "\foo"
  path.windows.fromUri(uri) => "foo"

I realize that these file URIs aren't especially Windows-like, but it would be nice if the SDK and the path package treated them in the same way. I tend to favor the behavior of uri.toFilePath(windows: true), because at least it produces something that doesn't look like a relative path.

This bug is at the root of a unit test failure in analyzer--see dart-lang/sdk#27870.

@nex3 I'm not sure if you're the correct person to assign this to. Feel free to reassign.

Yes, I think Uri.toFilePath() is doing the right thing and path is wrong. A path with a leading backslash is a valid absolute path in the current drive in Windows.

nex3 commented

tl;dr: The URI in question is invalid, so neither implementation is really right or wrong. I think path's behavior makes a little more sense, but that's pretty subjective.

The generalized format of a file: URI on Windows is file://$hostname/$path, where $path is relative to the root of the given host. The hostname may be empty to refer to paths on the local machine, in which case the URI has the format file:///$drive/$localpath, where $localpath is relative to the root of the drive.

The URI file:///foo has an empty hostname, indicating that it refers to a local path, but where the drive name is expected foo is found instead. Because foo isn't a valid name for a drive on Windows, the URI is itself invalid.

I like path's behavior here because it's symmetrical to the correct output when an actual drive name is given. For example, the URI file:///c:/ translates to the path c:\. But again, since the input is invalid, the output is more or less up to the discretion of the implementation.

And yet the VM supports these URIs, and analyzer needs to as well so that it can match the VM's behavior. So as I see it we have two choices: (a) change the behavior of path to match the VM so that analyzer can match the VM's behavior, or (b) change the analyzer to stop using this feature of path since it doesn't meet analyzer's needs.

I would far rather change path to meet analyzer's needs. It seems consistent with our philosophy of trying to create an ecosystem of packages that are usable by clients, and dogfooding those packages by using them ourselves.

The generalized format of a file: URI on Windows is file://$hostname/$path,

Yes, this is the spec for file: URIs in general, not just on Windows.

where $path is relative to the root of the given host.

I can't find anything that specifies how the path should be interpreted. As far as I know, it's up to the platform, and I couldn't find anything particularly precise around Windows.

in which case the URI has the format file:///$drive/$localpath, where $localpath is relative to the root of the drive.

That is a valid Windows file URI, but I can't find any docs saying file URIs must look like that. I think a more accurate description is that any Windows path is allowed after the (empty, here) hostname and /. A path without a leading drive letter is also a valid Windows path.

It is true that file:///foo looks like a file URI with no hostname and a path of foo, not /foo. But I think the user probably intends it to be the Windows path /foo since relative URIs don't usually have any scheme at all.

nex3 commented

@stereotype441 I don't think I buy the claim that the analyzer needs to match the VM's behavior for invalid inputs. What guarantee do you have that the VM's behavior will even remain consistent over time? Does it have tests for all kinds of invalid URIs? Does the analyzer?

In general, the path package's remit is correctness, not matching behavior with any other system. I'm pretty sure the VM's testing of its path support, if it exists, is less comprehensive than paths. Also, path manipulation is hard. This strongly suggests that this is not the only place that path's behavior differs from the VM's, and that the VM is wrong where path is right. I think a better solution is to make the VM match path's behavior rather than the reverse.

@munificent

Yes, this is the spec for file: URIs in general, not just on Windows.

While this is true, POSIX paths don't support hostnames, so as far as converting from URIs to paths goes this is only relevant on Windows.

I can't find anything that specifies how the path should be interpreted. As far as I know, it's up to the platform, and I couldn't find anything particularly precise around Windows.

Yes, Microsoft doesn't provide any kind of spec for how Windows interprets file: URIs and what it considers valid, so we can only infer it from the intent of the spec and from clear examples. In this case, the only examples that exist in official and unofficial documentation for absolute, non-UNC Windows file: URIs include drive letters. It's reasonable to conclude that that's a requirement.

To verify this, I just booted up a Windows VM and tested it manually in IE. It consistently rejected URLs of the form file:///$relative_path, whether a path with the given name existed in the working directory or in the current drive.

That is a valid Windows file URI, but I can't find any docs saying file URIs must look like that. I think a more accurate description is that any Windows path is allowed after the (empty, here) hostname and /. A path without a leading drive letter is also a valid Windows path.

As far as I know, there are no other cases where a fully-qualified file: URI depends on the working directory to be resolved correctly—if you're arguing for an exception to this rule, I think the burden of evidence lies heavily on you. Do the docs (such as they are) there any cases where relative or root-relative paths are used in fully-qualified file: URIs?

It is true that file:///foo looks like a file URI with no hostname and a path of foo, not /foo. But I think the user probably intends it to be the Windows path /foo since relative URIs don't usually have any scheme at all.

I'm not comfortable with using user intention as a justification for path semantics. This might be even more true for invalid inputs, where doing something "mostly right" can lead the user to incorrectly believe those inputs are correct and run into problems when undefined behavior changes or when other programs are added to the mix.

@stereotype441 I don't think I buy the claim that the analyzer needs to match the VM's behavior for invalid inputs.

But as @munificent pointed out earlier, import "/a.dart"; is valid, and it has a clear and sensible interpretation on Windows (which the VM implements correctly), and analyzer's behavior doesn't currently match the VM's behavior for this URI, due to this bug.

What guarantee do you have that the VM's behavior will even remain consistent over time?

That's not relevant. We should strive to make both the VM and the analyzer implement the same correct behavior. The fact that a bug might be introduced that makes the VM behavior incorrect in the future doesn't mean that analyzer shouldn't match the VM's correct behavior now.

Does it have tests for all kinds of invalid URIs?

I don't know how comprehensive the VM's testing of its URI implementation is. If the VM's testing URIs is insufficient, that's a bug in the VM and not relevant to the discussion of how one particular URI should be handled.

Does the analyzer?

No. Exhaustive tests for invalid URIs belong in the implementation of the URI handling library that analyzer uses. However, analyzer does have a test for this particular valid case (import "/a.dart";), which is currently failing due to this bug.

In general, the path package's remit is correctness, not matching behavior with any other system.

And analyzer's remit is to match the behavior of the VM. So how do we proceed from here? As I stated previously, I would far rather we fix the bug in path, so that we can continue eating our own dogfood. But if that's not feasible, we can always work around the bug by changing analyzer to use uri.toFilePath.

nex3 commented

But as @munificent pointed out earlier, import "/a.dart"; is valid, and it has a clear and sensible interpretation on Windows (which the VM implements correctly), and analyzer's behavior doesn't currently match the VM's behavior for this URI, due to this bug.

I've tested IE's behavior some more, and it does seem like it interprets the URL /a.dart as relative to the drive rather than the root. In other words, following Windows semantics, p.join("file:///C:/foo/bar", "/a.dart") would return "file:///C:/a.dart". However, this entirely contradicts the relative URL spec.

This presents a serious problem. "file:///C:/foo/bar" is also a valid POSIX file: URI, and we don't have an OS context when we're dealing with URLs. This means that the correct output of p.join("file:///C:/foo/bar", "/a.dart") depends on information we don't have: which OS that URI in particular refers to.

I can see arguments for resolving this either way. In all other cases we follow specified behavior when a specification exists, which would suggest keeping the current behavior. But it's much more likely for this to come up in a Windows context, so we could throw the POSIX edge case under the bus and support the Windows behavior.

Regardless, it does seem clear that p.fromUri("/foo") should be drive-relative on Windows, since we do have the OS context there.

That's not relevant. We should strive to make both the VM and the analyzer implement the same correct behavior. The fact that a bug might be introduced that makes the VM behavior incorrect in the future doesn't mean that analyzer shouldn't match the VM's correct behavior now.

If the VM supports file:///foo, then that's not correct behavior. This was my point in this comment, to which you replied "And yet the VM supports these URIs, and analyzer needs to as well so that it can match the VM's behavior".

I don't know how comprehensive the VM's testing of its URI implementation is. If the VM's testing URIs is insufficient, that's a bug in the VM and not relevant to the discussion of how one particular URI should be handled.

What I'm wondering is, if the details of how invalid URLs are resolved are so important to you, why aren't you also pushing hard on the VM to test it?

No. Exhaustive tests for invalid URIs belong in the implementation of the URI handling library that analyzer uses. However, analyzer does have a test for this particular valid case (import "/a.dart";), which is currently failing due to this bug.

path doesn't guarantee the behavior of invalid inputs in any direction, and follows the principle that testing undefined behavior makes tests fail without providing a useful signal.

And analyzer's remit is to match the behavior of the VM. So how do we proceed from here? As I stated previously, I would far rather we fix the bug in path, so that we can continue eating our own dogfood. But if that's not feasible, we can always work around the bug by changing analyzer to use uri.toFilePath.

If you intend to match incorrect or undefined behavior, then path won't work for you—if not now, then eventually.

nex3 commented

Actually, a better option for handling spec inconsistencies is probably by adding new styles. This models the fact that different contexts have different resolution logic. I'm thinking:

  • p.url is the current, spec-mode behavior.
  • p.windowsUrl is the Windows-specific behavior.
  • p.osUrl uses p.windowsUrl if the current OS is Windows, and uses p.url otherwise.

In other words, following Windows semantics, p.join("file:///C:/foo/bar", "/a.dart") would return "file:///C:/a.dart".

I don't think we need to do anything that magical. When resolving relative URIs, the existing behavior is probably fine and just discards the drive letter. If you're manipulating absolute file paths in URI space and you want to preserve a drive-relative path... uh... don't do that.

I think it's fine if the above returns file:///a.dart. Then, when we convert that to a file path, if the user gets /a.dart, they get what they want. Sure, if the current drive is "C" and the original absolute file URI was "D:/foo/bar", the "D" gets lost but I'm OK with that.

Looking back at the linked to commit, I see:

// Check the case where the URI being resolved is a "short absolute" uri
// (starts with a "/" but doesn't start with "file://").  Such URIs are not
// actually valid URIs but we still want to handle them in a way that's
// consistent with the behavior of Uri.

A URI that starts with "/" is perfectly valid. It is an absolute URI with no scheme or authority. FastUri says:

/**
 * Implementation of [Uri] that understands only a limited set of valid
 * URI formats, but works fast. In practice Dart code almost always uses such
 * limited URI format, so almost always can be processed fast.
 */

It sounds like you might be discovering that that limited set isn't as limited as you'd hoped. You could go about reimplementing all of the edge cases that path already handles, but you may find it less total effort to help make path fast in the places where it's slow than to make FastUri correct in the places where it's wrong.

I do think that it would be better for path to convert file:///foo to /foo when going from a URI to a Windows file path, but Natalie disagrees and I don't have time to hack on it.

Either way, the input path you're getting is weird. I mean, what the hell did the user have in mind when they imported "/a.dart"? Could we not just say, "Yeah, don't do that."?

nex3 commented

I don't think we need to do anything that magical. When resolving relative URIs, the existing behavior is probably fine and just discards the drive letter. If you're manipulating absolute file paths in URI space and you want to preserve a drive-relative path... uh... don't do that.

I think it's fine if the above returns file:///a.dart. Then, when we convert that to a file path, if the user gets /a.dart, they get what they want. Sure, if the current drive is "C" and the original absolute file URI was "D:/foo/bar", the "D" gets lost but I'm OK with that.

I'm very leery both of having the intended output of a well-defined operation be invalid, and of having the path package consume that invalid output and intentionally assign semantics to it based only on invalid output produced by path itself. I'd much rather require that users be explicit about which semantics they want in cases where it matters.

A URI that starts with "/" is perfectly valid. It is an absolute URI with no scheme or authority.

In the lingo of path, it's root-relative, which is kind of a weird state in between absolute and relative. In some senses it's much more like a relative path than an absolute one, since its referent depends on the base URL.

@munificent

A URI that starts with "/" is perfectly valid. It is an absolute URI with no scheme or authority. FastUri says:

Is there an issue that should be fixed in FastUri?
AFAIK it does not use package:path.

This test prints:

false
false
false
/a.dart
  void solo_test_parse_short_absolute() {
    Uri uri = FastUri.parse('/a.dart');
    expect(uri, isFastUri);
    print(uri.isAbsolute);
    print(uri.hasAuthority);
    print(uri.hasScheme);
    print(uri.path);
  }
nex3 commented

Whether a URI beginning with / should be considered "absolute" is kind of up to interpretation. path returns true because we judged that more often than not they were absolute enough as far as users were concerned, but it also has an isRootRelative() function to disambiguate.

nex3 commented

It looks like the WHATWG has a much clearer spec spec which obsoletes RFCs 3986 and 3987 and answers some of these questions. It doesn't specify how to convert from file URIs to paths, but it does resolve inconsistencies between Windows' behavior and the original RFCs.

In particular, after parsing a leading / in a file URL, the spec says:

If base is non-null, base’s scheme is "file", and base’s path first string is a normalized Windows drive letter, append base’s path first string to url’s path.

This is a (platform-independent) Windows drive letter quirk.

In other words, p.join("file:///C:/foo/bar", "/a.dart") should produce "file:///C:/a.dart" on all platforms. This doesn't match some browser's current behaviors (including Chrome on Linux), but @domenic tells me it's in the works.

Oh, groovy! We should support that then.

Is there an issue that should be fixed in FastUri?

No idea. You could try running the path package's URI tests against it and see how it does.