marrow/uri

Rootless file URI does not round trip

Closed this issue · 10 comments

timj commented

RFC-3986 talks about rootless path URIs.
I interpret this to mean that a URI such as file:relative/path.ext is a perfectly reasonable URI indicating a relative path. Unfortunately this does not round trip:

>>> import uri
>>> u = uri.URI("file:relative/path.ext")
>>> u
URI('file://relative/path.ext')
>>> u.path
PurePosixPath('relative/path.ext')
>>> u.uri
'file://relative/path.ext'
>>> u2 = uri.URI(u.uri)
>>> u2.path
PurePosixPath('/path.ext')
>>> u.hostname
>>> u2.hostname
'relative'

So it turns a relative URI into an absolute URI with a hostname. Am I misinterpreting RFC-3986? (I noticed this because I have some relative paths that I need to represent as URIs).

This may very well partially be a duplicate of #9. Please reference this comment, specifically. I also note that your MCVE (test case) never actually defines the variable u2, making it incomplete.

Edited to add relevant ABNF:

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   hier-part     = "//" authority path-abempty
                 / path-absolute
                 / path-rootless
                 / path-empty

   URI-reference = URI / relative-ref

   absolute-URI  = scheme ":" hier-part [ "?" query ]

   relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

   relative-part = "//" authority path-abempty
                 / path-absolute
                 / path-noscheme
                 / path-empty
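To make the four hier-part alternatives concrete, here is a minimal sketch that classifies a URI using the component-splitting regular expression from RFC 3986 Appendix B. The function name `hier_part_kind` is hypothetical and this is not how marrow/uri parses URIs; it only looks at structure (is there a //, does the path start with /), not at which characters are legal.

```python
import re

# RFC 3986, Appendix B: reference regex that splits a URI into
# scheme (group 2), authority (group 4), path (group 5), query (7), fragment (9).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def hier_part_kind(uri: str) -> str:
    """Report which hier-part alternative (RFC 3986, section 3) a URI uses."""
    m = URI_RE.match(uri)  # every group is optional, so this always matches
    if m.group(3) is not None:
        # A "//" marker was present, so an authority follows.
        return "authority path-abempty"
    path = m.group(5)
    if path.startswith("/"):
        return "path-absolute"
    if path:
        return "path-rootless"
    return "path-empty"


for example in ("file:relative/path.ext", "file://relative/path.ext", "file:/abs/path.ext"):
    print(example, "->", hier_part_kind(example))
```

Under this reading, file:relative/path.ext is path-rootless with no authority, while file://relative/path.ext has the authority "relative".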

timj commented

Sorry about the lack of u2 -- I've fixed the original message (copy and paste error).

timj commented

Reading the comment referenced above in #9, I agree with the last line of the first paragraph (it doesn't round trip properly). The final comment on #9 seems to be that file:relative/path.ext is forbidden by the RFC and is not in fact described by its path-rootless rule. The odd thing is that this package parses such URIs correctly but does not put them back together consistently.

Thank you for the updated test case; I'll get this integrated into the test suite proper and make sure round trips behave sanely. Specifically, examining the ABNF closely, it appears that without the // marker there is no authority part, whereas URI currently treats everything after the first : (skipping any //) up to the next / as the authority.
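The distinction above can be sketched with RFC 3986's own tools: the Appendix B regular expression leaves the authority group undefined when there is no //, and the recomposition algorithm in RFC 3986 section 5.3 emits // only when an authority is defined. The names `Parts`, `split` and `recompose` are hypothetical illustrations, not marrow/uri's API.

```python
import re
from typing import NamedTuple, Optional

# RFC 3986, Appendix B component-splitting regex.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]  # None means "no // marker", distinct from "" (empty host)
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split(uri: str) -> Parts:
    m = URI_RE.match(uri)
    return Parts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def recompose(p: Parts) -> str:
    # RFC 3986, section 5.3: append each component only if it is defined.
    out = ""
    if p.scheme is not None:
        out += p.scheme + ":"
    if p.authority is not None:
        out += "//" + p.authority  # the "//" is tied to the authority's presence
    out += p.path
    if p.query is not None:
        out += "?" + p.query
    if p.fragment is not None:
        out += "#" + p.fragment
    return out


rootless = split("file:relative/path.ext")
print(rootless)             # authority is None, path is 'relative/path.ext'
print(recompose(rootless))  # file:relative/path.ext
```

Keeping "no authority" (None) separate from "empty authority" ("") is what makes the round trip exact: file:///abs/path.ext has an empty authority and keeps its //, while file:relative/path.ext never gains one.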

As per the notes (and "final weigh-in") from this comment on #9, and the RFC 8089 specification of the file: URI scheme (ABNF copied below), "relative" paths are not permitted.

  file-URI       = file-scheme ":" file-hier-part

  file-scheme    = "file"

  file-hier-part = ( "//" auth-path )
                 / local-path

  auth-path      = [ file-auth ] path-absolute

  local-path     = path-absolute

  file-auth      = "localhost"
                 / host
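A structural check against this grammar might look like the following sketch. Both file-hier-part alternatives end in path-absolute ("/" optionally followed by a non-empty segment), so it is enough to inspect the path component. `is_rfc8089_hier_part` is a hypothetical helper; it ignores character-level validation and does not look at queries or fragments.

```python
import re

# RFC 3986, Appendix B component-splitting regex (scheme = group 2, path = group 5).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def is_rfc8089_hier_part(uri: str) -> bool:
    """Return True if a file: URI's path is path-absolute, as RFC 8089 requires."""
    m = URI_RE.match(uri)
    if (m.group(2) or "").lower() != "file":
        raise ValueError(f"not a file: URI: {uri!r}")
    path = m.group(5)
    # path-absolute starts with "/" but never with "//", and is never empty,
    # which also rules out "file://host" with no path after the authority.
    return path.startswith("/") and not path.startswith("//")


print(is_rfc8089_hier_part("file:relative/path.ext"))  # False: rootless
print(is_rfc8089_hier_part("file:/abs/path.ext"))      # True: local-path
```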

Identity-transform round trips cannot be guaranteed for structurally invalid values. I will be adding the capability for Scheme implementations to perform validation, so that a warning can be issued if an attempt is made to use a relative (rootless) path component. (This will live on a custom FileScheme, not the base URLScheme, since it is scheme-specific behaviour.)

timj commented

Thank you for pointing me at RFC8089. I agree that it says relative paths are not supported in the file scheme. RFC3986 doesn't care, but RFC8089 overrides that.

timj commented

Would it be possible to issue a warning if a URI string is being created from a file relative path so that the round trip failure is more obvious?

> Would it be possible to issue a warning if a URI string is being created from a file relative path so that the round trip failure is more obvious?

Absolutely; that is the plan my last comment was trying to describe. It shouldn't fail outright (there does seem to be a general sense that a file: URI omitting the // is "acceptable" to humans), but it certainly should issue a warnings.warn.

timj commented

Great. Just to confirm what I think you are saying: file:relative/path.ext should be accepted as it is now (as also happens with furl and urllib), but if someone asks for the URI string to be reconstructed, a warning should be issued, because it has now taken a relative path and converted it to an absolute path.

More eager than that: any attempt to construct a file: URI without a rooted path will emit a warning at instantiation time unless that warning is explicitly silenced. I like to notify people of incoming foot-shots before they pull the trigger. ;)
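A minimal sketch of that instantiation-time behaviour, assuming a hypothetical `FileURI` class and `RootlessFileURIWarning` category (neither is marrow/uri's actual FileScheme or warning class). Giving the warning its own UserWarning subclass lets callers silence exactly this warning through the standard warnings filters.

```python
import warnings


class RootlessFileURIWarning(UserWarning):
    """Hypothetical warning category for file: URIs whose path is not rooted."""


class FileURI:
    """Hypothetical stand-in for a file: URI; query/fragment handling omitted."""

    def __init__(self, uri: str) -> None:
        scheme, sep, rest = uri.partition(":")
        if not sep or scheme.lower() != "file":
            raise ValueError(f"not a file: URI: {uri!r}")
        if rest.startswith("//"):
            # "//" auth-path: the path begins at the first "/" after the authority.
            slash = rest.find("/", 2)
            self.path = "" if slash == -1 else rest[slash:]
        else:
            self.path = rest
        if not self.path.startswith("/"):
            # Warn at construction, before any round trip can silently go wrong.
            warnings.warn(
                f"file: URI {uri!r} has a rootless path; RFC 8089 requires an absolute path",
                RootlessFileURIWarning,
                stacklevel=2,  # point the warning at the caller's line
            )


FileURI("file:///abs/path.ext")    # no warning
FileURI("file:relative/path.ext")  # emits RootlessFileURIWarning
```

To opt out explicitly, a caller can run `warnings.simplefilter("ignore", RootlessFileURIWarning)` (or use `warnings.catch_warnings()` for a narrower scope), which leaves all other warnings visible.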