marrow/uri

URI with empty path

Wouter1 opened this issue · 9 comments

From the wiki ​https://en.wikipedia.org/wiki/Uniform_Resource_Identifier

If an authority component is present, then the path component must either be empty or begin with a slash (/).

So this should be correct uri
https://localhost

(notice no slash at the end)

However

>>> URI("http://local")
URI('http://local/')

Indeed, path-empty is a BNF form not currently accounted for. 🤔

Looking at practical implications, the encoding (REPR, in this case) containing the slash appears to be a result of the attempt to form the hierarchical part by naively combining authority with path, and "resolving" that path. The path of a URI having no path is, in fact, distinct:

>>> URI('http://example.com')
<<< URI('http://example.com/')

>>> URI('http://example.com').path
<<< PurePosixPath('.')

Resolution of . results in / given no other base path. I agree that this is bug-worthy; a special case should be added for this "pathless" capability. (I.e. if the path is exactly ., there is no path.)

Conversely, the paths you have given as examples are the same URI after absolutizing. The first bullet point case below this section covers this situation, I believe.

Yes after absolutizing it might be the same.
I can not exactly predict if and when this difference would play up or how relevant this is for me. It played up in a bit artificial unit test here so I could work around it. But also I prefer to not wait for bug reports from my users.

Additional note: the relative reference resolution examples appear to strongly prefer that if a path has ever existed, that a root must be preserved:

"../.."         =  "http://a/"
"../../"        =  "http://a/"

I would not draw conclusions about preferences from examples. Better stick with the specification

Examining the code, you can "hotfix" patch this behavior yourself if required:

from uri.part.path import PathPart

PathPart.empty = ""

That will swap out the default representation for an "empty" path globally:

>>> URI('http://example.com')
<<< URI('http://example.com')

>>> URI('http://example.com/foo')
<<< URI('http://example.com/foo')

I may have already thought about this, then forgotten. 😜 Edit to note: my desire has been to follow the specification as closely as is reasonable, while making sensible accomodations for user-expected behavior, i.e. as experienced in a browser user agent. In a majority of the browsers I utilize, if you omit the trailing / representing the path, it's added automatically since you actually are requesting the root of the domain. (path-empty is equivalent to / for practical reasons.)

Better stick with the specification.

Double check where these examples are literally coming from.

Am I correct that with your hotfix, the issue would reappear the other way round? so that http://local/ would become http://local ?

Double check where these examples are literally coming from

These are examples from the spec. So ?

Am I correct that with your hotfix, the issue would reappear the other way round? so that http://local/ would become http://local ?

Nope, but that's a trivial one to try out:

>>> URI('http://example.com/')
<<< URI('http://example.com/')

The adaption is explicit, as mentioned, targeting the PurePosixPath('.') case of a URI omitting any path. Edit to be clear: / is not a path-empty.

Ok good! Sounds like the way to go.
Is this fix going to be released or put into the develop version as well?
Because selling my users a "hot-fixed" build will raise more questions than selling them the develop version.

This will require discussion/debate; changing default behaviours is not often an easy decision. Which to target: pure specification, or practical application? The Zen of Python does have something to say about this, and alteration of behavior to match specific desires is demonstrably quite simple. Explicitly why I structured the representation into "parts" the way I have.

Consider it less of a "hotfix" than configuration, if you need to. This configuration also only alters behavior for your own application process; it merely isn't passed in as part of the constructor itself, it's process-global. (It's not like monkeypatching code paths—this is just the alteration of a global constant dedicated for this explicit purpose!)