URI with empty path
Wouter1 opened this issue · 9 comments
From the wiki https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
If an authority component is present, then the path component must either be empty or begin with a slash (/).
So this should be correct uri
https://localhost
(notice no slash at the end)
However
>>> URI("http://local")
URI('http://local/')
Indeed, path-empty
is a BNF form not currently accounted for. 🤔
Looking at practical implications, the encoding (REPR, in this case) containing the slash appears to be a result of the attempt to form the hierarchical
part by naively combining authority with path, and "resolving" that path. The path of a URI having no path is, in fact, distinct:
>>> URI('http://example.com')
<<< URI('http://example.com/')
>>> URI('http://example.com').path
<<< PurePosixPath('.')
Resolution of .
results in /
given no other base path. I agree that this is bug-worthy; a special case should be added for this "pathless" capability. (I.e. if the path is exactly .
, there is no path.)
Conversely, the paths you have given as examples are the same URI after absolutizing. The first bullet point case below this section covers this situation, I believe.
Yes after absolutizing it might be the same.
I can not exactly predict if and when this difference would play up or how relevant this is for me. It played up in a bit artificial unit test here so I could work around it. But also I prefer to not wait for bug reports from my users.
Additional note: the relative reference resolution examples appear to strongly prefer that if a path has ever existed, that a root must be preserved:
"../.." = "http://a/"
"../../" = "http://a/"
I would not draw conclusions about preferences from examples. Better stick with the specification
Examining the code, you can "hotfix" patch this behavior yourself if required:
from uri.part.path import PathPart
PathPart.empty = ""
That will swap out the default representation for an "empty" path globally:
>>> URI('http://example.com')
<<< URI('http://example.com')
>>> URI('http://example.com/foo')
<<< URI('http://example.com/foo')
I may have already thought about this, then forgotten. 😜 Edit to note: my desire has been to follow the specification as closely as is reasonable, while making sensible accomodations for user-expected behavior, i.e. as experienced in a browser user agent. In a majority of the browsers I utilize, if you omit the trailing /
representing the path, it's added automatically since you actually are requesting the root of the domain. (path-empty
is equivalent to /
for practical reasons.)
Better stick with the specification.
Double check where these examples are literally coming from.
Am I correct that with your hotfix, the issue would reappear the other way round? so that http://local/
would become http://local
?
Double check where these examples are literally coming from
These are examples from the spec. So ?
Am I correct that with your hotfix, the issue would reappear the other way round? so that http://local/ would become http://local ?
Nope, but that's a trivial one to try out:
>>> URI('http://example.com/')
<<< URI('http://example.com/')
The adaption is explicit, as mentioned, targeting the PurePosixPath('.')
case of a URI omitting any path. Edit to be clear: /
is not a path-empty
.
Ok good! Sounds like the way to go.
Is this fix going to be released or put into the develop version as well?
Because selling my users a "hot-fixed" build will raise more questions than selling them the develop version.
This will require discussion/debate; changing default behaviours is not often an easy decision. Which to target: pure specification, or practical application? The Zen of Python does have something to say about this, and alteration of behavior to match specific desires is demonstrably quite simple. Explicitly why I structured the representation into "parts" the way I have.
Consider it less of a "hotfix" than configuration, if you need to. This configuration also only alters behavior for your own application process; it merely isn't passed in as part of the constructor itself, it's process-global. (It's not like monkeypatching code paths—this is just the alteration of a global constant dedicated for this explicit purpose!)