marrow/uri

Host/host:port and path aren't definitively separated by a slash

eode opened this issue · 7 comments

eode commented
>>> uri = URI('http://foo.com')
>>> uri.path = 'blah'
>>> str(uri)
http://foo.comblah

Expected: Either reject paths with an initial slash, or reject paths without one.

eode commented

btw -- nice work, overall!

eode commented

Actually, you can't reject paths with one. What an awkward state to be in. Is the extra slash in http://foo.bar//blah possibly meaningful?

I suppose there's no route to rejection. I guess expected behavior would be to implicitly add a slash, and if path is set manually with a slash, it results in a url like http://foo.com//blah.

Hmm; it is true that you can provide paths that have no root. Technically, your own code above is buggy, though the URI class is not helping. Specifically, assign /blah, not blah, or use the division helper:

>>> uri = URI('http://foo.com') # no path part at all
>>> uri.path
PurePosixPath('.')
>>> uri.path = '/foo'
>>> uri
URI('http://foo.com/foo')
>>> uri = URI('http://foo.com')

# "natural" syntax for "extending" the path / resolving relative references
>>> uri / 'foo'
URI('http://foo.com/foo')

# scheme + authority part, plus a path part = rooted path at that authority over that protocol
>>> URI('http://foo.com') / URI('bar/baz')
URI('http://foo.com/bar/baz')

I'll leave this open for now as a reminder that non-root paths should be rooted for presentation, or, if there is an authority present, to only accept rooted paths. foo/bar isn't a valid URI path, where /foo/bar is. Good catch!

Edited to add: non-rooted paths remain useful, at least for URI without other, earlier components defined. This would allow you to, for example, use the division operator to combine two URI, the second being a URI fragment. (The second having uri.relative == True.). Think of relative links on a page; those are very much fragments, usually of non-root paths, and part of the reason why the division operator was added, reference issue #3.

Edited to add: I'm not seeing double slashes if you assign a rooted path. How are you getting that as a result?

>>> from uri import URI
>>> uri = URI('http://foo.com')  # not a fragment; has an authority
>>> uri.path = '/foo'; uri  # rooted path assignment is OK
URI('http://foo.com/foo')
>>> uri.path = 'foo'  # rootless path assignment is NOT OK

ValueError                                Traceback (most recent call last)
<ipython-input-3-d765cb2bdfb3> in <module>()
----> 1 uri.path = 'foo'

~/Projects/marrow/web/uri/uri/part/path.py in __set__(self, obj, value)
     28 
     29                 if obj.authority and not value.startswith('/'):
---> 30                         raise ValueError("Can only assign rooted paths to URI with authority.")
     31 
     32                 super(PathPart, self).__set__(obj, value)

ValueError: Can only assign rooted paths to URI with authority.

>>> uri = URI('http://foo.com')
>>> uri2 = URI()  # let's construct a fragment this time
>>> uri2.path = 'foo'; uri2
URI('foo')
>>> uri.resolve(uri2)  # relative reference resolution against an authority becomes rooted
URI('http://foo.com/foo')

Available in 84e0787 for testing prior to release. (Tests added in 74a16f6.)

eode commented

Marking as resolved. (Rejection is preferable to me; from the Zen of Python: errors should never pass silently, esp. as if you aren't expecting a relative path it might have security implications.). Edited to add: URI are not just URL. This is not a URL object, nor a URL-handling library, specifically. Your bug was intentionally trying to write "http://foo.com" + "blah", which URI happily (and incorrectly) allowed you to do. (2+2==42 → if you want this to be true, it's not a bug in Python's addition operator…)

eode commented

I most definitely would not want that to be true -- what I want is for there to be an implicit slash if an unrooted path is attached to a host, or better yet, for the library to reject unrooted paths in that case. Which it does now, so everything's roly-poly.