python-hyper/hyperlink

Rooted flag still causing serialization oddities

cdunklau opened this issue · 3 comments

I thought this was the same bug as #90, but @glyph's #91 didn't fix it, so I guess it's at least slightly distinct.

_________ TestURL.test_reproduce_my_rooted_oddity _________

self = <hyperlink.test.test_url.TestURL testMethod=test_reproduce_my_rooted_oddity>

    def test_reproduce_my_rooted_oddity(self):
        a = URL(scheme='udp', port=4900)
        b = URL.from_text('udp://:4900')
        assert str(a) == str(b)
        assert a.asText() == b.asText()
>       assert a == b
E       AssertionError: assert URL.from_text('udp://:4900') == URL.from_text('udp://:4900')
E         -URL.from_text('udp://:4900')
E         +URL.from_text('udp://:4900')

src/hyperlink/test/test_url.py:1101: AssertionError

That assertion error is entertainingly befuddling 😃

After a cursory inspection of a and b, the only differences I found in their instance __dict__s were _rooted and _uses_netloc.
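For anyone else poking at this, a quick way to do that __dict__ comparison is a small helper like the one below. It's a hypothetical debugging aid (instance_dict_diff and MISSING are names I made up, not part of hyperlink), and it only works on objects that have a __dict__, i.e. not on __slots__-only classes. The SimpleNamespace objects in the usage lines are illustrative stand-ins, not hyperlink's real internals.

```python
from types import SimpleNamespace

# Sentinel that marks "attribute not present on this object".
MISSING = object()


def instance_dict_diff(a, b):
    """Return {attr: (a_value, b_value)} for instance attributes that differ.

    Hypothetical debugging helper; relies on vars(), so both objects
    must have a __dict__.
    """
    da, db = vars(a), vars(b)
    return {
        key: (da.get(key, MISSING), db.get(key, MISSING))
        for key in sorted(da.keys() | db.keys())
        if da.get(key, MISSING) != db.get(key, MISSING)
    }


# Illustrative stand-ins, not real hyperlink URL objects.
a = SimpleNamespace(scheme="udp", port=4900, rooted=False)
b = SimpleNamespace(scheme="udp", port=4900, rooted=True)
print(instance_dict_diff(a, b))
```

Running this against two objects whose text forms match is a fast way to spot hidden state, like a flag that never shows up in the serialized URL.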

The same failure happens with URL(scheme='udp', host='', port=4900), but doing URL(scheme='udp', port=4900, rooted=True) (with or without host) makes the test pass.

(this was tested on 688233a)

glyph commented

This should probably be addressed the same way that the rooted-normalization happens with paths; if you have an empty hostname but a scheme like this, then you're implicitly rooted. (Is this kind of URL technically valid according to the spec?)
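To make the proposed normalization concrete, here's a toy model. ToyURL is a hypothetical stand-in, not hyperlink's URL class, and the normalize switch exists only so the "before" and "after" behaviour can be compared side by side. The assumption baked in is the one suggested above: any authority section (a host or a port, even with an empty host) implies the path is rooted.

```python
from dataclasses import InitVar, dataclass
from typing import Optional, Tuple


@dataclass
class ToyURL:
    """Hypothetical URL stand-in; NOT hyperlink's implementation."""

    scheme: str = ""
    host: str = ""
    port: Optional[int] = None
    path: Tuple[str, ...] = ()
    rooted: bool = False
    # Toggle for the proposed rule, only here to compare old vs new behaviour.
    normalize: InitVar[bool] = True

    def __post_init__(self, normalize: bool) -> None:
        if normalize and (self.host or self.port is not None):
            # Proposed rule: an authority section implies a rooted path,
            # mirroring the rooted-normalization already done for paths.
            self.rooted = True

    def to_text(self) -> str:
        text = f"{self.scheme}:" if self.scheme else ""
        if self.host or self.port is not None:
            text += "//" + self.host
            if self.port is not None:
                text += f":{self.port}"
        if self.path:
            # The rooted flag only becomes visible when there is a path.
            text += ("/" if self.rooted else "") + "/".join(self.path)
        return text


# Without normalization: identical text, but the dataclass-generated
# __eq__ also compares the hidden rooted flag, so they are unequal.
built = ToyURL(scheme="udp", port=4900, normalize=False)
parsed = ToyURL(scheme="udp", port=4900, rooted=True, normalize=False)
print(built.to_text(), parsed.to_text(), built == parsed)

# With normalization: both end up rooted, so they compare equal.
print(ToyURL(scheme="udp", port=4900) == ToyURL(scheme="udp", port=4900, rooted=True))
```

The design choice here is to fix the flag at construction time rather than in __eq__, so every code path (equality, hashing, replace-style copies) sees the same canonical value.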

Is this kind of URL technically valid according to the spec?

Looks like it (selected parts of https://tools.ietf.org/html/rfc3986#appendix-A):

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
   hier-part     = "//" authority path-abempty
                 / path-absolute
                 / path-rootless
                 / path-empty

   authority     = [ userinfo "@" ] host [ ":" port ]
   host          = IP-literal / IPv4address / reg-name

   reg-name      = *( unreserved / pct-encoded / sub-delims )

   path-abempty  = *( "/" segment )

Zero-length strings are allowed by both path-abempty (so omitting the / after the port is valid) and reg-name (so an empty host is valid), so by my interpretation udp://:4900 is allowed.
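As a sanity check on that reading, the ABNF subset quoted above can be transcribed into a regex. This is a sketch under stated assumptions: it only covers the scheme "://" authority path-abempty branch, accepts reg-name hosts only (no IP-literal), and ignores query and fragment. parse_authority_form is a made-up name, not anything from hyperlink.

```python
import re

# Character classes from RFC 3986 Appendix A.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

SCHEME = r"[A-Za-z][A-Za-z0-9+\-.]*"
# userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
USERINFO = rf"(?:[{UNRESERVED}{SUB_DELIMS}:]|{PCT_ENCODED})*"
# reg-name = *( unreserved / pct-encoded / sub-delims ); may be empty
REG_NAME = rf"(?:[{UNRESERVED}{SUB_DELIMS}]|{PCT_ENCODED})*"
# port = *DIGIT
PORT = r"[0-9]*"
# segment = *pchar; pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
SEGMENT = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})*"
# path-abempty = *( "/" segment ); may be empty
PATH_ABEMPTY = rf"(?:/{SEGMENT})*"

# Only the  scheme "://" authority path-abempty  branch, reg-name hosts only.
AUTHORITY_FORM = re.compile(
    rf"(?P<scheme>{SCHEME})://"
    rf"(?:(?P<userinfo>{USERINFO})@)?"
    rf"(?P<host>{REG_NAME})"
    rf"(?::(?P<port>{PORT}))?"
    rf"(?P<path>{PATH_ABEMPTY})"
)


def parse_authority_form(text):
    """Return the matched components as a dict, or None if text doesn't match."""
    match = AUTHORITY_FORM.fullmatch(text)
    return match.groupdict() if match else None


print(parse_authority_form("udp://:4900"))
```

For udp://:4900 the host group matches the empty string and the path group matches nothing, which lines up with the zero-length reg-name and path-abempty cases described above.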

But it's not as clear to me what such a form implies in terms of hyperlink's rooted concept.

Yeah. rooted seems specific to URLs with paths. I think the existence of this attribute on URL is unfortunate.