python-hyper/hyperlink

URLs don't support fromText -> toURI with URLs containing IPv6 literals

hawkowl opened this issue · 2 comments

>>> URL.fromText(u"http://[3fff::1]/foo").asURI().asText()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hawkowl/venvs/commands/lib/python2.7/site-packages/hyperlink/_url.py", line 1338, in to_uri
    new_host = self.host if not self.host else idna_encode(self.host, uts46=True).decode("ascii")
  File "/home/hawkowl/venvs/commands/lib/python2.7/site-packages/idna/core.py", line 340, in encode
    s = uts46_remap(s, std3_rules, transitional)
  File "/home/hawkowl/venvs/commands/lib/python2.7/site-packages/idna/core.py", line 332, in uts46_remap
    _unot(code_point), pos + 1, repr(domain)))
idna.core.InvalidCodepoint: Codepoint U+003A not allowed at position 5 in u'3fff::1'

Hey Hawkie! This was pretty concerning at first, since I thought we had a bunch of IPv6 coverage, but now I see that the problem is actually the to_uri() part and the newly integrated idna stuff:

>>> url = URL.from_text(u'https://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:80/')
>>> url.to_uri()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mahmoud/virtualenvs/tmp-d364b3d6b21cd4e4/local/lib/python2.7/site-packages/hyperlink/_url.py", line 1338, in to_uri
    new_host = self.host if not self.host else idna_encode(self.host, uts46=True).decode("ascii")
  File "/home/mahmoud/virtualenvs/tmp-d364b3d6b21cd4e4/local/lib/python2.7/site-packages/idna/core.py", line 358, in encode
    s = alabel(label)
  File "/home/mahmoud/virtualenvs/tmp-d364b3d6b21cd4e4/local/lib/python2.7/site-packages/idna/core.py", line 270, in alabel
    ulabel(label)
  File "/home/mahmoud/virtualenvs/tmp-d364b3d6b21cd4e4/local/lib/python2.7/site-packages/idna/core.py", line 304, in ulabel
    check_label(label)
  File "/home/mahmoud/virtualenvs/tmp-d364b3d6b21cd4e4/local/lib/python2.7/site-packages/idna/core.py", line 261, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+003A at position 5 of u'2001:0db8:85a3:0000:0000:8a2e:0370:7334' not allowed

So I'm guessing we just need to skip IDNA encoding for IP literals, since they're pretty much guaranteed to be ASCII (some examples). How's that sound?
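
For illustration, here's a rough sketch of what "skip IDNA encoding for IP literals" could look like. It is not hyperlink's actual code: `is_ip_literal` and `encode_host` are hypothetical names, the check uses the standard-library `ipaddress` module (Python 3), and the built-in `"idna"` codec stands in for the third-party `idna_encode(..., uts46=True)` call from the traceback, so the encoding rules differ from the real thing. IPvFuture literals are not handled here.

```python
import ipaddress


def is_ip_literal(host):
    """Return True if host parses as an IPv4 or IPv6 address.

    Hypothetical helper, not part of hyperlink. The host is expected
    without the surrounding square brackets, as in the traceback above.
    """
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def encode_host(host):
    """Return an ASCII form of host, leaving IP literals untouched.

    Hypothetical stand-in for the host handling in to_uri(). Assumption:
    the stdlib "idna" codec replaces the third-party idna package here,
    only to keep the sketch self-contained.
    """
    if not host or is_ip_literal(host):
        # Empty hosts and IP literals are passed through unchanged, so
        # the colons in an IPv6 address never reach the IDNA encoder.
        return host
    return host.encode("idna").decode("ascii")


print(encode_host(u"3fff::1"))
print(encode_host(u"2001:0db8:85a3:0000:0000:8a2e:0370:7334"))
print(encode_host(u"example.com"))
```

The idea is just to branch before encoding: anything that parses as an IP address is returned as-is, and only real hostnames go through IDNA.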