python-hyper/hyperlink

Unable to parse `http://www.test.com/BMF%20Ver%F6ffentlichungen?`

damiencarol opened this issue · 3 comments

It seems the parse function raises an error for this URL: http://www.test.com/BMF%20Ver%F6ffentlichungen?

Logs:

>>> import hyperlink
>>> hyperlink.parse("http://www.test.com/BMF%20Ver%F6ffentlichungen?")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2447, in parse
    dec_url = DecodedURL(enc_url, lazy=lazy)
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2046, in __init__
    self.host, self.userinfo, self.path, self.query, self.fragment
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2177, in path
    [
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2178, in <listcomp>
    _percent_decode(p, raise_subencoding_exc=True)
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 766, in _percent_decode
    return unquoted_bytes.decode(subencoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 7: invalid start byte

Hi Damien! By default, hyperlink decodes percent-encoded text as UTF-8, and it is reporting that the %F6 in your URL is not valid UTF-8. You can pass the decoded=False parameter to get a result:

>>> hyperlink.parse('http://www.test.com/BMF%20Ver%F6ffentlichungen', decoded=False)
URL.from_text('http://www.test.com/BMF%20Ver%F6ffentlichungen')

This approach gives you a URL with mostly the same interface as a DecodedURL (the default output of parse), but be aware that you may run into issues when trying to treat parts of that URL as text vs bytes. Hope this helps!
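To illustrate why %F6 trips the UTF-8 decoder, here is a minimal sketch using only the standard library's urllib.parse, not hyperlink itself. It assumes the URL was produced with Latin-1 (ISO-8859-1) percent-encoding, where byte 0xF6 is "ö". That is a guess based on the German word in the path, not something the traceback confirms.

```python
from urllib.parse import quote, unquote_to_bytes

segment = "BMF%20Ver%F6ffentlichungen"

# Undo the percent-encoding without choosing a text encoding yet.
raw = unquote_to_bytes(segment)

# 0xF6 cannot start a UTF-8 sequence, so a strict UTF-8 decode fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assumption: the bytes are Latin-1, where 0xF6 maps to "ö".
text = raw.decode("latin-1")

# Re-encode the segment as UTF-8 percent-encoding (quote() defaults to UTF-8).
fixed = quote(text)

print(utf8_ok, text, fixed)
```

If the Latin-1 assumption holds for your data, the re-encoded segment is valid UTF-8 percent-encoding, so a URL built from it should no longer hit the decode error with the default settings.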

@mahmoud thanks, we are investigating whether we can use the decoded flag.