Unable to parse `http://www.test.com/BMF%20Ver%F6ffentlichungen?`
damiencarol opened this issue · 3 comments
damiencarol commented
It seems the parse function raises an error for this URL: http://www.test.com/BMF%20Ver%F6ffentlichungen?
Logs:
>>> import hyperlink
>>> hyperlink.parse("http://www.test.com/BMF%20Ver%F6ffentlichungen?")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2447, in parse
    dec_url = DecodedURL(enc_url, lazy=lazy)
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2046, in __init__
    self.host, self.userinfo, self.path, self.query, self.fragment
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2177, in path
    [
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2178, in <listcomp>
    _percent_decode(p, raise_subencoding_exc=True)
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 766, in _percent_decode
    return unquoted_bytes.decode(subencoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 7: invalid start byte
damiencarol commented
FYI @37b
mahmoud commented
Hi Damien! By default, Hyperlink decodes percent-encoded text as UTF-8, and it is reporting that the %F6 in your URL is not valid UTF-8 (0xF6 on its own looks like the Latin-1 encoding of "ö"). You can try adding the decoded=False parameter to get a result:
>>> hyperlink.parse('http://www.test.com/BMF%20Ver%F6ffentlichungen', decoded=False)
URL.from_text('http://www.test.com/BMF%20Ver%F6ffentlichungen')
This approach gives you a URL with mostly the same interface as a DecodedURL (the default output of parse), but be aware that you may run into issues when trying to treat parts of that URL as text vs bytes. Hope this helps!
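If you do need the path as text afterwards, here is a minimal sketch of decoding a single percent-encoded segment by hand. It only uses the standard library, so it is not part of Hyperlink's API. The helper name decode_segment and the Latin-1 fallback are assumptions for illustration; the sketch guesses that the URL was produced with Latin-1, which may not hold for your data.

```python
from urllib.parse import unquote_to_bytes


def decode_segment(segment, fallback="latin-1"):
    """Decode one percent-encoded path segment to text.

    Hypothetical helper: tries UTF-8 first, then falls back to
    `fallback` (assumed here to be Latin-1) if the bytes are not
    valid UTF-8.
    """
    # Turn the %XX escapes into raw bytes without assuming an encoding.
    raw = unquote_to_bytes(segment)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises,
        # but the result is only meaningful if the guess is right.
        return raw.decode(fallback)


# The segment from the URL in this issue, still in encoded form.
print(decode_segment("BMF%20Ver%F6ffentlichungen"))
```

With decoded=False, the encoded segments should be available from the path attribute of the returned URL, so each one could be passed through a helper like this. A valid UTF-8 segment such as caf%C3%A9 takes the first branch and is unaffected by the fallback.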
damiencarol commented
@mahmoud thanks, we are investigating whether we can use the decoded flag.