barseghyanartur/tld

process_url raises an exception in some situations, even with fail_silently=True

Closed this issue · 3 comments

I found this issue while processing subtitles.

To reproduce:

>>> from tld.utils import process_url
>>> process_url(':{\\rgit}https://github.com', fail_silently=True, fix_protocol=True)

The following exception is raised:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/lib/python3.9/site-packages/tld/utils.py", line 313, in process_url
    parsed_url = urlsplit(url)
  File "/opt/homebrew/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 489, in urlsplit
    _checknetloc(netloc)
  File "/opt/homebrew/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 434, in _checknetloc
    raise ValueError("netloc '" + netloc + "' contains invalid " +
ValueError: netloc ':{\rgit}https:' contains invalid characters under NFKC normalization
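The error comes from urllib itself, not from tld. urllib only runs this NFKC check when the netloc contains non-ASCII characters, so the minimal stdlib-only sketch below assumes the offending text holds something like a fullwidth colon (U+FF1A), which NFKC folds into an ASCII ':'. The example URLs and the `raises_value_error` helper are illustrative, not taken from the original subtitles or from tld.

```python
import unicodedata
from urllib.parse import urlsplit

# U+FF1A FULLWIDTH COLON becomes a plain ':' under NFKC normalization.
print(unicodedata.normalize("NFKC", "\uff1a"))  # :


def raises_value_error(url: str) -> bool:
    """Return True if urlsplit() rejects the URL with ValueError."""
    try:
        urlsplit(url)
    except ValueError:
        return True
    return False


# Non-ASCII netloc that normalizes to one containing ':' -> rejected.
print(raises_value_error("https://\uff1a{\\rgit}https://github.com"))  # True
# Pure-ASCII netloc: urllib skips the NFKC check entirely.
print(raises_value_error("https://:{\\rgit}https://github.com"))  # False
```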

Does #119 fix this?

Merged into master. I'm working on updating the GitHub CI and will release soon.

Released in 0.12.7.
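For code that may still run against releases older than 0.12.7, one caller-side option is to catch the ValueError yourself. The sketch below wraps `urllib.parse.urlsplit` in a hypothetical `split_or_none` helper that mirrors the behaviour one would expect from fail_silently=True; it is an illustration under that assumption, not tld's actual fix.

```python
from typing import Optional
from urllib.parse import SplitResult, urlsplit


def split_or_none(url: str, fail_silently: bool = False) -> Optional[SplitResult]:
    """Hypothetical helper: split url, returning None on ValueError if asked.

    urlsplit() raises ValueError when the netloc contains characters that
    NFKC-normalize into URL delimiters such as ':' or '/'.
    """
    try:
        return urlsplit(url)
    except ValueError:
        if fail_silently:
            return None
        raise


bad_url = "https://\uff1a{\\rgit}https://github.com"
print(split_or_none(bad_url, fail_silently=True))  # None
print(split_or_none("https://github.com/barseghyanartur/tld").netloc)  # github.com
```

The same try/except pattern can wrap a process_url call in your own code if you are pinned to an older tld release.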