bug: linkify turns "M.Sc." in text into a link
barsch opened this issue · 5 comments
Describe the bug
Linkify does link stuff which shouldn't be ...
python and bleach versions (please complete the following information):
- Python Version: 3.7.9
- Bleach Version: 4.1.0
To Reproduce
Steps to reproduce the behavior:
>>> import bleach
>>> value = "XXX has an M.Sc. in YYY"
>>> bleach.linkify(value)
'XXX has an <a href="http://M.Sc" rel="nofollow">M.Sc</a>. in YYY'
>>> bleach.__version__
'4.1.0'
Expected behavior
>>> import bleach
>>> value = "XXX has an M.Sc. in YYY"
>>> bleach.linkify(value)
'XXX has an M.Sc. in YYY'
"Linkify does link stuff which shouldn't be ..."
Debatable. It's obviously not possible to tell deterministically whether something is supposed to be a URL or not. For example, consider the sentences:
I like hot chocolate.It makes me happy.
Here it's impossible to tell whether a space is missing between two sentences or whether the intention is to link to an Italian (.it) domain.
So why does the following work, then:
>>> bleach.linkify("yyy M.Sac. xxx")
'yyy M.Sac. xxx'
>>> bleach.linkify("yyy M.bla. xxx")
'yyy M.bla. xxx'
>>> bleach.linkify("yyy M.bl. xxx")
'yyy M.bl. xxx'
>>> bleach.linkify("yyy M.B. xxx")
'yyy M.B. xxx'
That seems inconsistent.
Anyway, I found the documented way to prevent links to certain domains, so I will just block linking of the .sc domain.
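For anyone landing here later, a minimal sketch of what such a callback could look like. The name skip_sc_domains is made up for illustration; it assumes the linkifier passes callbacks an attrs dict keyed by (namespace, name) tuples and treats a None return as "do not create this link", as the linkify docs describe.

```python
from urllib.parse import urlparse

# Key under which the linkifier stores the link target in the attrs dict.
HREF_KEY = (None, "href")


def skip_sc_domains(attrs, new=False):
    """Linkify callback (hypothetical name): drop new links whose host ends in .sc."""
    if not new:
        # Leave links that were already present in the input untouched.
        return attrs
    # hostname is lowercased by urlparse, so "http://M.Sc" yields "m.sc".
    host = urlparse(attrs.get(HREF_KEY, "")).hostname or ""
    if host.endswith(".sc"):
        # Returning None asks the linkifier not to create this link.
        return None
    return attrs


# Intended use (needs bleach installed), keeping the default nofollow callback:
#   from bleach.callbacks import nofollow
#   bleach.linkify(value, callbacks=[skip_sc_domains, nofollow])
```

Note that passing callbacks replaces the default list, which is why nofollow is added back in the usage comment. This also blocks genuine .sc URLs written without a protocol, so it is a trade-off rather than a fix.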
Because .sac, .bla, etc. aren't top-level domains in the TLD list, whereas .sc is:
https://github.com/mozilla/bleach/blob/main/bleach/linkifier.py
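To illustrate the idea, here is a deliberately simplified sketch, not bleach's actual pattern: a regex that only accepts a dotted name when it ends in one of a fixed set of TLDs. SAMPLE_TLDS, build_sketch_re and find_candidates are hypothetical names, and the four-entry list is only a stand-in for the real one.

```python
import re

# Hypothetical, tiny stand-in for the real TLD list.
SAMPLE_TLDS = ["com", "org", "it", "sc"]


def build_sketch_re(tlds):
    """Build a simplified domain matcher restricted to the given TLDs."""
    # Longest alternatives first so a longer TLD is tried before a shorter one.
    alternation = "|".join(sorted((re.escape(t) for t in tlds), key=len, reverse=True))
    return re.compile(r"\b(?:[a-z0-9-]+\.)+(?:" + alternation + r")\b", re.IGNORECASE)


def find_candidates(text, tlds=SAMPLE_TLDS):
    """Return the substrings the simplified matcher would treat as domains."""
    return [m.group(0) for m in build_sketch_re(tlds).finditer(text)]


print(find_candidates("XXX has an M.Sc. in YYY"))
print(find_candidates("yyy M.Sac. xxx"))
print(find_candidates("I like hot chocolate.It makes me happy."))
```

With this sketch, "M.Sc" and "chocolate.It" are picked up because "sc" and "it" are in the list, while "M.Sac" is not. For the real library, the linkifier docs describe building a pattern with build_url_re(tlds=[...]) and passing it to Linker(url_re=...); check the docs for your version before relying on that.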
I actually tried a .shop domain before and it did not work, so I assumed it was some regex. Anyway, thanks for the explanation.