bug: `linkify` with `parse_email=True` doesn't handle "%" and "?" in `addr-specs`
larseggert opened this issue · 4 comments
Describe the bug
`linkify` with `parse_email=True` doesn't handle "%" and "?", which may occur in RFC822 addr-specs (see https://datatracker.ietf.org/doc/html/rfc2368#section-6).
- Python Version: 3.10.4
- Bleach Version: 5.0.0
To Reproduce
Steps to reproduce the behavior:
>>> bleach.linkify("gorby%kremvax@example.com", parse_email=True)
'<a href="mailto:gorby%kremvax@example.com">gorby%kremvax@example.com</a>'
Expected behavior
I expected RFC822 special characters to be percent-encoded according to RFC2368:
>>> bleach.linkify("gorby%kremvax@example.com", parse_email=True)
'<a href="mailto:gorby%25kremvax@example.com">gorby%kremvax@example.com</a>'
Additional context
Same issue exists with "?"; I didn't test other RFC822 special characters but suspect they are similarly left unquoted.
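A minimal sketch of the expected encoding using only the standard library's `urllib.parse.quote`. `mailto_href` is a hypothetical helper, not part of bleach, and it assumes that percent-encoding every character outside the unreserved set, except `@`, is acceptable for the href. It covers both the `%` and the `?` case from this report:

```python
from urllib.parse import quote


def mailto_href(addr: str) -> str:
    """Hypothetical helper: build a mailto: URL with the address percent-encoded.

    Assumption: everything except letters, digits, '_.-~' and '@' is encoded,
    so '%' becomes '%25' and '?' becomes '%3F'.
    """
    return "mailto:" + quote(addr, safe="@")


print(mailto_href("gorby%kremvax@example.com"))     # mailto:gorby%25kremvax@example.com
print(mailto_href("unlikely?address@example.com"))  # mailto:unlikely%3Faddress@example.com
```

Passing `safe="@"` encodes more than strictly necessary (for example `+` becomes `%2B`). That is still a valid percent-encoding of the same address, but a real fix might prefer a narrower set of escaped characters.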
Thank you for the bug report! I'd appreciate a pull request from anyone who wants to tackle this. I don't think I'm going to get to it.
I tried wrapping `urllib.parse.quote()` around the `match.group(0)` bit in line 304 (at commit 481b146), but that seems to have no effect.
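To illustrate the idea of quoting the matched address only where the href is built, here is a standalone sketch. It does not use bleach: `EMAIL_RE` is a hypothetical, deliberately simplified address pattern, and `linkify_emails` is a hypothetical function that only wraps addresses in links and leaves the rest of the text untouched (no sanitizing). The href is percent-encoded while the link text is only HTML-escaped, which yields the output the report expects:

```python
import html
import re
from urllib.parse import quote

# Hypothetical, simplified pattern (not bleach's regex): a local part without
# whitespace, '@', '<' or '>', then '@', then a dotted domain.
EMAIL_RE = re.compile(r"[^\s@<>]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def linkify_emails(text: str) -> str:
    def make_link(match: re.Match) -> str:
        addr = match.group(0)
        # Percent-encode the address for the URL ('%' -> '%25', '?' -> '%3F').
        href = "mailto:" + quote(addr, safe="@")
        # The visible text is only HTML-escaped, so it reads as typed.
        return '<a href="{}">{}</a>'.format(html.escape(href), html.escape(addr))

    return EMAIL_RE.sub(make_link, text)


print(linkify_emails("gorby%kremvax@example.com"))
# <a href="mailto:gorby%25kremvax@example.com">gorby%kremvax@example.com</a>
```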
I have noticed a similar problem with the `clean()` function. Maybe it has the same root cause.
Example:
In [1]: import bleach
In [2]: bleach.clean("<a href='https://example.org?a=1&b=2'>example</a>")
Out[2]: '<a href="https://example.org?a=1&amp;b=2">example</a>'
Notice that `&` is changed to `&amp;`.
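To help check whether the two cases share a root cause, it may be useful to separate the layers involved: `&` becoming `&amp;` is HTML character-reference escaping of an attribute value, while the mailto problem concerns percent-encoding inside the URL itself. The sketch below uses only the standard library (`html.escape` and `html.parser.HTMLParser`, not bleach); `HrefCollector` is a hypothetical helper that shows what href an HTML parser recovers from the escaped markup:

```python
import html
from html.parser import HTMLParser

raw_url = "https://example.org?a=1&b=2"

# Escaping the value for use inside an HTML attribute writes '&' as '&amp;'.
escaped = html.escape(raw_url, quote=True)
print(escaped)  # https://example.org?a=1&amp;b=2


class HrefCollector(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags as the parser sees them."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


parser = HrefCollector()
parser.feed('<a href="{}">example</a>'.format(escaped))
# The parser decodes the character reference, giving back the original URL.
print(parser.hrefs)  # ['https://example.org?a=1&b=2']
```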