Open bracket '<' is still cleaned up when there is no closing bracket
alyohea commented
Describe the bug
Thanks for the fix in #705!
I think I found a regression introduced by that fix.
- Python Version: [e.g. 3.9.6]
- Bleach Version: [e.g. 6.1.0]
To Reproduce
Steps to reproduce the behavior:
# Working!
In [5]: bleach.clean("<test abc")
Out[5]: '<test abc'
# Doesn't work (because of duplicated words?)
In [6]: bleach.clean("<test abc abc")
Out[6]: ''
# However, this works
In [12]: bleach.clean("<test abc abd")
Out[12]: '<test abc abd'
# Doesn't work (with a trailing space)
In [7]: bleach.clean("<test abc ")
Out[7]: ''
# Doesn't work (with a trailing space)
In [8]: bleach.clean("asd<test abc ")
Out[8]: 'asd'
# However, this works
In [9]: bleach.clean("asd<test abc asd")
Out[9]: 'asd<test abc asd'
Expected behavior
# Duplicated word: the text should come back unchanged
In [6]: bleach.clean("<test abc abc")
Out[6]: '<test abc abc'
# Trailing space: the text should come back unchanged
In [7]: bleach.clean("<test abc ")
Out[7]: '<test abc '
# Trailing space after leading text: the text should come back unchanged
In [8]: bleach.clean("asd<test abc ")
Out[8]: 'asd<test abc '
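To make the regression easy to re-check after a fix, the cases above can be collected into a small table-driven check. This is a minimal sketch; `CASES` and `find_regressions` are hypothetical names, and the cleaner is passed in as a plain function so the check itself does not depend on bleach. The expected outputs are the reporter's expectations from this issue.

```python
from typing import Callable, List, Tuple

# Inputs from this report, each paired with the output the reporter expects:
# text containing an unterminated tag should come back unchanged.
CASES: List[Tuple[str, str]] = [
    ("<test abc", "<test abc"),
    ("<test abc abc", "<test abc abc"),
    ("<test abc abd", "<test abc abd"),
    ("<test abc ", "<test abc "),
    ("asd<test abc ", "asd<test abc "),
    ("asd<test abc asd", "asd<test abc asd"),
]


def find_regressions(clean: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Return (input, expected, actual) for every case that `clean` gets wrong."""
    failures = []
    for text, expected in CASES:
        actual = clean(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures
```

Calling `find_regressions(bleach.clean)` should, going by the outputs reported above, flag the duplicated-word input and the two trailing-space inputs; an empty list means every case is preserved.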
willkg commented
Thank you for putting so much effort into this bug report--I really appreciate it!
I think there are a couple of issues here:
- It looks like the duplicate attribute name does affect things. It kicks up two parse errors and then everything goes sideways:
{'type': 7, 'data': 'eof-in-attribute-name'}
{'type': 7, 'data': 'duplicate-attribute'}
- It looks like we need to handle another parse error case:
{'type': 7, 'data': 'expected-end-of-tag-but-got-eof'}
We'll need to fix each issue separately. I'll see what I can do.
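The split between working and failing inputs can be summarized with a toy model of which parse errors the tokenizer reports for an unterminated start tag. This is a hand-written approximation, not html5lib's code: the helper `predict_parse_errors` is hypothetical, it only covers inputs of the form "optional text, `<`, tag name, bare space-separated attribute names, no `>`", and the error strings are the ones quoted above. In the report, the inputs that come back intact are exactly the ones whose only error is `eof-in-attribute-name`.

```python
import re
from typing import List

# Optional leading text, '<', a tag name, one or more bare attribute names,
# then optional trailing whitespace, with no '>' anywhere (toy scope only).
_UNTERMINATED_TAG = re.compile(r"[^<]*<([A-Za-z][^\s/>]*)((?:\s+[^\s=/<>'\"]+)+)(\s*)")


def predict_parse_errors(text: str) -> List[str]:
    """Toy prediction of the parse errors for an unterminated start tag."""
    m = _UNTERMINATED_TAG.fullmatch(text)
    if m is None:
        raise ValueError("input is outside the scope of this toy model")
    names = [name.lower() for name in m.group(2).split()]
    trailing_ws = m.group(3)

    errors: List[str] = []
    seen: List[str] = []
    for i, name in enumerate(names):
        # Input ends while still reading the last attribute name.
        if i == len(names) - 1 and not trailing_ws:
            errors.append("eof-in-attribute-name")
        # A repeated name is reported when leaving the attribute name.
        if name in seen:
            errors.append("duplicate-attribute")
        seen.append(name)
    # Input ends after whitespace that follows the last attribute name.
    if trailing_ws:
        errors.append("expected-end-of-tag-but-got-eof")
    return errors
```

For example, `predict_parse_errors("<test abc abc")` gives the two errors listed for the duplicate case, while `predict_parse_errors("asd<test abc ")` gives only `expected-end-of-tag-but-got-eof`.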