pallets/markupsafe

Markup.striptags: comments now get replaced with a space

kholmanskikh opened this issue · 1 comments

In 2.1.4, Markup.striptags replaces an HTML comment with a space. In earlier versions the comment was removed from the output entirely.

Test case to reproduce:

import unittest

from markupsafe import Markup

class MarkupSafeTest(unittest.TestCase):
    def test_striptags(self):
        value = 'x <!-- comment -->'
        self.assertEqual(Markup(value).striptags(), 'x')


if __name__ == '__main__':
    unittest.main()

The test fails with markupsafe 2.1.4 and passes with 2.1.3.

I'm running Python 3.11.

The issue was originally found by running the test_striptags jinja test case with markupsafe 2.1.4:

https://github.com/pallets/jinja/blob/3fd91e4d11bdd131d8c12805177dbe74d85e7b82/tests/test_filters.py#L94

I've just noticed this change, too. I think the issue is that the old version of Markup.striptags removed comments and tags, and then coalesced whitespace:

# Use two regexes to avoid ambiguous matches.
value = _strip_comments_re.sub("", self)
value = _strip_tags_re.sub("", value)
value = " ".join(value.split())

The new version has reversed this order and coalesces whitespace before stripping comments and tags:

def striptags(self) -> str:
    """:meth:`unescape` the markup, remove tags, and normalize
    whitespace to single spaces.
    >>> Markup("Main &raquo;\t<em>About</em>").striptags()
    'Main » About'
    """
    # collapse spaces
    value = " ".join(self.split())
    # Look for comments then tags separately. Otherwise, a comment that
    # contains a tag would end early, leaving some of the comment behind.
    while True:
        # keep finding comment start marks
        start = value.find("<!--")

(I think the old order of operations is more correct.)
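
To illustrate why the order matters, here is a minimal standalone sketch. It does not use markupsafe's actual code: the two regexes and both function names are simplified stand-ins I made up for the comparison, so treat it as an approximation of the behaviour rather than the library's implementation.

```python
import re

# Simplified, hypothetical patterns (NOT markupsafe's real ones).
_comments_re = re.compile(r"<!--.*?-->", re.DOTALL)
_tags_re = re.compile(r"<.*?>", re.DOTALL)


def strip_then_collapse(s: str) -> str:
    """Old order: remove comments and tags, then normalize whitespace."""
    s = _comments_re.sub("", s)
    s = _tags_re.sub("", s)
    return " ".join(s.split())


def collapse_then_strip(s: str) -> str:
    """New order: normalize whitespace first, then remove comments and tags."""
    s = " ".join(s.split())
    s = _comments_re.sub("", s)
    return _tags_re.sub("", s)


value = "x <!-- comment -->"
print(repr(strip_then_collapse(value)))  # 'x'
print(repr(collapse_then_strip(value)))  # 'x ' (space before the comment survives)
```

With the second ordering, whitespace that surrounded a removed comment is never normalized again, which would explain the stray space in the test case above.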

[eta: I've just created PR #418 that addresses this issue.]