LazyString's `__html__` is an incomplete implementation of MarkupSafe
jace opened this issue · 0 comments
MarkupSafe's Markup class, as used in Flask, is very careful about mixing escaped and unescaped strings. For instance, Markup(...).format(...)
will ensure all format variables are escaped before being interpolated into the string, and the resulting string is fully escaped.
LazyString does not do this, and the presence of a __html__
method (as previously raised in #121) creates a situation where there is no way to tell whether a string is properly escaped or not:
>>> from flask_babel import lazy_gettext
>>> from markupsafe import Markup
>>> l = lazy_gettext("This is a <em>string</em> with a {var}")
>>> l
l'This is a <em>string</em> with a {var}'
>>> Markup(l)
Markup('This is a <em>string</em> with a {var}')
>>> Markup(l.format(var="variable & more"))
Markup('This is a <em>string</em> with a variable & more')
>>> Markup(l).format(var="variable & more")
Markup('This is a <em>string</em> with a variable & more')
When a lazy string has format variables, it must be wrapped in Markup()
before calling .format()
to make it continue to behave as a HTML string. However, this is dangerous to do in a function that receives the string as parameter. Markup wrapping must happen at source, but that is also not possible in a lazy context as it causes a string evaluation.
Here is a test case showing how gettext
, lazy_gettext
and Markup
all behave differently. As a result, neither translator nor programmer has any indication on whether any given string is plain text or HTML, and every string will need a full integration test to confirm markup and escaping are handled appropriately across translations.
Possible mitigations:
-
LazyString needs two implementations with and without
__html__
, for useby the Jinja extension andin HTML and non-HTML contexts respectively. This solves for the programmer but not the translator, who still gets no context on whether it is safe to use common punctuation characters likelazy_gettext
functions&
,<
and>
. This may be solved by separating HTML and non-HTML strings into different domains. -
LazyString must derive from Markup, or reproduce its functionality to ensure appropriate escaping in all scenarios, and
gettext
must wrap strings in Markup before returning. This solves for both parties in a consistent way, but commits to all i18n strings being HTML, requiring unescaping before use in non-HTML contexts (JSON APIs, Markdown, etc).
from flask import Flask, Markup
from flask_babel import Babel, gettext, lazy_gettext
import pytest
@pytest.fixture(scope='session')
def app():
return Flask(__name__)
@pytest.fixture(scope='session')
def babel(app):
return Babel(app)
@pytest.fixture()
def ctx(app, babel):
with app.test_request_context() as context:
yield context
raw_string = "This is a <em>string</em> with a {var}"
get_texts = [
pytest.param(lambda: gettext(raw_string), id='str'),
pytest.param(lambda: lazy_gettext(raw_string), id='lazy'),
pytest.param(lambda: Markup(raw_string), id='markup'),
]
@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_type(get_text):
text = get_text().format(var="variable & more")
assert isinstance(text, str)
@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_value(get_text):
text = get_text().format(var="variable & more")
assert text == "This is a <em>string</em> with a variable & more"
@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_html(get_text):
text = get_text().format(var="variable & more")
assert '__html__' in text
Output (with errors interpolated):
FAILED lazystr_test.py::test_gettext_value[str] - AssertionError: assert equals failed
E 'This is a <em>string</em> with a variable & more' 'This is a <em>string</em> with a variable & more'
FAILED lazystr_test.py::test_gettext_value[lazy] - AssertionError: assert equals failed
E 'This is a <em>string</em> with a variable & more' 'This is a <em>string</em> with a variable & more'
FAILED lazystr_test.py::test_gettext_html[str] - AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
E AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
FAILED lazystr_test.py::test_gettext_html[lazy] - AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
E AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
FAILED lazystr_test.py::test_gettext_html[markup] - AssertionError: assert '__html__' in Markup('This is a <em>string</em> with a variable & more')
E AssertionError: assert '__html__' in Markup('This is a <em>string</em> with a variable & more')
=== 5 failed, 4 passed ===