python-babel/flask-babel

Security risk with LazyString strings being treated as HTML safe by Jinja

DonnchaC opened this issue ยท 5 comments

Jinja treats all objects with a __html__() method as HTML safe and does not perform any automatic escaping on them. Commit a5f2297 added a __html__() method to the LazyString class to allow for LazyStrings to automatically be serialized by flask.json.JSONEncoder.

This __html__() method makes the content of all gettext strings in a template potentially unsafe. I am using Jinja and Flask-Babel with Transifex in a project. Recently I experienced an issue where a string translated by an external translator broke our HTML as the translated string was not HTML escaped.

This unexpected behavior is particularly risky when an application is relying on external and/or crowd-sourced translations. A translator could inject JavaScript in an translation and potentially compromise a service. Translations changes may be less well reviewed than direct code changes.

I'm not sure what is the best way to resolve this risk. Flask-Babel should certainly choose a safe default. Perhaps the Flask-Babel documentation could give an example of a custom flask.json.JSONEncoder subclass which explicitly handles LazyStrings? Alternatively the Flask-Babel extension could have an unsafe option which could be set explicitly during initialization?

Sorry, it looks like this behavior is intentional in Jinja. The Jinja Babel extension explicitly converts gettext result values to Markup objects which are treated as safe (https://github.com/pallets/jinja/blob/e605ff1a0bc304bb543f6867c0798e0480f31926/jinja2/ext.py#L142). IMO this is under-documented and there should at least be an option to allow gettext results to be treated as unsafe in Jinja.

I'll open an issue on the Jinja2 repo. Feel free to close this.

Here is the work-around that I am currently using:

from flask_babel import Babel, get_translations
from jinja2 import Markup

class HTMLSafeBabel(Babel):
    """
    A subclass of Flash-Babel which automatically HTML escapes gettext strings

    Jinja2 treats gettext() values as HTML safe by default. This function wraps
    the gettext functions to explictly HTML escape the translated strings before
    they are used in a template.
    """
    def init_app(self, app, *args, **kwargs):
        super(HTMLSafeBabel, self).init_app(app, *args, **kwargs)
        if self._configure_jinja:
            app.jinja_env.install_gettext_callables(
                lambda x: self.reescape_string(get_translations().ugettext(x)),
                lambda s, p, n: self.reescape_string(get_translations().ungettext(s, p, n)),
                newstyle=True
            )

    @staticmethod
    def reescape_string(escaped_string):
        """
        Decode HTML entities and re-encoded safely using the Jinja HTML escaper.

        Existing entities need to be decoded first to avoid double-encoding.
        """
        return Markup.escape(HTMLParser.HTMLParser().unescape(escaped_string))

Thanks for your workaround, I'm sure someone will find it useful.

All Jinja gettext extensions are |safe and it's both intentional and expected. If you're using user-provided strings, you sanitize them at the source just as you would when validating a form. Many people are using translations with HTML in them as things like emphasis can change based on language.

If you're using user-provided strings, you sanitize them at the source just as you would when validating a form.

This is a highly irresponsible philosophy for a rendering library to have. It is often preferable to sanitize untrusted values at time of use, not at time of entry, and many applications do so. It is not appropriate for a library to claim that an unknown input is safe unless the caller has declared it safe, or the library has done its own sanitization.

There are likely easily exploitable XSS vulnerabilities in countless applications as a result of this decision.

I'm sorry you feel that way suddjian (and apparently your company, Preset). At the time this issue was made (again, 4 years ago), all Jinja2 (which this repository is not) i18n extensions marked their content as safe to be compatible with Jinja2. As @DonnchaC mentioned, the behavior was intentional in Jinja, and discussion on changing that should occur in Jinja.

It is exceptionally common to include HTML in your translation strings. This capability cannot be removed, and is possible in all frameworks with translations. Django works identical to this, for example, as do the majority of wrappers around gettext. Rails now has the _html suffix. Any change would need to keep this in mind.

It is exceptionally uncommon to be passing untrusted user input into your translation strings. Your hyperbole of countless XSS vulnerabilities is unwarranted.

If you would like to offer a fix to this open source project (see Creating a Pull Request), most likely using Jinja's autoescaping which recently became a core feature, it would be happily accepted.