xmlet/HtmlFlow

Safe output?

esiqveland opened this issue ยท 13 comments

It seems that even when I use the Dynamic View:

  .dynamic(el -> el.text(model.title))

The output is not safe for viewing?
My example is that a user has entered some XSS in the title, such as: <script>alert('1')</script>.

Am I using the library wrong?

The purpose of dynamic is usually related to data binding. All HTML inside the lambda passed to dynamic (e.g. el.text(model.title)) is not considered static and thus, the resulting HTML may change from render to render. This is the only goal of dynamic. Nothing related with safety.

We only considered safety scenarios in HtmlFlow related with type safety and thread safety.

We did not deal with any scenario related with XSS safety. But maybe our documentation induced you in error. Tell me if is that the case.

Nevertheless, if you argue that HtmlFlow should include any kind of guard for XSS safety we may discuss it and further include it. In my opinion, maybe I am wrong, but I think that kind of control should be taken by the framework or developer using HtmlFLow, and not by HtmlFlow itself.

Thanks for your feedback

@esiqveland following my previous comment, I would say that in your example you should rather use:

.dynamic(el -> el.text(HtmlUtils.htmlEscape(model.title)))

Or any other escape utility function. In this case I am using HtmlUtils from Spring.

Hi, thank you for quick reply.

I would expect a text rendering engine to deal with unsafe user input natively, but that just be my assumptions playing into it.

The call to .dynamic(...) is the last (and only) place where we know we might be dealing with user input. After render it is too late. I would argue the default should be to escape input values so that if someone changes the post title to </div></body> it doesn't break your markup. If you don't escape input content, there is no guarantee anymore that the lib generated the markup expressed in the code.

Kotlinx.html as a similar behavior behavior to that pointed here by @esiqveland. In Kotlinx.html text is always safe unless programmer explicit use unsafe call such as:

head {
    style {
        unsafe { 
            raw("some text...")
        }
    }
}

We should enhance HtmlFlow API with something like this.

@lcduarte What do you think about making .text(String text) of HtmlApi always escape the content text? It seems to be the default behavior of most engines including Kotlinx.html. Yet, I am worried about the overhead of that escape processing. I noticed that in spring-template-benchmark the kotlinx view includes a unsafe { raw(it.summary) } rather than text(it.summary) to avoid that overhead here: https://github.com/jreijn/spring-comparing-template-engines/blob/master/src/main/java/com/jeroenreijn/examples/view/KotlinxHtmlIndexView.kt#L45

Thus, if we make escape the default behavior of .text() than we should also include an alternative .raw to avoid that overhead.

Additionally we can add a configuration property to set the default behavior of .text() to escape, or not. We could make the visitText() pass the text.getValue() through a escaping function that could be initialized on class loading with a method reference to escapeHtml() or to an effectless identity function.

public final <R> void visitText(Text<? extends Element, R> text) {

It makes sense. We change the default behavior to improve security and keep an option for users which rather have the performance improvement than the security added by content escaping.

The main idea is to replace write(text.getValue()) by write(escapeHtml.apply(text.getValue()))

And an additional static field: static final Function<String, String> escapeHtml;

And then initialize it in the static constructor depending of a system property, such as -Dhtmlflow.ignoreEscaping=true

static {
	// Read System properties
        if(... escaped ...)
		escapeHtml = HtmlUtils::htmlEscape;
        else 
                escapeHtml = html -> html;
}

Also I tend to not add any dependency for an auxiliary library and copy paste the code instead.

mhw commented

Hi; have you had any further thoughts on implementing this feature? The approach you outlined looks like it would solve the problem in a clean way, but it would also be handy to have a method that allowed raw content to be passed through unescaped.

Yes, I gave low priority to this feature. While it seems not to hard to implement, I did not have time yet.

Moreover, since then we have made a deep refactor in all HtmlFlow internals that implied a new API 4.0 to support asynchronous data models . So, I didn't wanted to mix it before finishing the new API.

I will try to address this feature to the next month.

Any updates on when this will be available in HtmlFlow? I don't feel comfortable using this library until it is implemented. It is too easy to forget to html escape text that should be.

I will release it next week in 4.4

DONE release 4.4

@mhw It is DONE on release 4.4.