Encoding malicious code instead of removing it

Question

bmscodespace opened this issue a year ago · 4 comments

Hi,

Is it possible to build a policy that, instead of removing problematic parts of an HTML string, just encodes those parts so that they can do no harm when the string is used in an HTML page?
So
<script>alert`1`</script>
would then be replaced by something like
&lt;script&gt;alert`1`&lt;/script&gt;.

Thank you for any answer ;)

P.S. The idea behind my question is that I would like to use a policy that does not need to know whether it deals with a string that will be used as inner HTML or as an "ordinary text field" with no HTML, where the text itself might talk about the "<script>" tag. If the sanitizer removes malicious code, it could destroy such "ordinary text". On the other hand, if I used output encoding on my string, I would lose text formatting in the inner HTML case.

Answer 1 · 2024-01-23T09:20:59.000Z

Please provide an example where it is not working as expected.

Answer 2 · 2024-01-24T17:19:59.000Z

Hi @csware ,

Suppose a string is imported into an application, and we cannot know whether it will be used as inner HTML (for example, as formatted text) or as a plain data string.

Suppose first that, to secure the text before it is displayed, we always sanitize it. If that text is, for example,

"A script tag begins with <script> and ends like </script>" ,

then with no appropriate policy, the string

"A script tag begins with"

might reach the view, and text that we want to be displayed is missing.

On the other hand, if I just encode every imported string, and one such string is formatted text (with some p tags, list tags, b tags, etc.) that is used as inner HTML, then I lose the formatting.

My question is whether it is possible to secure a text when it is not clear whether it will be used as a data string or as inner HTML. I hope this makes it a little clearer ;)

Answer 3 · 2024-01-28T14:57:12.000Z

I suppose this could be achieved using a preprocessor.

However, the input is not correctly encoded. If <script> should be shown on the screen as text, and the string also contains HTML tags for formatting, then it needs to be properly encoded in the first place.
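
A preprocessor along these lines could look roughly like the sketch below. It is only an illustration in Python, independent of any particular sanitizer library: the allowlist `ALLOWED_TAGS` and the function name `encode_disallowed_tags` are hypothetical, and the regular expression is a simplification rather than a real HTML parser. The idea is to escape every tag that is not an allowed formatting tag, so that it survives as visible text, and then still pass the result through the sanitizer.

```python
import html
import re

# Hypothetical allowlist of formatting tags; adjust it to match the real policy.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br"}

# Simplified pattern for an opening or closing tag; group 1 is the tag name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^<>]*>")


def encode_disallowed_tags(text: str) -> str:
    """Escape tags whose name is not in ALLOWED_TAGS; leave allowed tags as-is."""

    def replace(match: re.Match) -> str:
        name = match.group(1).lower()
        if name in ALLOWED_TAGS:
            # Allowed tags pass through unchanged (attributes included),
            # so the output must still be sanitized afterwards.
            return match.group(0)
        # Anything else is turned into harmless, visible text.
        return html.escape(match.group(0), quote=False)

    return TAG_RE.sub(replace, text)


print(encode_disallowed_tags(
    "<p>A script tag begins with <script> and ends like </script></p>"
))
```

This step does not replace sanitizing: allowed tags keep their attributes, and malformed markup can slip past the pattern, so the sanitizer should still run on the preprocessed string.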

Answer 4 · 2024-03-24T12:19:57.000Z

Hi @bmscodespace,
To answer your question: yes, it is possible to build a policy that encodes problematic parts of an HTML string instead of removing them. This approach is known as HTML encoding or output encoding, and it helps prevent XSS attacks by converting potentially harmful characters into their HTML entity equivalents.
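
As a minimal sketch of what output encoding does, here is a Python example using the standard library's `html.escape`. It is not tied to any sanitizer policy, and the wrapper name `encode_for_html` is made up for illustration. Note that it encodes the whole string, so formatting tags such as `<b>` would be encoded as well, which is the trade-off described earlier in this thread.

```python
import html


def encode_for_html(text: str) -> str:
    """Replace &, <, >, and both quote characters with HTML entities."""
    return html.escape(text, quote=True)


encoded = encode_for_html("<script>alert`1`</script>")
print(encoded)  # &lt;script&gt;alert`1`&lt;/script&gt;
```

A browser renders the encoded string as the literal text `<script>alert`1`</script>` instead of executing it.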