Unicode character escapes are encoded again by Encoder.forHtml()

Question


indra2gurjar opened this issue 6 years ago · 2 comments

If the input string contains a Unicode character escape, e.g. "&#9989;" (✅), the output is "&amp;#9989;": the '&' character is encoded again.
Does Encoder support Unicode-escaped characters (in which case this is a bug), or is this not supported?

Answer 1 · 2018-05-29T23:38:51.000Z

The encoder is deliberately designed to encode all dangerous characters, including the '&' you are describing. This is not the right tool for you. If you have HTML entities that you wish to preserve, then your input is HTML. Consider using the OWASP HTML Sanitizer instead. Aloha, -- Jim Manico @manicode
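To illustrate why the '&' gets encoded, here is a minimal, stdlib-only Java sketch. The class `HtmlTextEncodingDemo` and its method `encodeForHtmlText` are hypothetical, written only for this illustration; they are not the OWASP Encoder API, and the exact replacements chosen for quotes are assumptions. The sketch treats its whole input as plain text, so the '&' that starts "&#9989;" is just another dangerous character, while the ✅ character passed in directly is left unchanged.

```java
// Hypothetical stand-in for an HTML text encoder; NOT the OWASP Java Encoder API.
class HtmlTextEncodingDemo {

    /** Encodes a string for use as HTML text content (hypothetical sketch). */
    static String encodeForHtmlText(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;"); break;  // '&' could start an entity, so it is always escaped
                case '<':  out.append("&lt;");  break;
                case '>':  out.append("&gt;");  break;
                case '"':  out.append("&#34;"); break;  // assumed replacement for double quote
                case '\'': out.append("&#39;"); break;  // assumed replacement for single quote
                default:   out.append(c);               // everything else, including non-ASCII, passes through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is treated as text, so the '&' of the numeric reference is escaped.
        String alreadyEscaped = "&#9989;";
        System.out.println(encodeForHtmlText(alreadyEscaped)); // prints &amp;#9989;

        // Passing the character itself avoids the double encoding.
        String raw = "\u2705"; // the check-mark character
        System.out.println(encodeForHtmlText(raw).equals(raw)); // prints true
    }
}
```

A browser renders "&amp;#9989;" as the literal text "&#9989;", which is the safe result when the input is meant to be text. To display ✅, pass the character itself; if the input really is HTML, a sanitizer is the appropriate tool, as the answer suggests.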


On May 28, 2018, at 4:50 AM, Indra Kumar Gurjar wrote: if the input string contains unicode escaped character e.g. &#9989; the output is &amp;#9989; the '&' character is encoded again. Does Encoder support unicode escaped characters and it is a bug or this is not supported?

Answer 2 · 2018-08-15T20:23:18.000Z

This is not something we can fix; it's about proper use of the library.