wilsonzlin/minify-html

Some HTML entities are incorrectly transformed to UTF8 symbols (e.g. in URLs)

samupl opened this issue · 2 comments

samupl commented

While working on a Django app, I noticed that some URLs were rendered incorrectly.

The URL in question had a query param called copy_origin. When the query param was not first (e.g. rendered as &copy_origin=something), the &copy part got transformed to the © symbol. This doesn't happen if the param is just called copy; the following underscore seems to make minify-html think it's a valid entity.

I found a few more examples.

This has been happening since at least 0.11, up to and including the latest version, 0.15:

echo '<a href="/example?attribute=something&copy_something=1&reg_something=1&euro_something=1&yen_something=1">test</a>' | ./minhtml-0.15.0-x86_64-unknown-linux-gnu
<a href=/example?attribute=something©_something=1®_something=1&euro_something=1¥_something=1>test</a>
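
For anyone hitting the same thing, here is a minimal sketch of a possible workaround. It does not call minify-html at all. It uses Python's standard html module, whose unescape() follows the HTML5 rules for named references that lack a trailing semicolon. It is only an assumption that minify-html applies a similar rule. The variable names are made up for illustration. The idea is that writing & as &amp; in the attribute value removes the ambiguity:

```python
import html

# The same URL as in the report above, with raw (unescaped) ampersands.
raw_url = (
    "/example?attribute=something&copy_something=1"
    "&reg_something=1&euro_something=1&yen_something=1"
)

# Decoding the raw text as HTML: &copy, &reg and &yen are legacy entity
# names that are recognised without a trailing semicolon; &euro is not.
decoded_raw = html.unescape(raw_url)
print(decoded_raw)

# Escaping the ampersands first (as the attribute value arguably should be
# written in the HTML source) makes the round trip lossless.
escaped_url = html.escape(raw_url, quote=True)
decoded_escaped = html.unescape(escaped_url)
print(escaped_url)
print(decoded_escaped == raw_url)
```

One caveat: html.unescape() is not attribute-aware. A real HTML parser leaves a semicolon-less reference inside an attribute alone when the next character is = or an ASCII alphanumeric, which would explain why a param called just copy is unaffected while copy_origin is not.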
samupl commented

@wilsonzlin Could you verify whether this is a bug, or whether I'm just making incorrect assumptions about the minification?

Hello there, I am leaving this link here to try out: https://denevcloud.azureedge.net/gumeristore/assets/js/minipopup-open.js. It is not minified correctly and the returned result cannot be decoded. It contains UTF-8 chars. Try it yourselves.