Incorrect processing of <script type="text/html">
roadster31 opened this issue · 3 comments
Hello,
I often use the construct <script id="some-id" type="text/html"> some HTML code </script> to inject HTML into the DOM. HtmlMin processes the HTML between <script> and </script> incorrectly: the closing </div> tags are removed.
What is this feature about (expected vs actual behaviour)?
Source code:
<!doctype html>
<html lang="fr">
<head>
<title>Test</title>
</head>
<body>
A Body
<script id="elements-image-1" type="text/html">
<div class="place badge-carte">Place du Village<br>250m - 2mn à pied</div>
<div class="telecabine badge-carte">Télécabine du Chamois<br>250m - 2mn à pied</div>
<div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt=""></div>
</script>
</body>
</html>
Expected behaviour:
<!DOCTYPE html><html lang="fr"><head><title>Test</title></head><body>A Body<script id="elements-image-1" type="text/html">
<div class="place badge-carte">Place du Village<br>250m - 2mn à pied</div>
<div class="telecabine badge-carte">Télécabine du Chamois<br>250m - 2mn à pied</div>
<div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt=""></div>
</script></body></html>
Actual behaviour (all three closing </div> tags inside the script are dropped):
<!DOCTYPE html><html lang="fr"><head><title>Test</title></head><body>A Body<script id="elements-image-1" type="text/html">
<div class="place badge-carte">Place du Village<br>250m - 2mn à pied
<div class="telecabine badge-carte">Télécabine du Chamois<br>250m - 2mn à pied
<div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt="">
</script></body></html>
How can I reproduce it?
Use the above source code.
Does it take minutes, hours or days to fix?
Not sure. Perhaps only minutes, if HtmlMin can simply leave the content of <script type="text/html"> untouched?
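As a rough illustration of that suggestion, one approach is to swap the body of every <script type="text/html"> for a placeholder before the document reaches the parser, then put it back after minification. This sketch is in Python rather than PHP; the helper names (protect_template_scripts, restore_template_scripts) and the __TEMPLATE_n__ token format are hypothetical and not part of HtmlMin. The regex assumes the type attribute is quoted and that the template body contains no literal </script>.

```python
import re

# Hypothetical helpers (not HtmlMin's API): hide <script type="text/html">
# bodies from the HTML parser, then restore them after minification.
# Assumes a quoted type attribute and no literal </script> in the body.
_TEMPLATE_SCRIPT = re.compile(
    r'(<script\b[^>]*\stype=(["\'])text/html\2[^>]*>)(.*?)(</script\s*>)',
    re.IGNORECASE | re.DOTALL,
)
_PLACEHOLDER = re.compile(r"__TEMPLATE_(\d+)__")


def protect_template_scripts(html: str) -> tuple[str, list[str]]:
    """Replace each text/html script body with a numbered placeholder."""
    saved: list[str] = []

    def stash(match: re.Match) -> str:
        saved.append(match.group(3))  # keep the raw template body
        token = f"__TEMPLATE_{len(saved) - 1}__"
        return match.group(1) + token + match.group(4)

    return _TEMPLATE_SCRIPT.sub(stash, html), saved


def restore_template_scripts(html: str, saved: list[str]) -> str:
    """Swap each placeholder back for the body it replaced."""
    return _PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], html)


if __name__ == "__main__":
    source = (
        '<body>A Body<script id="elements-image-1" type="text/html">\n'
        '<div class="place badge-carte">Place du Village<br>250m</div>\n'
        "</script></body>"
    )
    protected, saved = protect_template_scripts(source)
    # `protected` is what the minifier would see: no </div> inside the script.
    assert "</div>" not in protected
    # ... minify `protected` here ...
    assert restore_template_scripts(protected, saved) == source
```

Ordinary JavaScript <script> elements are left alone, because the pattern only matches when a type="text/html" attribute is present.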
Any additional information?
Thanks for your work :)
After a few tests, it seems that DOMDocument::loadHTML() is the root cause of this problem. Loading the test document and saving it again immediately gives the following result, in which the closing </div> tags are missing:
<!DOCTYPE html>
<?xml encoding="UTF-8" ?><html lang="fr"><head><title>Test</title></head><body>
A Body
<script id="elements-image-1" type="text/html">
<div class="place badge-carte">Place du Village<br>250m - 2mn à pied
<div class="telecabine badge-carte">Télécabine du Chamois<br>250m - 2mn à pied
<div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt="">
</script></body></html>
I'll investigate further and get back to you if I find anything useful.
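For comparison, the HTML5 parsing rules treat the content of a <script> element as text rather than markup, whatever its type attribute says, so the </div> tags should come through untouched. Python's standard html.parser follows the same idea, which makes it a quick way to check the expected tokenization. This is a Python illustration only, not a reproduction of what DOMDocument or libxml2 do internally, and the ScriptBodyCollector class is hypothetical.

```python
from html.parser import HTMLParser


class ScriptBodyCollector(HTMLParser):
    """Hypothetical helper: records start tags and the raw text of <script>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_script = False
        self.bodies: list[str] = []     # one entry per <script> element
        self.tags_seen: list[str] = []  # start tags the parser reported

    def handle_starttag(self, tag, attrs):
        self.tags_seen.append(tag)
        if tag == "script":
            self.in_script = True
            self.bodies.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            # Script content arrives as plain data, possibly in chunks.
            self.bodies[-1] += data


if __name__ == "__main__":
    body = '\n<div class="place badge-carte">Place du Village<br>250m</div>\n'
    doc = f'<body>A Body<script type="text/html">{body}</script></body>'
    parser = ScriptBodyCollector()
    parser.feed(doc)
    parser.close()
    assert parser.bodies == [body]        # </div> kept verbatim
    assert "div" not in parser.tags_seen  # never parsed as a tag
```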
After digging through Stack Overflow, it seems that the only possible workaround is to parse the HTML as XML, after rewriting void elements such as <br> and <img> as self-closing tags so that the XML loader receives a well-formed document.
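A minimal sketch of that workaround, assuming the rest of the markup is already well-formed XML (quoted attributes, no bare &, no HTML-only entities such as &nbsp;, no lowercase doctype): rewrite void elements as self-closing tags, then hand the result to an XML parser. It uses Python's xml.etree.ElementTree; self_close_void_elements is a hypothetical helper, not HtmlMin code, and the void-element list is the HTML5 one.

```python
import re
import xml.etree.ElementTree as ET

# HTML5 void elements: they never take a closing tag in HTML.
_VOID = "area|base|br|col|embed|hr|img|input|link|meta|source|track|wbr"
_VOID_TAG = re.compile(rf"<({_VOID})\b([^>]*?)\s*/?>", re.IGNORECASE)


def self_close_void_elements(html: str) -> str:
    """Rewrite <br>, <img ...> etc. as <br />, <img ... /> for an XML parser.

    Hypothetical helper; assumes attribute values contain no '>'.
    """
    return _VOID_TAG.sub(r"<\1\2 />", html)


if __name__ == "__main__":
    fragment = (
        '<body>A Body<script id="elements-image-1" type="text/html">\n'
        '<div class="place badge-carte">Place du Village<br>250m</div>\n'
        '<div class="situation badge-carte"><img src="08ecd8a.png" alt=""></div>\n'
        "</script></body>"
    )
    root = ET.fromstring(self_close_void_elements(fragment))
    out = ET.tostring(root, encoding="unicode")
    assert out.count("</div>") == 2  # both closing tags survive
```

In XML mode the template's <div> elements become real child nodes of <script> instead of text. That is what keeps the closing tags, but it also means an ordinary JavaScript <script> containing < or & would make the XML parser fail.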
Fixed in version 3.1.3.