formatOutput
bytestream opened this issue · 4 comments
Am I missing something, or does formatOutput not work in combination with target_document?
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$html5 = new HTML5(['disable_html_ns' => true, 'target_document' => $dom]);
$dom = @ $html5->loadHTML($str);
echo $html5->saveHTML($dom);
Compare with DOMDocument and it adds new lines: https://3v4l.org/64gGA
@goetas any ideas?
I do not understand what the expected output is. Can you please provide a way to reproduce the issue, with code, current output, and expected output?
<?php
// Register auto loader.
require __DIR__.'/vendor/autoload.php';
// Poorly formatted HTML.
$string = '<html><body><div dir="ltr">hi m,<div><br></div><div>this is a reply to your query</div><div>please treat it carefully</div><div><br></div><div>...!</div><div><br></div><div>{% note x, %\}</div><div><br></div><div>look into this for me! :D</div><div>%}</div></div></body></html>';
$string = mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8');
// Initialise DOMDocument instance.
$dom = new \DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
// Initialise HTML5 library with target_document so it uses formatOutput...
$html5 = new \Masterminds\HTML5(['disable_html_ns' => true, 'target_document' => $dom]);
$html5Dom = @ $html5->loadHTML($string);
// Get HTML5 library output.
$html5Html = $html5->saveHTML($html5Dom);
// Get PHP DomDocument output.
$dom->loadHTML($string);
$domHtml = $dom->saveHTML();
// Compare difference.
file_put_contents(__DIR__.'/1_html5.txt', $html5Html);
file_put_contents(__DIR__.'/1_domdocument.txt', $domHtml);
Diff the two files. 1_domdocument.txt has new lines, 1_html5.txt is all on the same line.
This library does not implement any kind of formatted output.
Your example has an issue.
When getting $domHtml via $domHtml = $dom->saveHTML();, you are using the XML-DOM output formatter:
// Get PHP DomDocument output.
$dom->loadHTML($string);
$domHtml = $dom->saveHTML();
The XML-DOM formatter is different from the one provided by this library.
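As a minimal sketch of where formatOutput takes effect, the snippet below uses only PHP's built-in DOM extension, with no HTML5 library involved. It uses saveXML() rather than saveHTML() because its indentation is easier to predict; the sample markup and variable names are illustrative assumptions, not taken from the report.

```php
<?php
// Illustrative markup only; any nested document would do.
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadXML('<root><a><b/></a></root>');

// formatOutput is read by DOMDocument's own serializer at save time.
$dom->formatOutput = false;
$compact = $dom->saveXML();

$dom->formatOutput = true;
$pretty = $dom->saveXML();

// The formatted version spans more lines than the compact one.
echo substr_count($compact, "\n"), ' vs ', substr_count($pretty, "\n"), PHP_EOL;
```

A separate serializer, such as this library's saveHTML(), does not go through that code path, which is consistent with the difference seen in the two output files.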
A more correct example would be:
// Register auto loader.
require __DIR__.'/vendor/autoload.php';
// Poorly formatted HTML.
$string = '<html><body><div dir="ltr">hi m,<div><br></div><div>this is a reply to your query</div><div>please treat it carefully</div><div><br></div><div>...!</div><div><br></div><div>{% note x, %\}</div><div><br></div><div>look into this for me! :D</div><div>%}</div></div></body></html>';
$string = mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8');
// Initialise DOMDocument instance.
$dom = new \DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
// Initialise HTML5 library with target_document so it uses formatOutput...
$html5 = new \Masterminds\HTML5(['disable_html_ns' => true, 'target_document' => $dom]);
$html5Dom = @ $html5->loadHTML($string);
// Get HTML5 library output.
$html5Html = $html5->saveHTML($html5Dom);
// Get PHP DomDocument output.
$dom->loadHTML($string);
$domHtml = $html5->saveHTML($dom);
// Compare difference.
file_put_contents(__DIR__.'/1_html5.txt', $html5Html);
file_put_contents(__DIR__.'/1_domdocument.txt', $domHtml);
In that case you will see that the output is the same, since both outputs are generated by the $html5 object.
To summarize, your code comment "// Initialise HTML5 library with target_document so it uses formatOutput..." implies something that is not implemented in the current library.
Auto-formatting could make sense, but it is currently not implemented.
Note that the following HTML5 snippet renders differently once it is formatted, which makes formatting a dangerous operation:
<div>He<b>ll</b>o!</div>
Rendered as:
Hello!
When formatted, it becomes:
<div>
He
<b>ll</b>
o!
</div>
Rendered as:
He ll o!
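The effect can be reproduced without a browser. The sketch below is only a rough approximation of inline whitespace handling: it strips the tags and collapses each run of whitespace into a single space. The helper name approximateVisibleText is hypothetical, and the two-space indentation of the formatted variant is an assumption.

```php
<?php
// Rough stand-in for how a browser shows inline text: drop the tags,
// then collapse each run of whitespace into a single space.
function approximateVisibleText(string $html): string
{
    $text = strip_tags($html);
    return trim(preg_replace('/\s+/', ' ', $text));
}

$compact = '<div>He<b>ll</b>o!</div>';
$formatted = "<div>\n  He\n  <b>ll</b>\n  o!\n</div>";

echo approximateVisibleText($compact), PHP_EOL;   // Hello!
echo approximateVisibleText($formatted), PHP_EOL; // He ll o!
```

The line breaks introduced around the inline <b> element become significant whitespace, so a formatter would have to know which elements are inline before it could indent safely.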