Masterminds/html5-php

formatOutput

bytestream opened this issue · 4 comments

Am I missing something or does formatOutput not work in combination with target_document?

$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;

$html5 = new HTML5(['disable_html_ns' => true, 'target_document' => $dom]);
$dom = @ $html5->loadHTML($str);

echo $html5->saveHTML($dom);

Compare with DOMDocument and it adds new lines: https://3v4l.org/64gGA

@goetas any ideas?

I do not understand what the expected output is. Can you please provide code that reproduces the issue, along with the current output and the expected output?

@goetas

<?php

// Register auto loader.
require __DIR__.'/vendor/autoload.php';

// Poorly formatted HTML.
$string = '<html><body><div dir="ltr">hi m,<div><br></div><div>this is a reply to your query</div><div>please treat it carefully</div><div><br></div><div>...!</div><div><br></div><div>{% note x, %\}</div><div><br></div><div>look into this for me! :D</div><div>%}</div></div></body></html>';
$string = mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8');

// Initialise DOMDocument instance.
$dom = new \DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;

// Initialise HTML5 library with target_document so it uses formatOutput...
$html5 = new \Masterminds\HTML5(['disable_html_ns' => true, 'target_document' => $dom]);
$html5Dom = @ $html5->loadHTML($string);

// Get HTML5 library output.
$html5Html = $html5->saveHTML($html5Dom);

// Get PHP DomDocument output.
$dom->loadHTML($string);
$domHtml = $dom->saveHTML();

// Compare difference.
file_put_contents(__DIR__.'/1_html5.txt', $html5Html);
file_put_contents(__DIR__.'/1_domdocument.txt', $domHtml);

Diff the two files: 1_domdocument.txt contains newlines, while 1_html5.txt is all on a single line.

This library does not implement any kind of formatted output.

Your example has an issue.
When you obtain $domHtml via $dom->saveHTML(), you are using the XML-DOM output formatter.

// Get PHP DomDocument output.
$dom->loadHTML($string);
$domHtml = $dom->saveHTML();

The XML-DOM formatter is different from the one provided by this library.
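To illustrate that formatOutput is a setting read by DOMDocument's own serializer, here is a minimal sketch that uses only PHP's built-in DOM extension. It uses saveXML() rather than saveHTML() because the XML indentation behaviour is easier to predict; the helper name serializeWith is hypothetical and not part of any library.

```php
<?php

// Hypothetical helper: serialize the same markup with formatOutput on or off,
// using only DOMDocument's native (libxml) serializer.
function serializeWith(string $xml, bool $format): string
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false; // drop whitespace-only text nodes on load
    $dom->formatOutput = $format;     // only the native serializer reads this
    $dom->loadXML($xml);

    return $dom->saveXML();
}

$xml = '<root><a><b>x</b></a></root>';

$plain = serializeWith($xml, false);
$formatted = serializeWith($xml, true);

// The formatted variant is indented across several lines;
// the plain variant keeps the whole tree on one line.
echo $plain;
echo $formatted;
```

A serializer that does not consult this property, such as the one in this library, will produce the same output regardless of its value.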

A more accurate comparison would be:

// Register auto loader.
require __DIR__.'/vendor/autoload.php';

// Poorly formatted HTML.
$string = '<html><body><div dir="ltr">hi m,<div><br></div><div>this is a reply to your query</div><div>please treat it carefully</div><div><br></div><div>...!</div><div><br></div><div>{% note x, %\}</div><div><br></div><div>look into this for me! :D</div><div>%}</div></div></body></html>';
$string = mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8');

// Initialise DOMDocument instance.
$dom = new \DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;

// Initialise HTML5 library with target_document so it uses formatOutput...
$html5 = new \Masterminds\HTML5(['disable_html_ns' => true, 'target_document' => $dom]);
$html5Dom = @ $html5->loadHTML($string);

// Get HTML5 library output.
$html5Html = $html5->saveHTML($html5Dom);

// Get PHP DomDocument output.
$dom->loadHTML($string);
$domHtml = $html5->saveHTML($dom);

// Compare difference.
file_put_contents(__DIR__.'/1_html5.txt', $html5Html);
file_put_contents(__DIR__.'/1_domdocument.txt', $domHtml);

In that case you will see that the two outputs are identical, since both are generated by the $html5 object.

To summarize, your code comment "Initialise HTML5 library with target_document so it uses formatOutput..." implies behaviour that this library does not currently implement.
Auto-formatting is something that could make sense, but it is not implemented at the moment.

Also consider that formatting can change how a document is rendered, which makes it a dangerous operation. Take the following HTML5 snippet:

<div>He<b>ll</b>o!</div>

Rendered as:

Hello!

When formatted, it becomes:

<div>
  He
  <b>ll</b>
  o!
</div>

Which is rendered as:

He ll o!
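The difference can be checked with a small sketch that uses only PHP's built-in DOMDocument. It approximates rendering by collapsing runs of whitespace into a single space, which is an assumption that roughly matches how browsers treat inline text; the helper name visibleText is hypothetical.

```php
<?php

// Hypothetical helper: approximate the rendered text of the first <div>
// by collapsing whitespace runs into single spaces.
function visibleText(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $div = $dom->getElementsByTagName('div')->item(0);

    return trim(preg_replace('/\s+/', ' ', $div->textContent));
}

$compact = "<div>He<b>ll</b>o!</div>";
$indented = "<div>\n  He\n  <b>ll</b>\n  o!\n</div>";

echo visibleText($compact), "\n";  // Hello!
echo visibleText($indented), "\n"; // He ll o!
```

The newlines and indentation added around the inline <b> element become significant whitespace, so the word is split into three parts.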