Parsing introduces new line breaks when outputting the HTML of elements with multiple children
mangei opened this issue · 3 comments
What is this feature about (expected vs actual behaviour)?
When parsing a file, I would like to get the original HTML of an element, so that I can search and replace a specific part of a document without changing the rest.
The issue is that $el->html does not return the original string if any of the element's children has more than one child: it introduces additional line breaks.
How can I reproduce it?
Script: (it shows my full use case; the notable part is marked with a comment)
<?php
use voku\helper\HtmlDomParser;

require_once '../composer/autoload.php';

$fileContent = file_get_contents('./test.html');
$dom = HtmlDomParser::str_get_html($fileContent);

foreach ($dom->find('.mydiv') as $myDivEl) {
    $currentHtml = $myDivEl->html;
    echo $currentHtml; // <---- here you can see the wrong output (you can skip the rest)

    $newContent = "";
    foreach ($myDivEl->find('.mydiv-item') as $childEl) {
        $childEl->class = 'replaced';
        $newContent .= $childEl;
    }

    $myDivEl->outerhtml = '<div class="myreplacement">' . $newContent . '</div>';
    $fileContent = str_replace($currentHtml, $myDivEl->html, $fileContent);
}

file_put_contents('./test-out.html', $fileContent);
Input HTML file:
<html>
<body>
<div class="mydiv">
</div>
<div class="mydiv">
<div class="mydiv-item"><span>A1</span></div>
</div>
<div class="mydiv">
<div class="mydiv-item"><span>B1</span><span>B2</span></div>
</div>
</body>
</html>
Actual output: (B is not replaced)
<html>
<body>
<div class="myreplacement"></div>
<div class="myreplacement"><div class="replaced"><span>A1</span></div></div>
<div class="mydiv">
<div class="mydiv-item"><span>B1</span><span>B2</span></div>
</div>
</body>
</html>
Expected output:
<html>
<body>
<div class="myreplacement"></div>
<div class="myreplacement"><div class="replaced"><span>A1</span></div></div>
<div class="myreplacement"><div class="replaced"><span>B1</span><span>B2</span></div></div>
</body>
</html>
The issue is that the HTML returned for a selected element no longer matches the source when one of its children has more than one child, so the search-and-replace finds nothing to replace. The returned HTML for each element:
<div class="mydiv">
</div>
A:
<div class="mydiv">
<div class="mydiv-item"><span>A1</span></div>
</div>
B:
<div class="mydiv">
<div class="mydiv-item">
<span>B1</span><span>B2</span>
</div>
</div>
B should be:
<div class="mydiv">
<div class="mydiv-item"><span>B1</span><span>B2</span></div>
</div>
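To make the failure mode concrete, here is a minimal sketch in plain PHP that does not use the library at all. The `$serialized` string is a hand-written stand-in for the output shown under "B" above (an assumption for illustration, not captured output). Because `str_replace` only matches exact byte sequences, the extra line breaks mean nothing is replaced:

```php
<?php
// File content as written on disk: the inner <div> sits on one line.
$fileContent = "<div class=\"mydiv\">\n"
    . "<div class=\"mydiv-item\"><span>B1</span><span>B2</span></div>\n"
    . "</div>";

// Stand-in for what $el->html returns for case B (extra line breaks added).
$serialized = "<div class=\"mydiv\">\n"
    . "<div class=\"mydiv-item\">\n"
    . "<span>B1</span><span>B2</span>\n"
    . "</div>\n"
    . "</div>";

$replacement = '<div class="myreplacement"></div>';

// The fourth argument receives the number of replacements performed.
$count = 0;
$result = str_replace($serialized, $replacement, $fileContent, $count);

var_dump($count);                   // int(0): the needle was never found
var_dump($result === $fileContent); // bool(true): the file is unchanged
```

This is why case A is replaced and case B is silently skipped: the search string must match the file content character for character.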
Does it take minutes, hours or days to fix?
Minutes?
Any additional information?
.
Thanks for your help!
It would also help me if I could get the original, unmodified text of a parsed element, so that I can search for it and replace it. Indices (from-to) into the original string would work too.
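As a rough sketch of how such from-to indices could be used, the snippet below splices a replacement into the original string by position. Nothing in this thread indicates that the library exposes offsets like these; `$from` and `$length` are hypothetical values, derived here with `strpos` purely as a stand-in for what a parser would report.

```php
<?php
$fileContent = "<body>\n"
    . "<div class=\"mydiv\">\n<div class=\"mydiv-item\"><span>A1</span></div>\n</div>\n"
    . "</body>";

// Hypothetical: the parser would hand back these two numbers for the element.
$original = "<div class=\"mydiv\">\n<div class=\"mydiv-item\"><span>A1</span></div>\n</div>";
$from = strpos($fileContent, $original);
$length = strlen($original);

$replacement = '<div class="myreplacement"><div class="replaced"><span>A1</span></div></div>';

// Splice by position: bytes outside [$from, $from + $length) are left untouched.
$result = substr_replace($fileContent, $replacement, $from, $length);

echo $result;
```

With positions instead of a search string, the replacement no longer depends on the serializer reproducing the original whitespace.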
It's much simpler to use the HtmlDom object instead of string replacements; here is an example: 7571bee
This is a shortcoming of the library. If I have multiple parent and multiple child selectors, that's a big problem.
Example:
<html>
<body>
<div class="mydiv">
</div>
<div class="mydiv_a">
<div class="mydiv-item"><span>A1</span></div>
</div>
<div class="mydiv_b">
<div class="mydiv-item"><span>B1</span><span>B2</span></div>
</div>
<div class="mydiv_c">
<div class="mydiv-item-next"><span>B1</span><span>B2</span></div>
</div>
</body>
</html>