paquettg/php-html-parser

Retrieving multiple (n) children while ignoring the last 2

moshoodfakorede opened this issue · 2 comments

I am trying to parse HTML content taken from an e-mail template and extract only some parts of the body. The structure of the HTML is:

<body>
    <p></p>
    <div>
        <div></div>
        <div></div>
        . . .
        <div></div>
        <div></div>
    </div>
</body>

Each of these divs contains elements that look like this (no class or id attributes to select on), and the last two contain default text and links that are not needed.

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
    <span style="font-family:&quot;Open Sans&quot;, Arial, sans-serif;font-size:14px;text-align:justify;background-color:rgb(255, 255, 255);display:inline !important">
        Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.
        It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
    </span>
    <br />
</div>

Any ideas on how I can get the content of the first n nested divs while ignoring the last 2 nested divs?

Based on the HTML above and the README, you could do this (working example):

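use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->loadStr('<body><p></p><div><div>Lorem</div><div>Ipsum</div><div>Link 1</div><div>Link 2</div></div></body>');

// The first matching div is the outer wrapper; count() and foreach operate on its child nodes.
$contents = $dom->find('div')[0];
$elementCount = count($contents);
// Stop two children before the end.
$limit = $elementCount - 2;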

$counter = 0;
foreach($contents as $content){
    if($counter === $limit){
        // Stop iterating
        break;
    }

    $html = $content->innerHTML;
    // Or, if you want it with the tags:
    // $html = $content->outerHTML;
    print($html . "\n");

    $counter++;
}

When asking a question, please include an example of what you have tried. I used a foreach loop; a for loop would be less code, but this resembles the README more closely.
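
For reference, a shorter variant is to slice the trailing items off before looping, so no counter is needed. This is only a rough sketch: `dropLast` is a hypothetical helper (not part of php-html-parser), and plain strings stand in for the child nodes. It assumes you already have the children as a plain PHP array containing only the nested divs (for instance from the node's `getChildren()` method, which I have not tested against your template).

```php
<?php
// Hypothetical helper, not part of php-html-parser: returns every item
// except the last $skip ones, keeping the original order.
function dropLast(array $items, int $skip): array
{
    $items = array_values($items);

    // array_slice() treats a length of 0 as "no items", so handle it here.
    if ($skip <= 0) {
        return $items;
    }

    // A negative length makes array_slice() stop $skip items before the end.
    return array_slice($items, 0, -$skip);
}

// Stand-in data: in real use these would be the child nodes of the outer div.
$children = ['Lorem', 'Ipsum', 'Link 1', 'Link 2'];

foreach (dropLast($children, 2) as $child) {
    print($child . "\n");
}
```

If the list has two items or fewer, the helper returns an empty array, so the loop simply does nothing.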

@PyIter Much appreciated. Thank you! I get the idea now.