# PHP DOM HTML Parser
PHP DOM HTML Parser uses PHP's built-in DOM extension to parse and query HTML. It is a good alternative to PHP Simple HTML DOM Parser.
The PHP DOM extension requires the libxml PHP extension. This means PHP must be configured with --enable-libxml, although this happens implicitly because libxml is enabled by default.
$html = '<div>
test this library! <a href="https://github.com/shinbonlin">PHP DOM HTML Parser</a>
</div><div class="example">test string!</div>';
If you use the namespace:
$html_dom = new \HtmlParser\ParserDom($html);
// Set the second parameter to true to clean up the HTML source with Tidy.
// This requires the PHP Tidy extension (php_tidy.dll on Windows, or the php5-tidy package); check your php.ini.
$html_dom = new \HtmlParser\ParserDom($html, true);
If you comment out the namespace declaration (line 2):
$html_dom = new ParserDom($html);
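Before constructing the parser, it can help to confirm that the extensions mentioned above are available. The following is a minimal sketch that uses only PHP's built-in `extension_loaded()`; the variable names (`$required`, `$optional`, `$status`) are illustrative and not part of this library.

```php
<?php
// 'dom' and 'libxml' are needed by the parser itself; 'tidy' is only
// needed when the second constructor argument is true.
$required = ['dom', 'libxml'];
$optional = ['tidy'];

$status = [];
foreach (array_merge($required, $optional) as $ext) {
    // extension_loaded() returns true if the named extension is active.
    $status[$ext] = extension_loaded($ext);
    echo $ext . ': ' . ($status[$ext] ? 'loaded' : 'missing') . PHP_EOL;
}
```

If `tidy` is reported as missing, construct the parser without the second parameter.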
Find all images
foreach($html_dom->find('img') as $element) {
echo $element->src . '<br>';
echo $element->getAttr('src') . '<br>';
}
Find all links
foreach($html_dom->find('a') as $element) {
echo $element->href . '<br>';
echo $element->getAttr('href') . '<br>';
}
Find all anchors; returns an array of element objects
$ret = $html_dom->find('a');
Find the Nth anchor (zero-based); returns an element object, or null if not found
$ret = $html_dom->find('a', 0);
Find the last anchor; returns an element object, or null if not found
$ret = $html_dom->find('a', -1);
Find all `<div>` elements that have an id attribute
$ret = $html_dom->find('div[id]');
Find all `<div>` elements whose id attribute equals foo
$ret = $html_dom->find('div[id=foo]');
Find all elements with id=foo
$ret = $html_dom->find('#foo');
Find all elements with class=foo
$ret = $html_dom->find('.foo');
Find all HTML tags with the id attribute
$ret = $html_dom->find('*[id]');
Find all anchors and images
$ret = $html_dom->find('a, img');
Find all anchors and images with the "title" attribute
$ret = $html_dom->find('a[title], img[title]');
Find all `<li>` in `<ul>`
$es = $html_dom->find('ul li');
Find nested `<div>` elements
$es = $html_dom->find('div div div');
Find all `<td>` in `<table>` elements with class=hello
$es = $html_dom->find('table.hello td');
Find all `<td>` with attribute align=center in `<table>`
$es = $html_dom->find('table td[align=center]');
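For readers who want to see how selectors like these relate to the underlying DOM extension, here is a rough sketch using PHP's built-in `DOMDocument` and `DOMXPath`. The selector-to-XPath mapping in `$queries` is an assumption made for illustration; it is not taken from this library's source, and the sample HTML is invented.

```php
<?php
$html = '<div id="foo"><a href="/one" title="first">One</a></div>'
      . '<div class="example"><a href="/two">Two</a><img src="a.png" title="pic"></div>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical mapping: CSS-style selector => XPath 1.0 expression.
$queries = [
    'div[id]'              => '//div[@id]',
    'div[id=foo]'          => '//div[@id="foo"]',
    '.example'             => '//*[contains(concat(" ", normalize-space(@class), " "), " example ")]',
    'a, img'               => '//a | //img',
    'a[title], img[title]' => '//a[@title] | //img[@title]',
    'div a'                => '//div//a',
];

$counts = [];
foreach ($queries as $selector => $expr) {
    // query() returns a DOMNodeList; length is the number of matches.
    $counts[$selector] = $xpath->query($expr)->length;
    echo $selector . ' => ' . $counts[$selector] . PHP_EOL;
}
```

The class test pads the attribute value with spaces so that `example` matches as a whole word rather than as a substring of another class name.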
Modify class attribute
$html_dom->find('div', 1)->class = 'bar';
Modify inner text (HTML is allowed)
$html_dom->find('div[id=hello]', 0)->innertext = 'foo';
Modify outer HTML
// This example removes (destroys) the `<a>` element and replaces it with a new `<h1>` element
$html_dom->find('a', 0)->outertext = '<h1>Title Link</h1>';
Iterate over every `<li>` inside each `<ul>`
foreach($html_dom->find('ul') as $ul) {
foreach($ul->find('li') as $li) {
// do something...
}
}
Find the first `<li>` in the first `<ul>`
$e = $html_dom->find('ul', 0)->find('li', 0);
Filter | Description |
---|---|
[attribute] | Matches elements that have the specified attribute. |
[!attribute] | Matches elements that don't have the specified attribute. |
[attribute=value] | Matches elements that have the specified attribute with a certain value. |
[attribute!=value] | Matches elements that don't have the specified attribute with a certain value. |
[attribute^=value] | Matches elements that have the specified attribute and it starts with a certain value. |
[attribute$=value] | Matches elements that have the specified attribute and it ends with a certain value. |
[attribute*=value] | Matches elements that have the specified attribute and it contains a certain value. |
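The filters in the table have close counterparts in XPath 1.0, which is what PHP's built-in `DOMXPath` evaluates. The sketch below shows one possible translation; it is an illustration under stated assumptions, not this library's actual implementation. The sample HTML and the `$filters` mapping are hypothetical. XPath 1.0 has no `ends-with()`, so the `$=` case is emulated with `substring()`.

```php
<?php
$html = '<a href="https://example.com/a.pdf">A</a>'
      . '<a href="http://example.org/b.html">B</a>'
      . '<a name="top">C</a>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical mapping: attribute filter => XPath 1.0 expression.
$filters = [
    'a[href]'              => '//a[@href]',
    'a[!href]'             => '//a[not(@href)]',
    'a[href^=https]'       => '//a[starts-with(@href, "https")]',
    // Compare the trailing characters of @href with the wanted suffix.
    'a[href$=.pdf]'        => '//a[substring(@href, string-length(@href) - string-length(".pdf") + 1) = ".pdf"]',
    'a[href*=example.org]' => '//a[contains(@href, "example.org")]',
];

$matches = [];
foreach ($filters as $filter => $expr) {
    $matches[$filter] = $xpath->query($expr)->length;
    echo $filter . ' => ' . $matches[$filter] . PHP_EOL;
}
```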
// Example HTML: <div class="blue-color">foo <b>bar</b></div>
$e = $html_dom->find("div", 0);
echo $e->outertext; // Returns: "<div class="blue-color">foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"
echo $e->tag; // Returns: "div" (current tag name)
echo $e->class; // Returns: "blue-color"
Attribute Name | Usage |
---|---|
$e->outertext | Read or write the outer HTML text of element. |
$e->innertext | Read or write the inner HTML text of element. |
$e->plaintext | Read or write the plain text of element. |
$e->tag | Read current tag name of element. |
$e->src | Read or write "src" attribute of element. |
$e->class | Read or write "class" attribute of element. |
$e->href | Read or write "href" attribute of element. |
Apart from outertext, innertext and plaintext, you can use $e->attribute_name to read or write any attribute of an element.
Get an attribute (if the attribute has no value, e.g. checked or selected, this returns true or false)
$value = $e->href;
Set an attribute (if the attribute has no value, e.g. checked or selected, set it to true or false)
$e->href = 'my link';
Remove an attribute by setting its value to null
$e->href = null;
Determine whether an attribute exists
if (isset($e->href)) echo 'href exists!';
Extract contents from HTML
echo $html_dom->plaintext;
Wrap an element
$e->outertext = '<div class="wrap">' . $e->outertext . '</div>';
Remove an element by setting its outertext to an empty string
$e->outertext = '';
Append an element
$e->outertext = $e->outertext . '<div>foo</div>';
Insert an element
$e->outertext = '<div>foo</div>' . $e->outertext;
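The same wrap and insert operations can also be done at the node level with PHP's built-in DOM extension, which avoids building HTML by string concatenation. This is a sketch against plain `DOMDocument`, not this library's API; the sample markup and variable names are made up for the example.

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<p>text</p>');
libxml_clear_errors();

$p = $doc->getElementsByTagName('p')->item(0);

// Wrap: put a new <div class="wrap"> where <p> was, then move <p> inside it.
$wrapper = $doc->createElement('div');
$wrapper->setAttribute('class', 'wrap');
$p->parentNode->replaceChild($wrapper, $p);
$wrapper->appendChild($p);

// Insert: add a new <div>foo</div> immediately before the wrapper.
$sibling = $doc->createElement('div', 'foo');
$wrapper->parentNode->insertBefore($sibling, $wrapper);

// Serialize only the <body> subtree to inspect the result.
echo $doc->saveHTML($doc->getElementsByTagName('body')->item(0)) . PHP_EOL;
```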
Dump the internal DOM tree back into a string
$str = $html_dom->save();
Dump the internal DOM tree back into a file
$html_dom->save('result.htm');
The script frees memory automatically, but you can also free it manually
$html_dom->clear();
You can access the underlying PHP DOM extension objects through the node property:
$html_dom->node
$html_dom->node->childNodes
$html_dom->node->parentNode
$html_dom->node->firstChild
$html_dom->node->lastChild
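The properties listed above are standard `DOMNode` members. As a self-contained sketch of what they return, the example below uses a plain `DOMDocument` rather than this library's `node` property; the sample markup is invented.

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<ul><li>one</li><li>two</li></ul>');
libxml_clear_errors();

$ul = $doc->getElementsByTagName('ul')->item(0);

// childNodes is a DOMNodeList; keep element nodes only.
$items = [];
foreach ($ul->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $items[] = $child->nodeName . ':' . $child->textContent;
    }
}

$parent = $ul->parentNode->nodeName;    // loadHTML() wraps fragments in <html><body>
$first  = $ul->firstChild->textContent;
$last   = $ul->lastChild->textContent;

echo implode(', ', $items) . PHP_EOL;
echo $parent . ' / ' . $first . ' / ' . $last . PHP_EOL;
```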
For more information, please visit http://php.net/manual/en/book.dom.php