DiDOM - simple and fast HTML parser.
- Installation
- Quick start
- Creating a new document
- Search for elements
- Verifying that an element exists
- Output
- Creating a new element
- Working with element attributes
- Working with cache
- Comparison with other parsers
Installation
To install DiDOM, run the command:
composer require imangazaliev/didom
Quick start
// Composer's autoloader
require 'vendor/autoload.php';
use DiDom\Document;
$document = new Document('http://www.news.com/', true);
$posts = $document->find('.post');
foreach ($posts as $post) {
    echo $post->text(), "\n";
}
Creating a new document
DiDom can load HTML in several ways:
// from a string of HTML
$document = new Document($html);
// file path
$document = new Document('page.html', true);
// or URL
$document = new Document('http://www.example.com/', true);
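The second parameter specifies whether the first one is a file path or URL to load rather than an HTML string. The default is false.
You can also create an empty document and load HTML into it later: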
$document = new Document();
// from a string of HTML
$document->loadHtml($html);
// from a file
$document->loadHtmlFile('page.html');
// or from a URL
$document->loadHtmlFile('http://www.example.com/');
Search for elements
DiDOM accepts either a CSS selector or an XPath expression for searching. Pass the expression as the first parameter and specify its type in the second one (the default type is Query::TYPE_CSS):
use DiDom\Document;
use DiDom\Query;
...
// CSS selector
$posts = $document->find('.post');
// XPath
$posts = $document->find("//div[contains(@class, 'post')]", Query::TYPE_XPATH);
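You can also call the document itself with a CSS selector (via the magic method __invoke()), or use the xpath() method for XPath:
// shortcut for $document->find('.post')
$posts = $document('.post');
// XPath search
$posts = $document->xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]");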
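The two XPath expressions used in this section are not equivalent: `contains(@class, 'post')` is a plain substring test, while the `concat(...)`/`normalize-space(...)` form matches `post` only as a whole class name, which is what the CSS selector `.post` means. The sketch below shows the difference using PHP's built-in DOM extension (DOMDocument and DOMXPath) directly rather than DiDom; the sample HTML is made up for illustration.

```php
<?php
// Compare two XPath translations of the CSS selector ".post" using PHP's
// built-in DOM extension, independent of DiDom.
$html = '<html><body>'
    . '<div class="post">A</div>'
    . '<div class="featured post">B</div>'
    . '<div class="postscript">C</div>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring match: also hits class="postscript".
$loose = $xpath->query("//div[contains(@class, 'post')]");

// Whole-word match on the class list, as a CSS class selector requires.
$strict = $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]");

echo $loose->length, "\n";  // 3
echo $strict->length, "\n"; // 2
```

This is why the longer expression is the safer XPath equivalent of `.post`.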
You can also search inside an element:
// text of the first h2 inside the first .post (assumes both exist)
echo $document->find('.post')[0]->find('h2')[0]->text();
If elements matching the expression are found, find() returns an array of DiDom\Element instances; otherwise, it returns an empty array.
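Searching inside an element means restricting the query to that element's subtree. As a standalone illustration (plain PHP DOM, not DiDom's internals), DOMXPath::query() accepts a context node, but only a relative path such as `.//h2` is limited to it; an absolute path like `//h2` still searches the whole document. The HTML below is a made-up sample.

```php
<?php
// Searching inside an element with PHP's DOMXPath: pass the element as the
// context node and use a relative path (".//h2").
$dom = new DOMDocument();
$dom->loadHTML('<html><body>'
    . '<div class="post"><h2>First</h2></div>'
    . '<div class="post"><h2>Second</h2></div>'
    . '</body></html>');
$xpath = new DOMXPath($dom);

$posts = $xpath->query("//div[@class='post']");
$secondPost = $posts->item(1);

// Relative path: only h2 elements inside the second post.
$relative = $xpath->query('.//h2', $secondPost);
// Absolute path: ignores the context node and searches the whole document.
$absolute = $xpath->query('//h2', $secondPost);

echo $relative->item(0)->textContent, "\n"; // Second
echo $absolute->length, "\n";               // 2
```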
Verifying that an element exists
To verify whether an element exists, use the has() method:
if ($document->has('.post')) {
    // code
}
If you need to check whether an element exists and then get it:
if ($document->has('.post')) {
    $elements = $document->find('.post');
    // code
}
However, it is faster to do this:
if (count($elements = $document->find('.post')) != 0) {
    // code
}
because the first version runs the search twice (once in has() and again in find()).
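To make the "two searches" point concrete, here is a sketch built on PHP's DOMXPath rather than DiDom. The CountingFinder class is hypothetical: it only mimics the contract described above (find() returns an array, empty when nothing matches; has() answers yes or no) and counts how many queries each pattern runs.

```php
<?php
// Hypothetical helper (not part of DiDom): wraps PHP's DOMXPath, returns
// matches as a plain array (empty when nothing matches) and counts queries.
final class CountingFinder
{
    /** @var int number of XPath queries executed so far */
    public $queries = 0;
    /** @var DOMXPath */
    private $xpath;

    public function __construct(DOMDocument $dom)
    {
        $this->xpath = new DOMXPath($dom);
    }

    public function find(string $expression): array
    {
        $this->queries++;
        return iterator_to_array($this->xpath->query($expression), false);
    }

    public function has(string $expression): bool
    {
        // Runs a full query just to answer yes/no.
        return count($this->find($expression)) > 0;
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><div class="post">A</div></body></html>');
$finder = new CountingFinder($dom);
$expr = "//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]";

// Pattern 1: has() followed by find() runs the query twice.
if ($finder->has($expr)) {
    $elements = $finder->find($expr);
}
$twoStep = $finder->queries;

// Pattern 2: assigning inside the condition runs it once.
$finder->queries = 0;
if (count($elements = $finder->find($expr)) != 0) {
    echo count($elements), " match(es)\n";
}
$oneStep = $finder->queries;

echo $twoStep, ' vs ', $oneStep, "\n"; // 2 vs 1
```

The saving is one query per check; whether that matters depends on document size and how often the check runs.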
Output
$posts = $document->find('.post');
// HTML of the element
echo $posts[0]->html();
// casting to a string also returns the HTML
$html = (string) $posts[0];
// text content of the element
echo $posts[0]->text();
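The difference between html() and text() mirrors the difference between serialized markup and text content in the DOM. The sketch below shows that distinction with PHP's built-in DOMDocument::saveHTML() (given a node, it serializes just that node, including its own tag) and the textContent property. It is not DiDom's implementation, and it does not claim DiDom's output is byte-identical; the sample HTML is made up.

```php
<?php
// Outer HTML vs. text content of a node, using PHP's DOM extension.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div class="post"><h2>Title</h2><p>Body <b>text</b></p></div></body></html>');
$post = (new DOMXPath($dom))->query("//div[@class='post']")->item(0);

// Serialize this node and its descendants as HTML.
$html = $dom->saveHTML($post);
// Concatenate all descendant text nodes, without markup.
$text = $post->textContent;

echo $html, "\n";
echo $text, "\n"; // TitleBody text
```

Note that textContent joins adjacent text with no separator, so "Title" and "Body" run together here.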
Creating a new element
use DiDom\Element;
$element = new Element('span', 'Hello');
// Outputs "<span>Hello</span>"
echo $element->html();
The first parameter is the tag name of the element, the second one is its value (optional).
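For comparison only, PHP's own DOMDocument::createElement() follows the same "tag name, then optional value" shape. This is a standalone sketch with the DOM extension, not DiDom's code.

```php
<?php
// PHP's DOMDocument::createElement() takes a tag name and an optional
// text value; the element must belong to a document to be serialized.
$dom = new DOMDocument();
$span = $dom->createElement('span', 'Hello');
$dom->appendChild($span);

echo $dom->saveHTML($span), "\n"; // <span>Hello</span>
```

One caveat from the PHP manual: createElement() does not escape `&` in the value, so for arbitrary text it is safer to create the element empty and set its textContent afterwards.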
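Working with element attributes
// get the tag name
$name = $element->tag;
// set an attribute (three equivalent ways)
$element->setAttribute('name', 'username');
$element->attr('name', 'username');
$element->name = 'username';
// get an attribute value (three equivalent ways)
$username = $element->getAttribute('value');
$username = $element->attr('value');
$username = $element->value;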
Getting an attribute returns null if the attribute is not found.
// check whether an attribute exists
if ($element->hasAttribute('name')) {
    // code
}
// or
if (isset($element->name)) {
    // code
}
// remove an attribute
$element->removeAttribute('name');
// or
unset($element->name);
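The same set/get/check/remove operations exist on PHP's built-in DOMElement, with one behavioral difference worth knowing: DOMElement::getAttribute() returns an empty string for a missing attribute, whereas the DiDom behavior described above is to return null. The sketch below shows the built-in calls; attrOrNull() is a hypothetical helper, part of neither DiDom nor PHP, that reproduces the null-on-missing behavior.

```php
<?php
// Attribute handling with PHP's built-in DOMElement, for comparison.
$dom = new DOMDocument();
$input = $dom->createElement('input');
$dom->appendChild($input);

$input->setAttribute('name', 'username');   // set
$name = $input->getAttribute('name');       // 'username'
$missing = $input->getAttribute('value');   // '' for a missing attribute
$hadName = $input->hasAttribute('name');    // true

$input->removeAttribute('name');            // remove
$hasNameNow = $input->hasAttribute('name'); // false

// Hypothetical helper (not part of DiDom or PHP): null when not found.
function attrOrNull(DOMElement $element, string $attribute): ?string
{
    return $element->hasAttribute($attribute)
        ? $element->getAttribute($attribute)
        : null;
}

var_dump(attrOrNull($input, 'name')); // NULL
```

Checking hasAttribute() first is what distinguishes "missing" from "present but empty", which a bare getAttribute() cannot do.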
Working with cache
The cache is an array of XPath expressions that were converted from CSS selectors.
use DiDom\Query;
...
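// convert a CSS selector to XPath; the result is stored in the cache
$xpath = Query::compile('h2');
// get the cache
$compiled = Query::getCompiled();
// array('h2' => '//h2')
var_dump($compiled);
// replace the cache with your own array
Query::setCompiled(['h2' => '//h2']);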
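To illustrate why such a cache helps, here is a minimal memoizing sketch that mirrors the compile/getCompiled/setCompiled shape described above. The SelectorCache class and its translation rules are hypothetical, not DiDom's Query implementation: it handles only a bare tag name or a single `.class` selector and does no escaping.

```php
<?php
// Hypothetical memoizing CSS-to-XPath cache (not DiDom's Query class).
// Supports only "tag" and ".class" selectors.
final class SelectorCache
{
    /** @var array<string, string> CSS selector => compiled XPath */
    private static $compiled = [];

    public static function compile(string $selector): string
    {
        if (isset(self::$compiled[$selector])) {
            return self::$compiled[$selector]; // cache hit: no re-translation
        }
        if ($selector !== '' && $selector[0] === '.') {
            $class = substr($selector, 1);
            $xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]";
        } else {
            $xpath = '//' . $selector;
        }
        return self::$compiled[$selector] = $xpath;
    }

    public static function getCompiled(): array
    {
        return self::$compiled;
    }

    public static function setCompiled(array $compiled): void
    {
        self::$compiled = $compiled;
    }
}

SelectorCache::compile('h2');
SelectorCache::compile('.post');
var_dump(SelectorCache::getCompiled());
```

Because the cache is a static array, setCompiled() replaces it wholesale, and later compile() calls for those selectors return the supplied XPath without translating again; preloading the cache this way skips translation work on subsequent runs.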