
Simple and fast HTML parser

Primary language: PHP. License: MIT.

DiDOM



DiDOM - simple and fast HTML parser.


Installation

To install DiDOM, run the following command:

composer require imangazaliev/didom

Quick start

// Composer's autoloader makes the DiDom classes available
require 'vendor/autoload.php';

use DiDom\Document;

$document = new Document('http://www.news.com/', true);

$posts = $document->find('.post');

foreach ($posts as $post) {
    echo $post->text(), "\n";
}

Creating a new document

DiDOM allows you to load HTML in several ways:

With the constructor:
// the first parameter is a string with HTML
$document = new Document($html);

// file path
$document = new Document('page.html', true);

// or URL
$document = new Document('http://www.example.com/', true);

The second parameter specifies whether the first one is a path to a file (or a URL) to load rather than an HTML string. The default is false.

With separate methods:
$document = new Document();

// from a string
$document->loadHtml($html);

// from a file
$document->loadHtmlFile('page.html');

// or from a URL
$document->loadHtmlFile('http://www.example.com/');
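To see that these loading paths are interchangeable, here is a minimal sketch. It assumes DiDOM was installed with Composer as described above (so vendor/autoload.php exists) and writes the markup to a temporary file that stands in for page.html; the HTML snippet and the .item class are made up for the example.

```php
<?php
// Assumes DiDOM was installed with Composer (see Installation).
require 'vendor/autoload.php';

use DiDom\Document;

$html = '<ul><li class="item">One</li><li class="item">Two</li></ul>';

// 1. From a string: the second constructor argument defaults to false.
$fromString = new Document($html);

// 2. From a file via the constructor: true means the first argument is a path.
//    A temporary file stands in for a real page here.
$path = tempnam(sys_get_temp_dir(), 'didom');
file_put_contents($path, $html);
$fromConstructor = new Document($path, true);

// 3. From the same file with the separate loader method.
$fromLoader = new Document();
$fromLoader->loadHtmlFile($path);

// All three documents contain the same two items.
$counts = [
    count($fromString->find('.item')),
    count($fromConstructor->find('.item')),
    count($fromLoader->find('.item')),
];
echo implode(', ', $counts), "\n"; // 2, 2, 2

unlink($path);
```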

Search for elements

DiDOM accepts a CSS selector or an XPath expression for search. Pass the expression as the first parameter and specify its type in the second one (the default type is Query::TYPE_CSS):

With the find() method:
use DiDom\Document;
use DiDom\Query;

...

// CSS selector
$posts = $document->find('.post');

// XPath
$posts = $document->find("//div[contains(@class, 'post')]", Query::TYPE_XPATH);

With the magic method __invoke():
$posts = $document('.post');

With the xpath() method:
$posts = $document->xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]");
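The exact expression matters. A minimal sketch, assuming the same Composer setup, that runs the forms above against a made-up snippet: the CSS selector .post and the normalize-space() XPath both match the whole class token "post", while contains(@class, 'post') is a plain substring test and also picks up class="postscript".

```php
<?php
// Assumes DiDOM was installed with Composer (see Installation).
require 'vendor/autoload.php';

use DiDom\Document;
use DiDom\Query;

$document = new Document(
    '<div class="post">A</div>'
    . '<div class="post featured">B</div>'
    . '<div class="postscript">C</div>'
);

// CSS class selector: matches elements whose class list contains the token "post".
$css = $document->find('.post');

// __invoke() is shorthand for find() with a CSS selector.
$invoked = $document('.post');

// Token-exact XPath, equivalent to the CSS selector.
$xpathExact = $document->xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]");

// Substring test: also matches class="postscript".
$xpathLoose = $document->find("//div[contains(@class, 'post')]", Query::TYPE_XPATH);

printf("%d %d %d %d\n", count($css), count($invoked), count($xpathExact), count($xpathLoose)); // 2 2 2 3
```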

You can also search inside an element:

echo $document->find('.post')[0]->find('h2')[0]->text();

If elements matching the expression are found, find() returns an array of DiDom\Element instances; otherwise, it returns an empty array.
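Because find() can return an empty array, indexing the result with [0] as in the chained call above breaks when nothing matches. A minimal sketch of a guard, assuming the usual Composer setup; firstText() is a hypothetical helper name, not part of DiDOM. It works with a Document or an Element, since both provide find().

```php
<?php
// Assumes DiDOM was installed with Composer (see Installation).
require 'vendor/autoload.php';

use DiDom\Document;

/**
 * Hypothetical helper: returns the text of the first element matching
 * $selector inside $context (a Document or an Element), or null when
 * find() returns an empty array.
 */
function firstText($context, string $selector): ?string
{
    $found = $context->find($selector);

    return count($found) > 0 ? $found[0]->text() : null;
}

$document = new Document('<div class="post"><h2>Title</h2></div><div class="post"></div>');
$posts = $document->find('.post');

var_dump(firstText($posts[0], 'h2')); // string(5) "Title"
var_dump(firstText($posts[1], 'h2')); // NULL
```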

Verifying that an element exists

To verify that an element exists, use the has() method:

if ($document->has('.post')) {
    // code
}

If you need to check that an element exists and then get it:

if ($document->has('.post')) {
    $elements = $document->find('.post');
    // code
}

but it is faster to do this:

if (count($elements = $document->find('.post')) != 0) {
    // code
}

because the first approach searches the document twice.

Output

Getting HTML

With the html() method:
$posts = $document->find('.post');

echo $posts[0]->html();
Casting to string:
$html = (string) $posts[0];

Getting text content

$posts = $document->find('.post');

echo $posts[0]->text();
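A short sketch contrasting the two on a made-up snippet (same Composer assumption): html() keeps the element's own tag and its child tags, text() returns only the text content, and casting to string gives the same result as html().

```php
<?php
// Assumes DiDOM was installed with Composer (see Installation).
require 'vendor/autoload.php';

use DiDom\Document;

$document = new Document('<div class="post"><b>Hi</b> there</div>');
$post = $document->find('.post')[0];

// html(): the element's markup, tags included.
$markup = $post->html();

// text(): only the text content, tags stripped.
$text = $post->text();

echo $markup, "\n";
echo $text, "\n"; // Hi there
```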

Creating a new element

use DiDom\Element;

$element = new Element('span', 'Hello');
    
// Outputs "<span>Hello</span>"
echo $element->html();

The first parameter is the element's tag name, and the second one is its content (optional).

Working with element attributes

Getting the tag name

$name = $element->tag;
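The tag property gives the element's tag name, not the value of an attribute. A minimal sketch, assuming the usual Composer setup, that collects the tag names of the elements carrying a made-up .note class:

```php
<?php
// Assumes DiDOM was installed with Composer (see Installation).
require 'vendor/autoload.php';

use DiDom\Document;

$document = new Document('<h1>Title</h1><p class="note">Text</p><span class="note">More</span>');

$tags = [];
foreach ($document->find('.note') as $note) {
    // `tag` returns the tag name of the matched element.
    $tags[] = $note->tag;
}

echo implode(', ', $tags), "\n"; // p, span
```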

Creating/updating an attribute

With the setAttribute() method:
$element->setAttribute('name', 'username');
With the attr() method:
$element->attr('name', 'username');
With the magic method __set():
$element->name = 'username';
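The three forms are equivalent. A minimal sketch checking that, assuming the usual Composer setup, with each form applied to a fresh input element:

```php
<?php
// Assumes DiDOM was installed with Composer (see Installation).
require 'vendor/autoload.php';

use DiDom\Element;

$a = new Element('input');
$a->setAttribute('name', 'username'); // explicit setter

$b = new Element('input');
$b->attr('name', 'username');         // attr() with two arguments sets the value

$c = new Element('input');
$c->name = 'username';                // __set() on an undefined property

foreach ([$a, $b, $c] as $input) {
    echo $input->getAttribute('name'), "\n"; // username
}
```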

Getting the value of an attribute

With the getAttribute() method:
$username = $element->getAttribute('value');
With the attr() method:
$username = $element->attr('value');
With the magic method __get():
$username = $element->value;

All three return null if the attribute is not found.
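A sketch of the null case, assuming the usual Composer setup. Supplying a fallback with PHP's null coalescing operator (??) is just one option, not a DiDOM feature; the fallback text is arbitrary.

```php
<?php
// Assumes DiDOM was installed with Composer (see Installation).
require 'vendor/autoload.php';

use DiDom\Element;

$input = new Element('input');
$input->setAttribute('name', 'username');

var_dump($input->getAttribute('name'));        // string(8) "username"
var_dump($input->getAttribute('placeholder')); // NULL

// attr() with one argument reads the attribute, so a missing one is null as well;
// ?? then substitutes the fallback.
$placeholder = $input->attr('placeholder') ?? 'Enter your name';
echo $placeholder, "\n"; // Enter your name
```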

Verifying that an attribute exists

With the hasAttribute() method:
if ($element->hasAttribute('name')) {
    // code
}
With the magic method __isset():
if (isset($element->name)) {
    // code
}

Removing an attribute

With the removeAttribute() method:
$element->removeAttribute('name');
With the magic method __unset():
unset($element->name);
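A minimal sketch, assuming the usual Composer setup, showing an attribute disappearing after each removal style, checked with hasAttribute() and isset():

```php
<?php
// Assumes DiDOM was installed with Composer (see Installation).
require 'vendor/autoload.php';

use DiDom\Element;

$input = new Element('input');
$input->setAttribute('name', 'username');
$input->setAttribute('id', 'login');

// Explicit method.
var_dump($input->hasAttribute('name')); // bool(true)
$input->removeAttribute('name');
var_dump($input->hasAttribute('name')); // bool(false)

// Magic methods: isset() calls __isset(), unset() calls __unset().
var_dump(isset($input->id)); // bool(true)
unset($input->id);
var_dump(isset($input->id)); // bool(false)
```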

Working with the cache

The cache is an array of XPath expressions compiled from CSS selectors, keyed by selector.

Getting the cache

use DiDom\Query;
    
...

$xpath    = Query::compile('h2');
$compiled = Query::getCompiled();

// array('h2' => '//h2')
var_dump($compiled);

Setting the cache

Query::setCompiled(['h2' => '//h2']);
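One possible use of getCompiled() and setCompiled() together, sketched under the assumption that you want to reuse converted selectors across PHP processes: save the cache to a file, then load it back later. The JSON file name is hypothetical, and both "runs" happen in one script here only for illustration.

```php
<?php
// Assumes DiDOM was installed with Composer (see Installation).
require 'vendor/autoload.php';

use DiDom\Query;

// Hypothetical cache location; any writable path would do.
$cacheFile = sys_get_temp_dir() . '/didom-xpath-cache.json';

// First run: compile the selectors the application uses, then save the cache.
Query::compile('h2');
Query::compile('.post');
file_put_contents($cacheFile, json_encode(Query::getCompiled()));

// Later run: restore the saved array so these selectors are not converted again.
Query::setCompiled(json_decode(file_get_contents($cacheFile), true));

print_r(Query::getCompiled());

unlink($cacheFile);
```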

Comparison with other parsers
