Gumbo PHP

Gumbo PHP is a low-level PHP extension for HTML5 parsing.


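Gumbo PHP builds a DOMDocument using the Gumbo HTML5 parser. This is intended to avoid the problems PHP's built-in DOMDocument::loadHTML() (based on libxml2's HTML 4 parser) has with HTML5 markup and with pages that contain inline JavaScript.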

use Layershifter\Gumbo\Parser;

$document = Parser::load('<a>Apples and bananas.</a>');
var_dump($document->saveHTML());

string(33) "<a>Apples and bananas.</a>
"
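Because the result is a DOMDocument, the usual ext/dom tools should work on it. The sketch below assumes that Parser::load() returns a plain \DOMDocument (as the saveHTML() call above suggests) and queries it with DOMXPath; the helper name listItemTexts() is hypothetical. The demo call only runs when the extension's Parser class is available.

```php
<?php
// Sketch: querying the DOMDocument returned by Parser::load() with XPath.
// Assumption: Parser::load() returns a plain \DOMDocument, so the standard
// ext/dom classes (DOMXPath etc.) work on it unchanged.

/**
 * Hypothetical helper: collect the text content of every <li> element.
 */
function listItemTexts(DOMDocument $document)
{
    $xpath = new DOMXPath($document);
    $texts = array();
    foreach ($xpath->query('//li') as $item) {
        $texts[] = $item->textContent;
    }
    return $texts;
}

// Only run the demo when the gumbo extension is actually loaded.
if (class_exists('Layershifter\Gumbo\Parser')) {
    $document = \Layershifter\Gumbo\Parser::load('<ul><li>Apples</li><li>Bananas</li></ul>');
    print_r(listItemTexts($document));
}
```

The helper takes any DOMDocument, so the same code works whether the document came from Gumbo PHP or from another parser.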

Requirements

The following versions of PHP are supported.

  • PHP 5.6
  • PHP 7.0

Install

Building the gumbo-php extension requires the PHP development package (for example php-devel), which must provide the phpize utility.

$ git clone https://github.com/layershifter/gumbo-php.git
$ cd gumbo-php
$ phpize
$ ./configure
$ make
$ make install

This builds the gumbo.so shared extension. Load it by adding the following to php.ini:

[gumbo]
extension = gumbo.so
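To confirm that PHP picked up the extension after editing php.ini, you can list loaded modules with `php -m`, or check from a script. The sketch below assumes the module registers itself under the name "gumbo" (matching gumbo.so and the [gumbo] section) and provides the Layershifter\Gumbo\Parser class; the function name gumboAvailable() is hypothetical.

```php
<?php
// Sketch: check that PHP picked up the extension after editing php.ini.
// Assumption: the module registers under the name "gumbo" (matching gumbo.so
// and the [gumbo] section) and provides the Layershifter\Gumbo\Parser class.

/**
 * Hypothetical helper: true when both the module and its Parser class exist.
 * No return type declaration, so it also runs on PHP 5.6.
 */
function gumboAvailable()
{
    return extension_loaded('gumbo')
        && class_exists('Layershifter\Gumbo\Parser', false);
}

echo gumboAvailable() ? "gumbo is loaded\n" : "gumbo is NOT loaded\n";
```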

Known issues

  • Entities are double-encoded (#6):
$doc = \Layershifter\Gumbo\Parser::load('<h1>Hello&nbsp;world!</h1>');
var_dump($doc->saveHTML());

string "<h1>Hello&amp;nbsp;world!</h1>"

Testing

$ composer install
$ composer test

Sponsors

SORGE - website tracking tool

License

This library is released under the Apache 2.0 license. Please see the License File for more information.