/php-htmldiff

A library for comparing two HTML files/snippets and highlighting the differences using simple HTML. Includes support for comparing complex lists and tables

Primary LanguagePHPGNU General Public License v2.0GPL-2.0

php-htmldiff

Scrutinizer Code Quality Build Status Code Coverage Packagist Average time to resolve an issue Percentage of issues still open

php-htmldiff is a library for comparing two HTML files/snippets and highlighting the differences using simple HTML.

This HTML Diff implementation was forked from rashid2538/php-htmldiff and has been modified with new features, bug fixes, and enhancements to the original code.

For more information on these modifications, read the differences from rashid2538/php-htmldiff or view the CHANGELOG.

Demo

https://php-htmldiff.caxy.com/

Installation

The recommended way to install php-htmldiff is through Composer. Require the caxy/php-htmldiff package by running following command:

composer require caxy/php-htmldiff

This will resolve the latest stable version.

Otherwise, install the library and setup the autoloader yourself.

Working with Symfony

If you are using Symfony, you can use the caxy/HtmlDiffBundle to make life easy!

Usage

use Caxy\HtmlDiff\HtmlDiff;

$htmlDiff = new HtmlDiff($oldHtml, $newHtml);
$content = $htmlDiff->build();

CSS Example

See https://github.com/caxy/php-htmldiff/blob/master/demo/codes.css for starter CSS you can use for displaying the HTML diff output.

Configuration

The configuration for HtmlDiff is contained in the Caxy\HtmlDiff\HtmlDiffConfig class.

There are two ways to set the configuration:

  1. Configure an Existing HtmlDiff Object
  2. Create and Use a HtmlDiffConfig Object

Configure an Existing HtmlDiff Object

When a new HtmlDiff object is created, it creates a HtmlDiffConfig object with the default configuration. You can change the configuration using setters on the object:

use Caxy\HtmlDiff\HtmlDiff;

// ...

$htmlDiff = new HtmlDiff($oldHtml, $newHtml);

// Set some of the configuration options.
$htmlDiff->getConfig()
    ->setMatchThreshold(80)
    ->setInsertSpaceInReplace(true)
;

// Calculate the differences using the configuration and get the html diff.
$content = $htmlDiff->build();

// ...

Create and Use a HtmlDiffConfig Object

You can also set the configuration by creating an instance of Caxy\HtmlDiff\HtmlDiffConfig and using it when creating a new HtmlDiff object using HtmlDiff::create.

This is useful when creating more than one instance of HtmlDiff:

use Caxy\HtmlDiff\HtmlDiff;
use Caxy\HtmlDiff\HtmlDiffConfig;

// ...

$config = new HtmlDiffConfig();
$config
    ->setMatchThreshold(95)
    ->setInsertSpaceInReplace(true)
;

// Create an HtmlDiff object with the custom configuration.
$firstHtmlDiff = HtmlDiff::create($oldHtml, $newHtml, $config);
$firstContent = $firstHtmlDiff->build();

$secondHtmlDiff = HtmlDiff::create($oldHtml2, $newHtml2, $config);
$secondHtmlDiff->getConfig()->setMatchThreshold(50);

$secondContent = $secondHtmlDiff->build();

// ...

Full Configuration with Defaults:

$config = new HtmlDiffConfig();
$config
    // Percentage required for list items to be considered a match.
    ->setMatchThreshold(80)
    
    // Set the encoding of the text to be diffed.
    ->setEncoding('UTF-8')
    
    // If true, a space will be added between the <del> and <ins> tags of text that was replaced.
    ->setInsertSpaceInReplace(false)
    
    // Option to disable the new Table Diffing feature and treat tables as regular text.
    ->setUseTableDiffing(true)
    
    // Pass an instance of \Doctrine\Common\Cache\Cache to cache the calculated diffs.
    ->setCacheProvider(null)

    // Disable the HTML purifier (only do this if you known what you're doing)
    // This bundle heavily relies on the purified input from ezyang/htmlpurifier
    ->setPurifierEnabled(true)
    
    // Set the cache directory that HTMLPurifier should use.
    ->setPurifierCacheLocation(null)
    
    // Group consecutive deletions and insertions instead of showing a deletion and insertion for each word individually. 
    ->setGroupDiffs(true)
    
    // List of characters to consider part of a single word when in the middle of text.
    ->setSpecialCaseChars(array('.', ',', '(', ')', '\''))
        
    // List of tags (and their replacement strings) to be diffed in isolation.
    ->setIsolatedDiffTags(array(
        'ol'     => '[[REPLACE_ORDERED_LIST]]',
        'ul'     => '[[REPLACE_UNORDERED_LIST]]',
        'sub'    => '[[REPLACE_SUB_SCRIPT]]',
        'sup'    => '[[REPLACE_SUPER_SCRIPT]]',
        'dl'     => '[[REPLACE_DEFINITION_LIST]]',
        'table'  => '[[REPLACE_TABLE]]',
        'strong' => '[[REPLACE_STRONG]]',
        'b'      => '[[REPLACE_B]]',
        'em'     => '[[REPLACE_EM]]',
        'i'      => '[[REPLACE_I]]',
        'a'      => '[[REPLACE_A]]',
    ))
    
    // Sets whether newline characters are kept or removed when `$htmlDiff->build()` is called.
    // For example, if your content includes <pre> tags, you might want to set this to true.
    ->setKeepNewLines(false)
;

Contributing

See CONTRIBUTING file.

Contributor Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. See CODE_OF_CONDUCT file.

Credits

Did we miss anyone? If we did, let us know or put in a pull request!

License

php-htmldiff is available under GNU General Public License, version 2. See the LICENSE file for details.

TODO

  • Tests, tests, and more tests! (mostly unit tests) - need more tests before we can major refactoring / cleanup for a v1 release
  • Add documentation for setting up a cache provider (doctrine cache)
    • Maybe add abstraction layer for cache + adapter for doctrine cache
  • Make HTML Purifier an optional dependency - possibly use abstraction layer for purifiers so alternatives could be used (or none at all for performance)
  • Expose configuration for HTML Purifier (used in table diffing) - currently only cache dir is configurable through HtmlDiffConfig object
  • Performance improvements (we have 1 benchmark test, we should probably get more)
    • Algorithm improvements - trimming alike text at start and ends, store nested diff results in memory to re-use (like we do w/ caching)
    • Benchmark using DOMDocument vs. alternatives vs. string parsing
    • Consider not using string parsing for HtmlDiff in order to avoid having to create many DOMDocument instances in ListDiff and TableDiff
  • Benchmarking
  • Refactoring (but... tests first)
    • Overall design/architecture improvements
    • API improvements so a new HtmlDiff isn't required for each new diff (especially so that configuration can be re-used)
  • Split demo application to separate repository
  • Add documentation on alternative htmldiff engines and perhaps some comparisons