/SimHashPhp

SimHashPhp is a PHP 5.3 library to use the SimHash algorithm.

Primary LanguagePHPMIT LicenseMIT

SimHashPhp

What is SimHashPhp ?

SimHashPhp is a PHP library that port the SimHash algorithm created by Moses Charikar. This algorithm provide an efficient way to compare two texts or two data sets.

See "SimHash or the way to compare quickly two datasets" for more informations.

Build Status

How to use it ?

The library use PSR-0 naming convention so you can easily use it with an autoloader. However, basically, you can use a simple require:

<?php

require 'Leg/SimHash/SimHash.php';
require 'Leg/SimHash/SimHashFactory.php';

use Leg\SimHash\SimHash;
use Leg\SimHash\SimHashFactory;


$factory = new SimHashFactory();

$fingerprint1 = $factory->run('Mary-jane is very tall. She was in the 9th grade.');
$fingerprint2 = $factory->run('John is in high school. He is not so tall.');

// This method return a float between 0 and 1
// where 0 is completely different strings and
// 1 is completely equals strings.
$fingerprint1->compareWith($fingerprint2);

License

This library is under the MIT license (see LICENSE.md)

About

The SimHashPhp is mainly developed by Titouan Galopin.

Reporting an issue or a feature request

Issues and feature requests are tracked in the Github issue tracker.