Finds degree of similarity between two strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance.
This implementation actually treats multiple occurrences of a bigram as unique. The correctness of this behavior is most easily seen when getting the similarity between "GG" and "GGGGGGGG", which should obviously not be 1.
This is a PHP implemenation of the Node.js package string-similarity
Install using:
composer require henrik9999/string-similarity
In your code:
$stringSimilarity = new StringSimilarity();
$similarity = $stringSimilarity->compareTwoStrings("healed", "sealed");
$matches = $stringSimilarity->findBestMatch("healed", [
"edward",
"sealed",
"theatre",
]);
The package contains two methods:
Returns a fraction between 0 and 1, which indicates the degree of similarity between the two strings. 0 indicates completely different strings, 1 indicates identical strings. The comparison is case-sensitive by default.
- string1 (string): The first string
- string2 (string): The second string
- casesensitive (bool): If the comparison should be case-sensitive
Order does not make a difference.
(number): A fraction from 0 to 1, both inclusive. Higher number indicates more similarity.
$stringSimilarity->compareTwoStrings("healed", "sealed");
// → 0.8
$stringSimilarity->compareTwoStrings(
"Olive-green table for sale, in extremely good condition.",
"For sale: table in very good condition, olive green in colour."
);
// → 0.6060606060606061
$stringSimilarity->compareTwoStrings(
"Olive-green table for sale, in extremely good condition.",
"For sale: green Subaru Impreza, 210,000 miles"
);
// → 0.2558139534883721
$stringSimilarity->compareTwoStrings(
"Olive-green table for sale, in extremely good condition.",
"Wanted: mountain bike with at least 21 gears."
);
// → 0.1411764705882353
Compares mainString
against each string in targetStrings
.
- mainString (string): The string to match each target string against.
- targetStrings (array): Each string in this array will be matched against the main string.
- casesensitive (bool): If the comparison should be case-sensitive.
(Object): An object with a ratings
property, which gives a similarity rating for each target string, a bestMatch
property, which specifies which target string was most similar to the main string, and a bestMatchIndex
property, which specifies the index of the bestMatch in the targetStrings array.
$stringSimilarity->findBestMatch('Olive-green table for sale, in extremely good condition.', [
'For sale: green Subaru Impreza, 210,000 miles',
'For sale: table in very good condition, olive green in colour.',
'Wanted: mountain bike with at least 21 gears.'
]);
// →
array(3) {
["ratings"]=>
array(3) {
[0]=>
array(2) {
["target"]=>
string(45) "For sale: green Subaru Impreza, 210,000 miles"
["rating"]=>
float(0.2558139534883721)
}
[1]=>
array(2) {
["target"]=>
string(62) "For sale: table in very good condition, olive green in colour."
["rating"]=>
float(0.6060606060606061)
}
[2]=>
array(2) {
["target"]=>
string(45) "Wanted: mountain bike with at least 21 gears."
["rating"]=>
float(0.1411764705882353)
}
}
["bestMatch"]=>
array(2) {
["target"]=>
string(62) "For sale: table in very good condition, olive green in colour."
["rating"]=>
float(0.6060606060606061)
}
["bestMatchIndex"]=>
int(1)
}
- Made some perfomance improvements
- Initial Release