Find most similar
it-is-hacker-time opened this issue
What algorithm should I use to find the closest match between a string and a set of strings?
Examples of known inputs:
I would like a cheese pizza
I would like a cheese pizza with onions
I would like a cheese pizza without onions
The input I want to match against them, to find the most similar one if any of them are similar (in this example the only differences are spelling mistakes):
I would like a ceese pizza with out onnions.
I recommend using the cosine similarity algorithm.
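```php
<?php
require_once('vendor/autoload.php'); // Composer autoloader (provides tokenize())

use TextAnalysis\Comparisons\CosineSimilarityComparison;

// Tokenize each known sentence once, up front.
$text = [];
$text[] = tokenize("I would like a cheese pizza");
$text[] = tokenize("I would like a cheese pizza with onions");
$text[] = tokenize("I would like a cheese pizza without onions");

$compareAgainst = tokenize("I would like a ceese pizza with out onnions.");

$best = 0;
$bestIdx = 0;
$compare = new CosineSimilarityComparison();
foreach ($text as $index => $t) {
    $score = $compare->similarity($t, $compareAgainst);
    // Keep the highest-scoring sentence seen so far.
    if ($score > $best) {
        $best = $score;
        $bestIdx = $index;
    }
}

// $text holds token arrays, so join the tokens back into a string to print them.
echo "best match " . implode(' ', $text[$bestIdx]);
```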
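To make the score itself concrete, here is a minimal, library-free sketch of bag-of-words cosine similarity: count how often each word occurs in each sentence, then divide the dot product of the two count vectors by the product of their lengths, $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. It is written from scratch rather than taken from the `TextAnalysis` library used above, so its tokenization may differ; the helpers `wordTokens`, `cosineSimilarity` and `mostSimilar` (and the lowercase, letters-and-digits tokenization) are hypothetical choices for this sketch.

```php
<?php
// Minimal bag-of-words cosine similarity, written from scratch (not the
// TextAnalysis implementation). All helper names here are hypothetical.

/** Lowercase the text and split it into word tokens (letters/digits only). */
function wordTokens(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Cosine of the angle between the term-frequency vectors of two token lists. */
function cosineSimilarity(array $a, array $b): float
{
    $fa = array_count_values($a); // term => count
    $fb = array_count_values($b);

    // Dot product: only terms present in both lists contribute.
    $dot = 0;
    foreach ($fa as $term => $count) {
        if (isset($fb[$term])) {
            $dot += $count * $fb[$term];
        }
    }

    // Euclidean length of each count vector.
    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $fa)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $fb)));

    return ($normA == 0 || $normB == 0) ? 0.0 : $dot / ($normA * $normB);
}

/** Index of the candidate whose tokens are most similar to $query. */
function mostSimilar(array $candidates, string $query): int
{
    $queryTokens = wordTokens($query);
    $bestIdx = 0;
    $bestScore = -1.0; // below any possible score, so candidate 0 is always considered
    foreach ($candidates as $i => $candidate) {
        $score = cosineSimilarity(wordTokens($candidate), $queryTokens);
        if ($score > $bestScore) {
            $bestScore = $score;
            $bestIdx = $i;
        }
    }
    return $bestIdx;
}

$candidates = [
    "I would like a cheese pizza",
    "I would like a cheese pizza with onions",
    "I would like a cheese pizza without onions",
];
$query = "I would like a ceese pizza with out onnions.";

echo $candidates[mostSimilar($candidates, $query)], PHP_EOL;
```

With this tokenization the example picks "I would like a cheese pizza with onions" (score $6/\sqrt{72} \approx 0.71$, versus $5/\sqrt{54} \approx 0.68$ for the plain cheese pizza and $5/\sqrt{72} \approx 0.59$ for "without onions"). The misspelled "ceese" and "onnions" never match any word exactly, while "with" does, so whole-word matching favors "with onions" even though "with out onnions" presumably means "without onions".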
The same code with some corrections:
```php
<?php
require_once('vendor/autoload.php'); // Composer autoloader (provides tokenize())

use TextAnalysis\Comparisons\CosineSimilarityComparison;

// Keep the original strings so the best match can be printed as-is.
$text = [];
$text[] = "I would like a cheese pizza";
$text[] = "I would like a cheese pizza with onions";
$text[] = "I would like a cheese pizza without onions";

$compareAgainst = tokenize("I would like a ceese pizza with out onnions.");

$best = 0;
$bestIdx = 0;
$compare = new CosineSimilarityComparison();
foreach ($text as $index => $t) {
    $tokens = tokenize($t); // tokenize each candidate inside the loop
    $score = $compare->similarity($tokens, $compareAgainst);
    if ($score > $best) {
        $best = $score;
        $bestIdx = $index;
    }
}

echo "best match {$text[$bestIdx]}";
```
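Both versions above compare whole words, so misspellings such as "ceese" or "onnions" contribute nothing to the score, and "with out" never matches "without". One possible tweak, still cosine similarity but over different features, is to compare overlapping character trigrams of each sentence after lowercasing it and removing spaces and punctuation. This is a sketch under stated assumptions, not part of the `TextAnalysis` library: the helper names `charTrigrams`, `cosineSimilarity` and `mostSimilarByTrigrams` and the choice of n = 3 are all hypothetical.

```php
<?php
// Hypothetical variant: cosine similarity over character trigrams rather
// than whole words, so misspelled words still share most of their features.

/** Lowercase, drop everything but letters/digits, then take overlapping 3-char chunks. */
function charTrigrams(string $text): array
{
    // Removing spaces makes "with out" and "without" identical strings.
    $s = preg_replace('/[^a-z0-9]/', '', strtolower($text));
    $grams = [];
    for ($i = 0; $i + 3 <= strlen($s); $i++) {
        $grams[] = substr($s, $i, 3);
    }
    return $grams;
}

/** Cosine of the angle between the count vectors of two feature lists. */
function cosineSimilarity(array $a, array $b): float
{
    $fa = array_count_values($a); // feature => count
    $fb = array_count_values($b);
    $dot = 0;
    foreach ($fa as $feature => $count) {
        if (isset($fb[$feature])) {
            $dot += $count * $fb[$feature];
        }
    }
    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $fa)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $fb)));
    return ($normA == 0 || $normB == 0) ? 0.0 : $dot / ($normA * $normB);
}

/** Index of the candidate whose trigrams are most similar to the query's. */
function mostSimilarByTrigrams(array $candidates, string $query): int
{
    $queryGrams = charTrigrams($query);
    $bestIdx = 0;
    $bestScore = -1.0;
    foreach ($candidates as $i => $candidate) {
        $score = cosineSimilarity(charTrigrams($candidate), $queryGrams);
        if ($score > $bestScore) {
            $bestScore = $score;
            $bestIdx = $i;
        }
    }
    return $bestIdx;
}

$candidates = [
    "I would like a cheese pizza",
    "I would like a cheese pizza with onions",
    "I would like a cheese pizza without onions",
];
$query = "I would like a ceese pizza with out onnions.";

echo $candidates[mostSimilarByTrigrams($candidates, $query)], PHP_EOL;
```

On the example this ranks "I would like a cheese pizza without onions" first ($29/33 \approx 0.88$), ahead of "with onions" ($25/\sqrt{990} \approx 0.79$) and the plain cheese pizza ($17/\sqrt{660} \approx 0.66$). Removing spaces is what lets "with out" line up with "without"; the trade-off is that trigrams spanning word boundaries (such as "ldl" from "would like") are counted too.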