Comparing strings finds an incorrect match
dimiterbak opened this issue · 2 comments
dimiterbak commented
Hi There,
Thanks for sharing this library!
I am running the test below and expect to find no match.
However, it returns a match:
import numpy as np
import unittest

from names_matcher.algorithm import NamesMatcher


class TestCreateIdentityMatcher(unittest.TestCase):
    def test_compare_different_identities(self):
        names_1 = [["V", "v"]]
        names_2 = [["L", "o"]]
        assignments = NamesMatcher()(names_1, names_2)
        self.assertEqual(-1, assignments[0][0])
        self.assertEqual(1, assignments[1][0])
vmarkovtsev commented
Hi @dimiterbak, your code returns the following for me:
(array([0], dtype=int32), array([0.]))
As you see, the confidence of the match is 0, which is the minimum confidence possible.
I always return the matches; choosing the right confidence threshold depends on the domain problem, for example on whether precision or recall matters more.
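For illustration, here is a minimal sketch of how a caller might apply such a threshold to the returned pair of matched indexes and confidences. The helper `apply_threshold`, the `NO_MATCH` sentinel, and the cutoff of 0.5 are all hypothetical and not part of names_matcher; the right cutoff has to be tuned for the problem at hand.

```python
from typing import List, Sequence

# Hypothetical sentinel for "no acceptable match"; not defined by names_matcher.
NO_MATCH = -1


def apply_threshold(indexes: Sequence[int],
                    confidences: Sequence[float],
                    min_confidence: float) -> List[int]:
    """Keep a match only if its confidence reaches min_confidence."""
    return [
        int(index) if confidence >= min_confidence else NO_MATCH
        for index, confidence in zip(indexes, confidences)
    ]


# Values reported above for [["V", "v"]] against [["L", "o"]].
indexes, confidences = [0], [0.0]
# 0.5 is an arbitrary cutoff chosen only for this example.
print(apply_threshold(indexes, confidences, min_confidence=0.5))  # [-1]
```

A higher cutoff rejects more weak matches, which favors precision; a lower one keeps more candidates, which favors recall.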
dimiterbak commented
Thank you!
It looks like I misunderstood: I thought the second value was the distance, but it is the confidence.