athenianco/names-matcher

Comparing strings finds an incorrect match

dimiterbak opened this issue · 2 comments

Hi there,

Thanks for sharing this library!

I am running the test below and expect it to find no match.
However, it returns a match:

import numpy as np
import unittest

from names_matcher.algorithm import NamesMatcher

class TestCreateIdentityMatcher(unittest.TestCase):

    def test_compare_different_identities(self):

        names_1 = [["V", "v"]]
        names_2 = [["L", "o"]]

        assignments = NamesMatcher()(names_1, names_2)

        self.assertEqual(-1, assignments[0][0])
        self.assertEqual(1, assignments[1][0])

Hi @dimiterbak, your code returns the following for me:

(array([0], dtype=int32), array([0.]))

As you can see, the confidence of the match is 0, which is the minimum possible confidence.
I always return the matches, and choosing the right confidence threshold is up to the domain problem: it depends on whether precision or recall matters more.
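
For illustration, here is a minimal sketch of how a caller might apply such a threshold to the returned pair of arrays. It uses only NumPy. The helper `apply_confidence_threshold`, the `-1` "no match" marker, and the threshold value `0.5` are assumptions made for this example, not part of the library.

```python
import numpy as np


def apply_confidence_threshold(matches, confidences, threshold):
    """Hypothetical helper: mark matches below `threshold` as -1 (no match)."""
    matches = np.asarray(matches)
    confidences = np.asarray(confidences)
    # Keep a match only where its confidence reaches the threshold.
    return np.where(confidences >= threshold, matches, -1)


# The result reported above for names_1 = [["V", "v"]] and names_2 = [["L", "o"]].
matches, confidences = np.array([0], dtype=np.int32), np.array([0.0])

# 0.5 is an arbitrary example value; the right threshold is domain-specific.
filtered = apply_confidence_threshold(matches, confidences, threshold=0.5)
print(filtered)  # [-1]
```

A higher threshold rejects more low-confidence matches and so favors precision, while a lower one favors recall.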

Thank you!

It looks like I misunderstood: I thought you returned the distance, but it is actually the confidence.