rapidfuzz/JaroWinkler

-1 and -2 are treated as equivalent

Equivalence in JaroWinkler / RapidFuzz is implemented in terms of the hash of an element, using Python's built-in hash function. However, CPython's implementation reserves the value -1 to indicate an error, so hash(-1) is remapped to -2, and -1 and -2 end up with the same hash:

hash(-1) == hash(-2) == -2

This leads to:

>>> from jarowinkler import jaro_similarity
>>> jaro_similarity([0, -1], [0, -2])
1.0
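
To illustrate why the similarity comes out as 1.0, here is a minimal sketch of comparing elements by hash. `elements_match` is a hypothetical helper for illustration only, not the library's actual code, and the result assumes CPython:

```python
def elements_match(a, b):
    """Hypothetical stand-in: treat two elements as equal when their hashes agree."""
    return hash(a) == hash(b)

seq1 = [0, -1]
seq2 = [0, -2]

# Position-wise comparison via hashes; on CPython both pairs "match",
# even though -1 != -2.
matches = [elements_match(x, y) for x, y in zip(seq1, seq2)]
print(matches)  # [True, True] on CPython
```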

We do not need -1 as an error value, so we should implement the hash function as:

def rapidfuzz_hash(x):
    if x == -1:
        return -1
    return hash(x)
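
A quick check of the proposed function, as a sketch assuming CPython. It confirms that -1 and -2 no longer collide and that the sampled other values keep their built-in hash:

```python
def rapidfuzz_hash(x):
    # On CPython, hash() never returns -1, so -1 is free to stand for -1 itself.
    if x == -1:
        return -1
    return hash(x)

# The two values that previously collided now get distinct hashes.
assert rapidfuzz_hash(-1) == -1
assert rapidfuzz_hash(-2) == -2
assert rapidfuzz_hash(-1) != rapidfuzz_hash(-2)

# Other values are passed through to the built-in hash unchanged.
for value in (0, 1, -3, "a", (1, 2)):
    assert rapidfuzz_hash(value) == hash(value)
```

Note that `x == -1` is also true for `-1.0`, which is consistent with Python treating `-1` and `-1.0` as equal.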

@orsinium this came up here: https://cloud.drone.io/life4/textdistance/53/1/8

This is fixed in the latest releases.