-1 and -2 are treated as equivalent
Closed · 1 comment
maxbachmann commented
Equivalence in JaroWinkler / RapidFuzz is implemented in terms of the hash of an element, for which the Python hash function is used. However, CPython's hash implementation reserves the value -1 to indicate an error, so -1 and -2 end up with the same hash:
hash(-1) == hash(-2) == -2
This leads to:
>>> from jarowinkler import jaro_similarity
>>> jaro_similarity([0, -1], [0, -2])
1.0
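To illustrate the mechanism without the library, here is a minimal sketch that assumes elements are compared only through their built-in hash. The helper `hashed` is hypothetical and not part of jarowinkler or RapidFuzz; the behaviour shown is specific to CPython.

```python
def hashed(seq):
    """Map each element to its built-in hash, as a hash-based comparison would."""
    return [hash(x) for x in seq]


# On CPython, -1 is reserved as an internal error code, so hash(-1) becomes -2.
print(hash(-1), hash(-2))

# The two sequences differ, but their hashes are identical element by element.
print(hashed([0, -1]) == hashed([0, -2]))
```

Any comparison that only looks at these hashes cannot tell the two sequences apart, which is consistent with the similarity of 1.0 above.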
We do not need -1 as an error value and should implement the hash function as:
def rapidfuzz_hash(x):
    # give -1 its own value instead of sharing hash(-2)
    if x == -1:
        return -1
    return hash(x)
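A quick sanity check of the proposed function, as a sketch on CPython. It only exercises the wrapper itself, not how the library uses it internally. Since the built-in `hash` never returns -1 on CPython, that value is free to use for the element -1.

```python
def rapidfuzz_hash(x):
    # -1 gets its own value instead of sharing CPython's hash(-1) == -2.
    if x == -1:
        return -1
    return hash(x)


# The built-in hash cannot separate the two values on CPython...
assert hash(-1) == hash(-2)

# ...while the wrapper can.
assert rapidfuzz_hash(-1) != rapidfuzz_hash(-2)

# Values that compare equal to -1 (such as -1.0) still hash alike.
assert rapidfuzz_hash(-1.0) == rapidfuzz_hash(-1)
```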
@orsinium this came up here: https://cloud.drone.io/life4/textdistance/53/1/8
maxbachmann commented
This is fixed in the latest releases.