What is the max possible value (upper bound) for fuzz.ratio?
sillybun opened this issue · 4 comments
It would be helpful to know what is the max possible value (upper bound) for:
where the length of
It seems that
fuzz.ratio is a normalized version of the InDel distance (similar to Levenshtein, but without substitutions), scaled to the range 0-100:
round(100 * (1 - InDelDist / (len1 + len2)))
so the upper bound is 100
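To make the formula above concrete, here is a minimal pure-Python sketch that computes the InDel distance via the longest common subsequence and then applies the normalization. The function names `indel_distance` and `indel_ratio` are made up for illustration and are not the library's API; rounding is left out, and returning 100 for two empty strings is an assumption.

```python
def indel_distance(s1: str, s2: str) -> int:
    """Number of insertions and deletions needed to turn s1 into s2."""
    # Longest common subsequence (LCS) via dynamic programming, one row at a time.
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    lcs_length = prev[-1]
    # Every character outside the LCS must be deleted or inserted.
    return len(s1) + len(s2) - 2 * lcs_length


def indel_ratio(s1: str, s2: str) -> float:
    """Normalized InDel similarity in the range 0-100 (unrounded)."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100.0  # assumption: two empty strings count as identical
    return 100 * (1 - indel_distance(s1, s2) / total)


print(indel_distance("lewenstein", "levenshtein"))  # 3
print(indel_ratio("abc", "abc"))                    # 100.0
```

Identical strings have distance 0 and reach the maximum of 100; strings with no common characters have distance len1 + len2 and score 0.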
Rereading your question, I think you might mean a length-based upper bound for the similarity score. For both the Levenshtein and the InDel distance, the distance between two strings is at least the difference of their lengths, so in your example with
len1 <= len2
the upper bound can be calculated as:
100 * (1 - (len2 - len1) / (len1 + len2))
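As a small sketch of that bound in Python: the helper name `ratio_upper_bound` is hypothetical, and it uses `abs()` so that the argument order does not matter. For len1 <= len2 the expression simplifies to 200 * len1 / (len1 + len2).

```python
def ratio_upper_bound(len1: int, len2: int) -> float:
    """Best similarity (0-100) any two strings of these lengths can reach."""
    total = len1 + len2
    if total == 0:
        return 100.0  # assumption: two empty strings count as identical
    # The InDel distance is at least the length difference.
    return 100 * (1 - abs(len2 - len1) / total)


print(ratio_upper_bound(3, 5))  # 75.0
print(ratio_upper_bound(4, 4))  # 100.0
```

One possible use is as a cheap pre-filter: if the bound for a pair of lengths is already below the score you care about, the actual comparison can be skipped.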
Thanks very much! It helps a lot!