xdrop/fuzzywuzzy

Incorrect levenshtein distance for completely edited strings

Closed this issue · 1 comments

When I calculate the ratio of "abcdef" and "fedcba", the result is 17, even though I expected 0.

The ratio calculation, as I understand it, is r = (1 - d/L)*100,
with d being the Levenshtein distance and L the sum of the lengths of the two compared strings.

In this library, the Levenshtein distance is weighted with a cost of 1 for each insert/delete and 2 for each replace.

For these two strings, I expected the Levenshtein distance in this library to be 12 (2 for each of the 6 replacements), resulting in a ratio of (1 - 12/12)*100 = 0.

However, the library returns a ratio of 17 instead of 0. This is because the distance it calculates is 10 instead of 12, resulting in (1 - 10/12)*100 ≈ 17.
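To double-check the arithmetic, here is a minimal sketch of the formula as I understand it. The function name `ratio_from_distance` is made up for illustration and is not part of the library; rounding to the nearest integer and returning 100 for two empty strings are my assumptions.

```python
def ratio_from_distance(distance: int, total_length: int) -> int:
    """Compute r = (1 - d/L) * 100, rounded to the nearest integer.

    `distance` is d (the weighted Levenshtein distance) and
    `total_length` is L (the sum of the lengths of both strings).
    """
    if total_length == 0:
        # Assumption: two empty strings are treated as identical.
        return 100
    return round((1 - distance / total_length) * 100)


# Expected case: 6 replacements at cost 2 each, so d = 12 and L = 6 + 6 = 12.
print(ratio_from_distance(12, 12))

# Observed case: the library's distance of 10 with the same L = 12.
print(ratio_from_distance(10, 12))
```

With d = 12 this gives 0, and with d = 10 it gives 17, matching the two numbers above.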

This seems to be the case for strings of any length with 100% replacements, as if one replacement is missed.

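Never mind, it naturally calculates the minimum distance. Shifting one string by one position lines up the two d's, so the ends cost one delete and one insert instead of two replacements: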

a b c d e f
  f e d c b a
1 2 2 0 2 2 1  = 10 

It does calculate a distance of 12 for abcdef -> uvwxyz.
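For anyone who wants to reproduce both numbers, here is a small dynamic-programming sketch using the costs described above (1 per insert/delete, 2 per replace). This is my own illustration, not the library's code, and `weighted_levenshtein` is a hypothetical name.

```python
def weighted_levenshtein(a: str, b: str) -> int:
    """Minimum edit distance with insert/delete cost 1 and replace cost 2."""
    # prev[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                           # delete ca
                curr[j - 1] + 1,                       # insert cb
                prev[j - 1] + (0 if ca == cb else 2),  # match or replace
            ))
        prev = curr
    return prev[-1]


print(weighted_levenshtein("abcdef", "fedcba"))  # shares one alignable character
print(weighted_levenshtein("abcdef", "uvwxyz"))  # shares no characters
```

The first call returns 10 and the second returns 12. The minimum is taken over all alignments, so a single matching character is enough to bring the distance below 2 per position.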

Can close this 'issue'