smashedtoatoms/the_fuzz

Argument Ordering Changes Jaro/Jaro Winkler Results

zolrath opened this issue · 3 comments

Hi!
I've been experimenting with using the_fuzz in a project of mine, and the implementation of Jaro seems to be incorrect.

The distance between two strings changes depending on the order they're provided to compare:

iex> TheFuzz.compare(:jaro, "hello", "heylo")
1.0

iex> TheFuzz.compare(:jaro, "heylo", "hello")
0.8666666666666667

These seem to be the only algorithms in the library which exhibit this issue.
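A quick way to check for this kind of order dependence is to call the comparison both ways and compare the results. The sketch below uses Elixir's `String.jaro_distance/2` as a stand-in comparison function; the `SymmetryCheck` module and its `symmetric?/3` helper are hypothetical names for illustration, not part of the_fuzz.

```elixir
defmodule SymmetryCheck do
  # Hypothetical helper: returns true when a two-argument comparison
  # function gives the same result regardless of argument order.
  def symmetric?(fun, a, b) when is_function(fun, 2) do
    fun.(a, b) == fun.(b, a)
  end
end

# Check the pair from this report against the standard library's Jaro.
SymmetryCheck.symmetric?(&String.jaro_distance/2, "hello", "heylo")
```

The same helper could be pointed at any other two-argument comparison, for example `fn a, b -> TheFuzz.compare(:jaro, a, b) end`, to see whether it is affected.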

Bummer. That means Jaro-Winkler is probably wrong too. I'll take a look and see if I can figure out what is happening.

Actually, the Elixir standard library has Jaro now (`String.jaro_distance/2`). It has to be more efficient than mine, and it also has the added benefit of being correct. I'll likely just point mine to it.

Okay, it's updated. I just use the stdlib now, but return nil if an empty string is used for either argument to preserve backwards compatibility. Sorry about that.
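For readers curious what that kind of delegation might look like, here is a minimal sketch. It assumes only what is described above: delegate to `String.jaro_distance/2`, and return nil when either argument is an empty string. The module name `JaroSketch` and function name `compare/2` are hypothetical and are not the_fuzz's actual source.

```elixir
defmodule JaroSketch do
  # Hypothetical wrapper, not the library's real code.
  # An empty string on either side returns nil, as described above.
  def compare("", _other), do: nil
  def compare(_other, ""), do: nil

  # Everything else is delegated to the standard library.
  def compare(a, b) when is_binary(a) and is_binary(b) do
    String.jaro_distance(a, b)
  end
end

# Both argument orders go through the same stdlib function.
JaroSketch.compare("hello", "heylo")
JaroSketch.compare("heylo", "hello")

# Empty input yields nil rather than a float.
JaroSketch.compare("", "hello")
```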