seatgeek/fuzzywuzzy

Unexpected results from token_set_ratio()


I've been playing with the library today and am a bit confused by the behaviour of token_set_ratio(). Regardless of the token manipulation, I would only expect a result of 100 if both strings were identical, but I also get 100 from the example below:

from fuzzywuzzy import fuzz

result = fuzz.token_set_ratio("word1 word2 word3", "word1 word2")
# result == 100

I would have expected that from partial_token_set_ratio() but not here, unless I've missed something.

It is also 100 whenever all the words of one string appear in the other string.
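
Roughly speaking, token_set_ratio splits both strings into token sets, takes the intersection and the two remainders, and scores the intersection string against each "intersection plus remainder" string, keeping the best score. The sketch below is not the library's exact code (the variable names are made up), but it shows why a token subset scores 100:

from fuzzywuzzy import fuzz

s1 = "word1 word2 word3"
s2 = "word1 word2"

tokens1, tokens2 = set(s1.split()), set(s2.split())
intersection = " ".join(sorted(tokens1 & tokens2))
combined1 = (intersection + " " + " ".join(sorted(tokens1 - tokens2))).strip()
combined2 = (intersection + " " + " ".join(sorted(tokens2 - tokens1))).strip()

# Since tokens2 is a subset of tokens1, combined2 is identical to the
# intersection string, so that comparison scores a perfect 100 and the
# best score over the comparisons is 100.
print(fuzz.ratio(intersection, combined2))  # 100
print(fuzz.ratio(intersection, combined1))  # lower, the extra token costs points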

Ah, thanks. How does that differ from partial_token_set_ratio()?

Yes, partial_token_set_ratio is based on partial_ratio instead of ratio, so it already returns 100 when a single word is similar:

fuzz.token_set_ratio("word1 word2 word3", "word1 word4")
# 71
fuzz.partial_token_set_ratio("word1 word2 word3", "word1 word4")
# 100
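
The difference comes from partial_ratio, which scores 100 whenever the shorter string lines up with a matching substring of the longer one. A quick illustration (again just a sketch, not the library's internals):

from fuzzywuzzy import fuzz

# ratio compares the full strings, so the extra token lowers the score,
# while partial_ratio only needs "word1" to appear as a substring.
print(fuzz.ratio("word1", "word1 word4"))          # well below 100
print(fuzz.partial_ratio("word1", "word1 word4"))  # 100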

Thanks for clarifying.