seatgeek/fuzzywuzzy

Unexpected results from token_set_ratio()


I've been playing with the library today and am a bit confused by the behaviour of token_set_ratio(). Regardless of the token manipulation, I would only expect a result of 100 if both strings were identical, but I also get 100 from the example below:

from fuzzywuzzy import fuzz

result = fuzz.token_set_ratio("word1 word2 word3", "word1 word2")
# result == 100

I would have expected that from partial_token_set_ratio() but not here, unless I've missed something.

It is also 100 whenever all the words of one string appear in the other string.
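
Roughly speaking, token_set_ratio splits both strings into token sets, takes the intersection and the two remainders, and scores the intersection string against each "intersection plus remainder" string, keeping the best score. The sketch below is not the library's exact code (the variable names are made up), but it shows why a token subset scores 100:

from fuzzywuzzy import fuzz

s1 = "word1 word2 word3"
s2 = "word1 word2"

tokens1, tokens2 = set(s1.split()), set(s2.split())
intersection = " ".join(sorted(tokens1 & tokens2))
combined1 = (intersection + " " + " ".join(sorted(tokens1 - tokens2))).strip()
combined2 = (intersection + " " + " ".join(sorted(tokens2 - tokens1))).strip()

# Since tokens2 is a subset of tokens1, combined2 is identical to the
# intersection string, so that comparison scores a perfect 100 and the
# best score over the comparisons is 100.
print(fuzz.ratio(intersection, combined2))  # 100
print(fuzz.ratio(intersection, combined1))  # lower, the extra token costs points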

Ah, thanks. How does that differ from partial_token_set_ratio()?

Yes, partial_token_set_ratio is based on partial_ratio instead of ratio, so it already returns 100 when a single word is similar:

fuzz.token_set_ratio("word1 word2 word3", "word1 word4")
# 71
fuzz.partial_token_set_ratio("word1 word2 word3", "word1 word4")
# 100
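
The difference comes from partial_ratio, which scores 100 whenever the shorter string lines up with a matching substring of the longer one. A quick illustration (again just a sketch, not the library's internals):

from fuzzywuzzy import fuzz

# ratio compares the full strings, so the extra token lowers the score,
# while partial_ratio only needs "word1" to appear as a substring.
print(fuzz.ratio("word1", "word1 word4"))          # well below 100
print(fuzz.partial_ratio("word1", "word1 word4"))  # 100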

Thanks for clarifying.