Unexpected results from token_set_ratio()
Closed this issue · 4 comments
timdagit commented
I've been playing with the library today and am a bit confused by the behaviour of token_set_ratio(). Regardless of the token manipulation, I would only expect a result of 100 if both strings were identical, but I also get 100 from the example below:
from fuzzywuzzy import fuzz
result = fuzz.token_set_ratio("word1 word2 word3", "word1 word2")
I would have expected that from partial_token_set_ratio() but not here, unless I've missed something.
maxbachmann commented
It is also 100 when all words of one of the two strings appear in the other string.
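For intuition, the token-set idea can be sketched with the standard library's difflib. This is a rough sketch with hypothetical helper names, not the library's exact implementation: it compares the sorted token intersection against each full token set, so when one string's tokens are a subset of the other's, the intersection equals the smaller set and the score is 100.

```python
from difflib import SequenceMatcher

def ratio(a, b):
    # Simple similarity score on a 0-100 scale, in the spirit of fuzz.ratio
    return round(100 * SequenceMatcher(None, a, b).ratio())

def token_set_ratio_sketch(s1, s2):
    # Sketch of the token-set approach (hypothetical name, illustration only):
    # build the sorted token intersection, then append each string's leftover
    # tokens, and take the best pairwise score among the three combinations.
    t1, t2 = set(s1.split()), set(s2.split())
    inter = " ".join(sorted(t1 & t2))
    combined1 = (inter + " " + " ".join(sorted(t1 - t2))).strip()
    combined2 = (inter + " " + " ".join(sorted(t2 - t1))).strip()
    return max(ratio(inter, combined1),
               ratio(inter, combined2),
               ratio(combined1, combined2))

# "word1 word2" is a subset of "word1 word2 word3", so the intersection
# is identical to one of the combined strings and the score is 100.
print(token_set_ratio_sketch("word1 word2 word3", "word1 word2"))  # 100
```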
timdagit commented
Ah, thanks. How does that differ from partial_token_set_ratio()?
maxbachmann commented
Yes, partial_token_set_ratio is based on partial_ratio instead of ratio, and is already 100 when one word is similar.
fuzz.token_set_ratio("word1 word2 word3", "word1 word4")
# 71
fuzz.partial_token_set_ratio("word1 word2 word3", "word1 word4")
# 100
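To illustrate the difference, partial_ratio can be approximated as the best score of the shorter string against any same-length window of the longer one. This is a rough sketch under that assumption, not the library's exact alignment algorithm:

```python
from difflib import SequenceMatcher

def partial_ratio_sketch(a, b):
    # Rough approximation of a partial comparison: slide the shorter string
    # across the longer one and keep the best window similarity
    # (fuzz.partial_ratio uses a smarter alignment, but the idea is similar).
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    best = 0.0
    for i in range(len(long_) - len(short) + 1):
        window = long_[i:i + len(short)]
        best = max(best, SequenceMatcher(None, short, window).ratio())
    return round(100 * best)

# The shared token "word1" lines up perfectly with the start of the
# longer string, so the partial comparison alone already yields 100.
print(partial_ratio_sketch("word1", "word1 word2 word3"))  # 100
```

This is why partial_token_set_ratio reports 100 in the example above: the partial comparison is applied to the token-set combinations, and a single well-matching token is enough for a perfect alignment.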
timdagit commented
Thanks for clarifying