xdrop/fuzzywuzzy

How does this library handle upper and lower case?

rayliverified opened this issue · 2 comments

When comparing strings, their capitalization affects the score returned, so the library appears to be case-sensitive. How are uppercase and lowercase characters weighed against each other? How much does the score drop if a string such as "fuzzywuzzy" is matched against "FuZzYwUzZy" instead of "fuzzywuzzy"?

Very curious!

xdrop commented

Yes, if no processing is done on the strings, uppercase and lowercase characters are treated as different characters, which is why you get different results. (Levenshtein distance compares characters exactly, so "f" and "F" do not count as the same character.)
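To make the point about exact character comparison concrete, here is a minimal sketch of a textbook Levenshtein distance in plain Python. It is independent of this library's API: the function name `levenshtein` and the unit costs are assumptions chosen for illustration, not the library's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: insert, delete and substitute each cost 1 (assumed costs)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            # Exact comparison: 'f' and 'F' are different characters here.
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(levenshtein("fuzzywuzzy", "fuzzywuzzy"))          # 0
print(levenshtein("fuzzywuzzy", "FuZzYwUzZy"))          # 5
print(levenshtein("fuzzywuzzy", "FuZzYwUzZy".lower()))  # 0
```

Five of the ten characters differ only by case, so the raw distance is 5. How that distance translates into a final score depends on the ratio formula the library applies, which this thread does not spell out.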

For this reason, default pre-processing (which converts the strings to lowercase) is applied to the input strings for all ratios except the simple and partial ratios (the ones you are using). Those two are left unprocessed in case someone wants case-sensitive matching. I will consider adding an option for the simple and partial ratios to apply pre-processing as well in the next version, just to be consistent.
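Until such an option exists, the same effect can be had by lowercasing both strings before scoring them. The sketch below shows the idea in plain Python, using the standard library's `difflib.SequenceMatcher` as a stand-in scorer. The names `simple_ratio` and `to_lower` and the `processor` parameter are hypothetical and are not part of this library's API.

```python
from difflib import SequenceMatcher


def to_lower(s: str) -> str:
    """Hypothetical pre-processor: only lowercases the input."""
    return s.lower()


def simple_ratio(a: str, b: str, processor=None) -> int:
    """Stand-in similarity score from 0 to 100 (not the library's scorer).

    If a processor is given, it is applied to both strings first.
    """
    if processor is not None:
        a, b = processor(a), processor(b)
    return round(100 * SequenceMatcher(None, a, b).ratio())


# Case-sensitive comparison: scores below 100.
print(simple_ratio("fuzzywuzzy", "FuZzYwUzZy"))
# With lowercasing applied first, the strings are identical: 100.
print(simple_ratio("fuzzywuzzy", "FuZzYwUzZy", processor=to_lower))
```

Keeping the pre-processing step as an optional argument leaves the case-sensitive behavior available to callers who rely on it.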

Thanks for the idea!

Thanks for the explanation! I am not that familiar with Levenshtein distance (adding it to my reading list ;) so I did not know that capitalization matters, haha! I do not think any code changes are necessary; a "Note: capitalization matters!" would suffice. Thanks again for the great work!