Incompatibility with the Python version in handling underscores
DoronRippel opened this issue · 3 comments
DoronRippel commented
The FuzzySearch.tokenSetPartialRatio() method returns different results from the Python version for strings that contain an underscore.
Examples:
- FuzzySearch.tokenSetPartialRatio("worm_mikeala", "mikeala rath") returns 74 while the Python version returns 58
- FuzzySearch.tokenSetPartialRatio("c_wasyluka", "crystal wasyluka") returns 100 while the Python version returns 80
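One possible explanation for differences like these is how each port tokenizes the input before scoring. The sketch below is not taken from either library. It assumes preprocessing replaces every character matched by the regex `\W` with a space, which in Python's `re` module leaves underscores in place because `_` counts as a word character. The helper names `tokens_keep_underscore` and `tokens_split_underscore` are hypothetical and exist only for this illustration.

```python
import re

# Assumption (not confirmed in this thread): preprocessing replaces every
# non-word character with a space, then lowercases and splits on whitespace.
NON_WORD = re.compile(r"\W")         # "_" is a word character, so it is kept
NON_WORD_OR_UNDERSCORE = re.compile(r"[\W_]")  # also treats "_" as a separator


def tokens_keep_underscore(s):
    """Tokenize while leaving underscores inside tokens."""
    return set(NON_WORD.sub(" ", s).lower().split())


def tokens_split_underscore(s):
    """Tokenize while treating underscores as token separators."""
    return set(NON_WORD_OR_UNDERSCORE.sub(" ", s).lower().split())


a, b = "worm_mikeala", "mikeala rath"

# Underscore kept: "worm_mikeala" is a single token, so nothing is shared.
print(tokens_keep_underscore(a) & tokens_keep_underscore(b))    # set()

# Underscore split: "mikeala" appears in both token sets.
print(tokens_split_underscore(a) & tokens_split_underscore(b))  # {'mikeala'}
```

If a port splits on the underscore, the two strings share a whole token, and a token-set comparison that scores the shared tokens against themselves can plausibly reach a much higher value than when no token is shared.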
xdrop commented
Thanks, I have pushed a commit to fix this. The fix will be released in 1.3.4.
DoronRippel commented
Great!
On Tue, Jan 18, 2022, 20:10 Panayiotis wrote:
Thanks, I have pushed a commit to fix this. I'll release this in a new version.
DoronRippel commented
Hi Panayiotis,
I saw that a new version, 1.3.4, had been released, so I assumed it contained the fix and tried it. However, the issue is not fixed: all the examples now return 100.
Here is how I run them in Java:
System.out.println("expected 58 -> got " + FuzzySearch.tokenSetPartialRatio("worm_mikeala", "mikeala rath"));
System.out.println("expected 80 -> got " + FuzzySearch.tokenSetPartialRatio("c_wasyluka", "crystal wasyluka"));
System.out.println("expected 78 -> got " + FuzzySearch.tokenSetPartialRatio("a_bacdefg", "crystal bacdefg"));
I get:
expected 58 -> got 100
expected 80 -> got 100
expected 78 -> got 100
Here is how I run them in Python:
from fuzzywuzzy import fuzz

if __name__ == '__main__':
    print(fuzz.partial_token_set_ratio("worm_mikeala", "mikeala rath"))
    print(fuzz.partial_token_set_ratio("c_wasyluka", "crystal wasyluka"))
    print(fuzz.partial_token_set_ratio("x_bacdefg", "crystal bacdefg"))
I get:
58
80
78
Thank you,
Doron