Installing python-Levenshtein as suggested by the warnings gives different results.
JeremyThiesen opened this issue · 1 comment
JeremyThiesen commented
I was running this code:
```python
from fuzzywuzzy import fuzz

partial_ratio = fuzz.partial_ratio('more than fifty', 'i know that because a lion run fifty mile per hour and a cheetah run about eighty mile per hour and sixty-five be more than fifty and be slow than eighty')
print(partial_ratio)
```
With fuzzywuzzy 0.18.0, this prints 100. It also emits the following UserWarning:
```
UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
```
After installing python-Levenshtein 0.12.2, the same code prints 87. That is incorrect: 'more than fifty' appears verbatim in the second string, so this exact match should score 100.
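For context on why 100 is the expected answer: the difflib-based approach aligns the shorter string against same-length windows of the longer one (using `SequenceMatcher` matching blocks to pick candidate windows) and keeps the best `ratio()`. A verbatim substring produces a window with ratio 1.0. The following is a simplified, standard-library-only sketch of that idea, not fuzzywuzzy's actual code; `partial_ratio_sketch` is a hypothetical name.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best-window similarity (0-100) of the shorter string against the longer one.

    Simplified sketch of the difflib-based idea; not fuzzywuzzy's exact implementation.
    """
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0
    best = 0.0
    # Each matching block (a, b, size) hints where `shorter` could line up inside `longer`.
    for a_start, b_start, _size in SequenceMatcher(None, shorter, longer).get_matching_blocks():
        start = max(b_start - a_start, 0)
        window = longer[start:start + len(shorter)]
        # Score the shorter string against this same-length window; keep the best score.
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)


sentence = ('i know that because a lion run fifty mile per hour and a cheetah run about '
            'eighty mile per hour and sixty-five be more than fifty and be slow than eighty')
print(partial_ratio_sketch('more than fifty', sentence))  # → 100
```

Because the exact occurrence of 'more than fifty' is one of the candidate windows, any correct partial match should report 100 here.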
maxbachmann commented
This issue has already been reported: #79
The python-Levenshtein-based implementation returns incorrect results in some cases. Your options are:
- use the slower difflib-based version (and possibly suppress the warning)
- use the python-Levenshtein version, which can return incorrect results for any scorer that relies on partial_ratio
- use RapidFuzz (I am the author), which is fast and gives results similar to the difflib-based implementation
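For the first option, one way to hide the warning is a targeted `warnings` filter installed before fuzzywuzzy is imported (judging by the `warnings.warn` line quoted in the report, the warning is raised when the module loads). A minimal standard-library sketch; `silence_slow_matcher_warning` is a hypothetical helper name:

```python
import warnings

# Regex matched against the start of the warning text quoted in the report.
SLOW_MATCHER_MSG = r"Using slow pure-python SequenceMatcher"


def silence_slow_matcher_warning() -> None:
    """Ignore only fuzzywuzzy's slow-SequenceMatcher UserWarning; other warnings still show."""
    warnings.filterwarnings("ignore", message=SLOW_MATCHER_MSG, category=UserWarning)


# Usage (assumes fuzzywuzzy is installed): install the filter first, then import.
# silence_slow_matcher_warning()
# from fuzzywuzzy import fuzz
```

Filtering by message rather than ignoring every UserWarning keeps unrelated warnings visible.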
This behavior could be fixed in fuzzywuzzy/python-Levenshtein. However, since neither project is actively maintained anymore, it is unclear if or when a fix will land.