Low score when the length difference between two strings is large
Opened this issue · 3 comments
Zaky7 commented
Hi,
I am using thefuzz to fuzzy-match a set of strings, but I don't understand why it gives a low score to "Meta Platforms Inc Class a Common stock" for the query "Meta".
from thefuzz import fuzz
from thefuzz import process
choices = ["Meta Platforms Inc Class a Common stock",
"Meta Financial Group, Inc. Common Stock",
"Metals Acquisition Corp",
"Metacrine, Inc. Common Stock",
"Metalla Royalty & Streaming Ltd.",
"Meta Materials Inc. Common Stock",
"Metals Acquisition Corp Units, each consisting of one Class A ordinary share and one-third of one re"
]
res = process.extract("Meta", choices, limit=50)
print(res)
Output
[('Metals Acquisition Corp', 90), ('Metacrine, Inc. Common Stock', 90), ('Metalla Royalty & Streaming Ltd.', 90), ('Meta Materials Inc. Common Stock', 90), ('Meta Platforms Inc Class a Common stock', 60), ('Meta Financial Group, Inc. Common Stock', 60), ('Metals Acquisition Corp Units, each consisting of one Class A ordinary share and one-third of one re', 60)]
Zaky7 commented
After checking the code, I realized the algorithm also takes the relative lengths of the two strings into account.
If I add "Meta Platforms" to the choices, it is found with a score of 90.
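To make the length effect concrete, here is a minimal sketch of the scaling I believe the default scorer (WRatio) applies to partial matches. It does not call thefuzz. `partial_scale` is a hypothetical helper, and the thresholds (1.5 and 8) and multipliers (0.9 and 0.6) are my reading of the library code, so treat them as assumptions and check them against your installed version. The strings are lowercased by hand here, on the assumption that the default processor does something similar.

```python
# Sketch of the length-based scaling that WRatio is assumed to apply.
# The thresholds (1.5, 8) and multipliers (0.9, 0.6) are assumptions
# based on reading the library source; they do not come from this thread.

def partial_scale(query: str, choice: str) -> float:
    """Return the assumed multiplier applied to partial-match scores."""
    shorter, longer = sorted((len(query), len(choice)))
    if shorter == 0:
        return 0.0
    len_ratio = longer / shorter
    if len_ratio < 1.5:
        return 1.0  # similar lengths: no partial-match scaling
    if len_ratio > 8:
        return 0.6  # very different lengths: heavy penalty
    return 0.9      # moderately different lengths: mild penalty


query = "meta"
for choice in [
    "meta platforms",                           # 14 chars, ratio 3.5
    "metals acquisition corp",                  # 23 chars, ratio 5.75
    "meta platforms inc class a common stock",  # 39 chars, ratio 9.75
]:
    # A perfect partial match scores 100 before scaling.
    print(choice, round(100 * partial_scale(query, choice)))
```

Under these assumptions the first two choices come out at 90 and the long one at 60, which lines up with the output above: any choice more than 8 times longer than the query is capped at 60 even when the query matches its prefix exactly.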
purplecrow2020 commented
We face the same issue at times as well: longer strings are penalised when it comes to breaking ties.