DiscountedLevenshtein can be less than Levenshtein....?

Question

chrislit opened this issue 5 years ago · 4 comments

from abydos.distance import DiscountedLevenshtein, Levenshtein
lev = Levenshtein()
dlev = DiscountedLevenshtein()
lev.dist('cat', 'hat') < dlev.dist('cat', 'hat')

Is this correct, though?

Answer 1 · 2019-07-03T20:42:08.000Z

Also, this alignment seems sub-optimal. (I think the l in Neil should be matched with an l in Niall.)

cmp.alignment('Niall', 'Neil')
(2.526064024369237, 'N-iall', 'Neil--')

Answer 2 · 2019-08-05T19:51:00.000Z

Fixed the alignment issue in b04ca90.

Answer 3 · 2020-01-07T20:49:49.000Z

This is a result of the normalizing term in combination with the discounting function. It's worth re-examining this issue to determine if the supplied discounting functions are good, but it's not a bug.
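
To illustrate how a normalizing term can interact with a discount, here is a minimal pure-Python sketch. It does not use abydos: `toy_cost` is a hypothetical discount function and the normalizer (the sum of per-position costs over the longer string) is an assumption chosen for illustration, so the numbers will not match the library's output. It only shows the mechanism: the discounted absolute distance never exceeds the plain one, but dividing by a smaller worst-case total can make the normalized value larger.

```python
import math


def toy_cost(pos, discount_from=1):
    """Hypothetical per-position edit cost (an assumption, not abydos's)."""
    if pos < discount_from:
        return 1.0
    # Cost decays slowly for edits further into the string.
    return 1.0 / (1.0 + math.log(1.0 + (pos - discount_from) / 5.0))


def edit_distance(src, tar, cost=lambda pos: 1.0):
    """Edit distance where an edit at 0-based position pos costs cost(pos)."""
    m, n = len(src), len(tar)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + cost(i - 1)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + cost(j - 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c = cost(max(i, j) - 1)
            sub = 0.0 if src[i - 1] == tar[j - 1] else c
            d[i][j] = min(d[i - 1][j] + c, d[i][j - 1] + c, d[i - 1][j - 1] + sub)
    return d[m][n]


def normalized(src, tar, cost=lambda pos: 1.0):
    """Divide by the worst-case total cost for the longer string."""
    longest = max(len(src), len(tar))
    if longest == 0:
        return 0.0
    worst = sum(cost(k) for k in range(longest))
    return edit_distance(src, tar, cost) / worst


plain_abs = edit_distance('cat', 'hat')
disc_abs = edit_distance('cat', 'hat', toy_cost)
plain_norm = normalized('cat', 'hat')
disc_norm = normalized('cat', 'hat', toy_cost)

print(disc_abs <= plain_abs)   # absolute: discounted is not larger
print(disc_norm > plain_norm)  # normalized: discounted is larger here
```

In this toy setup the single substitution in 'cat' vs. 'hat' falls at the undiscounted first position, so the numerator is the same in both cases while the discounted denominator is smaller than 3.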

Answer 4 · 2022-02-26T11:19:09.000Z

Do you know of any code example of using abydos for matching two Python string lists by calculating minimal distances?

longRefList = ["Name 0001", "Name 0002", ... "Name 9999"]
mylist = ["Name 2345", "xdsdfj ABCD", "Name x23f"] 
# ... whatever code to calculate, 
# for each item in list 2, the distance & position of closest item in list 1 
# ... to output something like this:
matchOutput = [
    {"dist":0, "position":2344}, 
    {"dist":0.999, "position": 8831}, 
    {"dist":0.5, "position":230}
]

I am particularly interested in using the ReesLevenshtein distance, but I wonder how slow this could be.
Do you know if anybody has tried using abydos to merge pandas dataframes by minimal-distance matching between two columns?

Thanks a lot in advance for your advice.
@abubelinha
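
For reference, the kind of list matching asked about above could be sketched as a brute-force search. This is a minimal sketch under stated assumptions, not an abydos recipe: `closest_matches` and `normalized_levenshtein` are hypothetical helper names, and the default distance is a plain normalized Levenshtein written in pure Python. The `dist` parameter accepts any callable taking two strings, so the `dist` method of an abydos distance object (as used at the top of this thread) could presumably be passed instead; that is untested here.

```python
def levenshtein(a, b):
    """Plain unit-cost Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalized_levenshtein(a, b):
    """Levenshtein distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def closest_matches(queries, reference, dist=normalized_levenshtein):
    """For each query, return the distance and index of the closest reference.

    Ties go to the earliest reference item. An empty reference list raises
    ValueError (from min()).
    """
    results = []
    for q in queries:
        position, best = min(
            enumerate(dist(q, r) for r in reference),
            key=lambda pair: pair[1],
        )
        results.append({"dist": best, "position": position})
    return results


long_ref_list = ["Name %04d" % i for i in range(1, 10000)]
my_list = ["Name 2345", "xdsdfj ABCD", "Name x23f"]

matches = closest_matches(my_list, long_ref_list)
print(matches[0])  # {'dist': 0.0, 'position': 2344}
```

This makes len(queries) × len(reference) distance calls, so the running time grows with the product of the two list lengths and with the cost of the chosen distance function.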