Q: edit distance between 2 lists
yasheshgaur opened this issue · 4 comments
Can some one please tell me the API to calculate the editdistance for 2 lists? I only see the API for 2 strings and in my use case, it is not possible to convert the list into strings.
I am not sure your problem is well defined yet. Are you wanting to compare the elements of your list componentwise and then report the sum? What would an example look like?
@rljacobson I believe that is what he wants. For example sometimes I need to apply Levenshtein editops on word level rather than char level sequences in which case having the words as element to a list and passing that off to Levenshtein would be helpful.
I currently have a very janky way of doing this by mapping the unique elements of my lists onto a unique char then passing that to the Levenshtein algorithm. The problem with this method is that there is a limited number of chars but there can be an unlimited number of unique elements.
def Levenshtein_editops_list(source, target):
unique_elements = sorted(set(source + target))
char_list = list('abcdefghijklmnopqrstuvwxyz0123456789')
if len(unique_elements) > len(char_list):
raise Exception("too many elements")
else:
unique_element_map = {ele:char_list[i] for i, ele in enumerate(unique_elements)}
source_str = ''.join([unique_element_map[ele] for ele in source])
target_str = ''.join([unique_element_map[ele] for ele in target])
transform_list = Levenshtein.editops(source_str, target_str)
return transform_list
target = 'The Cat ate the moon'.split()
source = "The dog jumped over the moon".split()
print(Levenshtein_editop_list(source, target))
output
[('delete', 1, 1), ('replace', 2, 1), ('replace', 3, 2)]
@kkawabat Thanks for your function. I made a little bit change and it then works perfectly in my case.
def levenshtein_editops_list(source, target):
unique_elements = sorted(set(source + target))
char_list = [chr(i) for i in range(len(unique_elements))]
if len(unique_elements) > len(char_list):
raise Exception("too many elements")
else:
unique_element_map = {ele:char_list[i] for i, ele in enumerate(unique_elements)}
source_str = ''.join([unique_element_map[ele] for ele in source])
target_str = ''.join([unique_element_map[ele] for ele in target])
transform_list = Levenshtein.editops(source_str, target_str)
return transform_list
Given my inability to actively maintain this repo for the foreseeable future and that a solution exists, I am inclined to close this issue—unless someone else wants to work on it.