editops missing replace operations in some cases
danylofitel opened this issue · 2 comments
Sample strings:
TCTTTGGAGCACAAAACCAGTTGAAACATCAAATTCGTTTGATGTACTGAAGTCAGAGGACGCGCAGGGA
TCTTTGGAGCACAAAACCAGTTGAAACATCATTATTCCTTCGTTTGATGTACTGAAGTCAGAGGACGCGCAGGGA
In this case TTATT was inserted and AA was replaced with CC, but editops returns only one replace (A to C).
Expected output:
[('insert', 31, 31), ('insert', 31, 32), ('insert', 32, 34), ('insert', 32, 35), ('insert', 32, 36), ('replace', 32, 37), ('replace', 33, 38)]
Actual output:
[('insert', 31, 31), ('insert', 31, 32), ('insert', 32, 34), ('insert', 32, 35), ('insert', 32, 36), ('replace', 32, 37)]
I do not see anything incorrect in this output. The Levenshtein distance between these two strings is 6, which equals the number of editops returned. Why should we insert an A and then replace an A? The editops describe the following transformation:
Start (source on top, target below):
AA
TTATTCC
2 insertions at the beginning give:
TTAA
TTATTCC
3 insertions after the first A give:
TTATTCA
TTATTCC
Replacing the last A gives:
TTATTCC
TTATTCC
Note that editops will just return one of multiple optimal paths.
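For anyone who wants to double-check this without the library, here is a minimal pure-Python sketch. It assumes each op is a (tag, source position, destination position) tuple, as in the output above. `apply_editops` and `levenshtein` are hypothetical helpers written for this illustration and are not part of the library's API. `ALTERNATIVE` is a second path constructed by hand to illustrate the point about multiple optimal paths; it is not something editops returned.

```python
SOURCE = "TCTTTGGAGCACAAAACCAGTTGAAACATCAAATTCGTTTGATGTACTGAAGTCAGAGGACGCGCAGGGA"
TARGET = "TCTTTGGAGCACAAAACCAGTTGAAACATCATTATTCCTTCGTTTGATGTACTGAAGTCAGAGGACGCGCAGGGA"

# The "Actual output" reported in this issue.
ACTUAL = [("insert", 31, 31), ("insert", 31, 32), ("insert", 32, 34),
          ("insert", 32, 35), ("insert", 32, 36), ("replace", 32, 37)]

# Hand-built alternative (an assumption for illustration): replace the
# first A instead of the last one, then insert the remaining characters.
ALTERNATIVE = [("replace", 31, 31), ("insert", 32, 32), ("insert", 33, 34),
               ("insert", 33, 35), ("insert", 33, 36), ("insert", 33, 37)]


def apply_editops(ops, source, target):
    """Apply ops (sorted by position) to source and return the result."""
    out = []
    src_i = 0  # next source index that has not been consumed yet
    for tag, src_pos, dest_pos in ops:
        # Copy the source characters that no op touches.
        out.append(source[src_i:src_pos])
        src_i = src_pos
        if tag == "insert":
            out.append(target[dest_pos])  # consumes no source character
        elif tag == "replace":
            out.append(target[dest_pos])
            src_i += 1                    # consumes one source character
        elif tag == "delete":
            src_i += 1                    # drops one source character
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    out.append(source[src_i:])
    return "".join(out)


def levenshtein(a, b):
    """Classic row-by-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or replace
        prev = cur
    return prev[-1]


print(apply_editops(ACTUAL, SOURCE, TARGET) == TARGET)
print(apply_editops(ALTERNATIVE, SOURCE, TARGET) == TARGET)
print(levenshtein(SOURCE, TARGET), len(ACTUAL), len(ALTERNATIVE))
```

Both lists turn the first string into the second in six operations, which matches the distance computed by the DP, so either would be a valid minimal answer.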
My bad, you're right, the output is completely correct. I was actually looking at the output formatted differently (custom code, nothing to do with the library), and the bug was in that formatting. Thanks for checking this! I'll close the issue.