seperman/deepdiff

Wrong diff on list of strings

Closed this issue · 3 comments

Describe the bug
With certain specific data, DeepDiff gives a wrong result (see below for more details).
The issue seems to be related to repeated elements in the input, because even just small changes from the sequences below won't trigger the bug.

To Reproduce

from pprint import pprint
from deepdiff import DeepDiff

l1 = "A B C D E F G D H".split()
l2 = "B C X D H Y Z".split()
pprint(DeepDiff(l1, l2))

Result:

{'iterable_item_removed': {'root[0]': 'A', 'root[4]': 'E'},
 'values_changed': {'root[2]': {'new_value': 'X', 'old_value': 'D'},
                    'root[5]': {'new_value': 'Y', 'old_value': 'F'},
                    'root[6]': {'new_value': 'Z', 'old_value': 'G'}}}

Problems with this:

  • indexes in the values_changed key are not consistent: root[2] old value is not D, unless you take it after removing, but root[5] old value is F only if you take it before removing.
  • Applying the diff (with some "interpretation" of the indexes) does not transform l1 into l2. We obtain B C X Y Z D H instead of B C X D H Y Z

Expected behavior
Indexes in the values_changed section should be consistent, and applying the diff on l1 should produce l2

OS, DeepDiff version and Python version (please complete the following information):

  • OS: Any
  • Python Version: 3.10.7
  • DeepDiff Version: 6.7.1

Additional context
Note that changes to l1 or l2 will most probably produce a correct result.
This was the smallest example I could find that was reproducing the error.

I also occurred this error

Hi @rliffredo and @petalsofcherry
Thanks for reporting the bug. I am looking into it.

  1. I found the bug that reported D with the wrong index. That was easy to fix and I will push the changes soon.
  2. The problem with delta is that in order to recreate your l2 in the same exact order it was, some operations need to happen in a specific order one after the other. DeepDiff doesn't currently hold that information about the order of operations. It just buckets all the removals together vs. all the additions etc. So I will need to make DeepDiff keep that metadata and pass it to Delta.