Sub-optimal output from diff
Closed this issue · 3 comments
GoogleCodeExporter commented
What steps will reproduce the problem?
1. Ran the following Python code:
from diff_match_patch import diff_match_patch
dmp = diff_match_patch()
text1="A.A.F.z.z"
text2="x.x.F.A.A.F"
diff = dmp.diff_main( text1, text2, False )
print diff
What is the expected output? What do you see instead?
Expected:
[(1, 'x.x.F.'), (0, 'A.A.F'), (-1, '.z.z')]
Actual:
[(-1, 'A'), (1, 'x'), (0, '.'), (-1, 'A'), (1, 'x'), (0, '.F.'), (-1, 'z'), (1, 'A.A'), (0, '.'), (-1, 'z'), (1, 'F')]
What version of the product are you using? On what operating system?
Latest diff-match-patch checked out from SVN (rev 97). Mac OS X.
Please provide any additional information below.
The expected output seems easier to read. The actual output has the same length
D-path (5) as the expected output and Myers' algorithm seems to be finding that
one first. So I'm not sure if this is a bug in diff-match-patch or a
performance trade-off?
Original issue reported on code.google.com by m...@dixon.se
on 3 Nov 2011 at 1:19
GoogleCodeExporter commented
Behaviour confirmed. As you pointed out, the two options are tied as the
shortest possible solution. In this case it happens it finds the ugly one
first.
If you want an output that is easy to read, the intent is to pass the raw diff
to diff_cleanupSemantic() which has some reasonably effective heuristics to
clean up the results. However in this case it fails to detect the commonality.
It is worth noting that if one reverses text1 and text2 it behaves perfectly.
The issue appears to be an oversight in diff_cleanupSemantic(). Currently it
checks for this case:
# Find any overlaps between deletions and insertions.
# e.g: <del>abcxxx</del><ins>xxxdef</ins>
# -> <del>abc</del>xxx<ins>def</ins>
The case of
<del>xxxabc</del><ins>xxxdef</ins>
and
<del>abcxxx</del><ins>defxxx</ins>
is already handled by the diff algorithm itself.
What's missing is this heuristic:
<del>xxxabc</del><ins>defxxx</ins>
-> <ins>def</ins>xxx<del>abc</del>
Working on it...
[Nice bug report BTW. You know your stuff.]
Original comment by neil.fra...@gmail.com
on 3 Nov 2011 at 6:13
- Changed state: Started
GoogleCodeExporter commented
Code complete. Out for review...
Original comment by neil.fra...@gmail.com
on 4 Nov 2011 at 2:06
GoogleCodeExporter commented
...complete. Thanks for pointing out this sub-optimal output!
Original comment by neil.fra...@gmail.com
on 9 Nov 2011 at 9:33
- Changed state: Fixed