Mistake in diff_cleanupMerge algorithm in diff_match_patch.cpp
Closed this issue · 5 comments
GoogleCodeExporter commented
In diff_match_patch::diff_cleanupMerge the 'if (count_delete != 0 &&
count_insert != 0)' is always true because it is proceeded by 'while
(count_delete-- > 0)' & 'while (count_insert-- > 0)'.
I think the test was supposed to be before the variables were decremented.
Luckily (or not) this only results in consuming extra cycles.
This same bug appears in the Java version, but not in the C#, JavaScript, Lua
or Python versions.
Patch for C++ attached.
Original issue reported on code.google.com by chris...@gmail.com
on 4 Nov 2010 at 1:36
Attachments:
GoogleCodeExporter commented
Nice catch! It looks like that bug has been in the code since the initial Java
version in 2007, and was later faithfully ported to C++. This function has
comprehensive unit testing, but no amount of testing would have caught that.
While in that function I also noticed that we are rebuilding [Equality,
Insertion, Equality] diffs for no reason. So I've added a shortcut for
single-edits sandwiched between two equalities.
The patch has gone out for overnight review.
Original comment by neil.fra...@gmail.com
on 5 Nov 2010 at 7:38
- Changed state: Started
- Added labels: Performance
GoogleCodeExporter commented
The changes were commited to Subversion a few hours ago.
http://code.google.com/p/google-diff-match-patch/source/detail?r=72
Thanks for caching this.
There's another small optimization coming on Monday for C# and JavaScript, so I
won't bother updating the download package until then.
Original comment by neil.fra...@gmail.com
on 6 Nov 2010 at 6:21
- Changed state: Fixed
GoogleCodeExporter commented
Neil,
Unfortunately your change in r72 appears to cause test "diff_main: Overlap #1."
to fail.
It's the 'if (count_delete + count_insert > 1)' which is not equivalent to
the old 'if (count_delete != 0 || count_insert != 0)'.
Original comment by chris...@gmail.com
on 6 Nov 2010 at 6:01
GoogleCodeExporter commented
I cannot replicate the error you report. All unit tests pass in all languages:
diff_main: Overlap #1. OK
diff_main: Overlap #2. OK
diff_main: Overlap #3. OK
[...]
All tests passed.
Total time: 1297 ms
Done.
Could you double-check that the error is real?
Original comment by neil.fra...@gmail.com
on 8 Nov 2010 at 8:36
GoogleCodeExporter commented
Neil,
I'm terribly sorry. You are absolutely correct - r72 does NOT cause a test
failure.
Let me explain: I'd made a change that asserted that `text_delete` &
`text_insert` were already empty at the bottom of the `case` statement & so
there was no need for `text_delete = ""; text_insert = "";`. This was true
before r72. Unfortunately I neglected to test my debug build with that change
from r72 so the asserts weren't triggered.
Why did I make that change & why did I not suspect it?
Well, it's because I don't have/can't use Qt & I've ported the C++
diff-match-patch to use the standard C++ library, so it is just one of many
changes. I hope to submit my C++ version (cpp-std) to you soon for possible
inclusion in diff-match-patch.
In fact I'm quite pleased that that change in r72 does work as it provides a
nice performance improvement.
I've spent quite some time optimising my C++ conversion & it is now up to 4
times as fast as the initial conversion.
And, of course it passes all the tests. I've also added a couple of switches
to the test program to provide some performance metrics.
Additionally, I've added another switch (currently not for submission) that
processes a diff/patch file to produce HTML. It's based on the JavaScript
version in svnX <svnx.googlecode.com> .
[It converts a 91,500 line diff (from a user) to 29.2MB of HTML in as little as
14 secs, even on my ageing machine.]
My conversion is pretty much complete, I just need to do some tidying up & test
it with one more compiler.
I'll send you more details when it's ready.
Thanks for a very useful resource.
Regards,
Chris
Original comment by chris...@gmail.com
on 9 Nov 2010 at 6:50