mjmwired/google-diff-match-patch

The bool vector return of patch_apply generally does not correspond to anything the caller knows about.

Opened this issue · 2 comments

Before I start, thanks for a creating a great library. It's very useful and is 
much appreciated.

Using the C++ version I came across what seems like either very odd or just 
plain broken behaviour with diff_match_patch::patch_apply. It returns a vector 
of bools to indicate individual patch success but the size of this vector can 
be larger then the size of the patch list that is input to the function.

It appears that the list of input patches can be broken up into a larger list 
by patch_splitMax. Despite patch_apply passing the patch list as a reference it 
takes a deep copy of it and therefore does not alter it. This means that the 
vector of bools is with reference to the internal, in general larger, patch 
list and is therefore of little use to the caller. In fact it's potentially 
dangerous if one tries to index into the input patch list to, for example, 
print out the patches that failed (as I did).

Either the patch list really should be altered on return or the bool vector 
should correspond to the original input list (and the patch list passed by 
reference-to-const).

Whether this is an issue with the other implementations or is confined to the 
C++/Qt version I don't know.

Original issue reported on code.google.com by fotherin...@gmail.com on 19 Jan 2012 at 9:28

I agree completely.  There's a note in the API page:

[Note that this second element is not too useful since large patches may get 
broken up internally, resulting in a longer results list than the input with no 
way to figure out which patch succeeded or failed. A more informative API is in 
development.]

One approach would be to collapse the booleans down to a list of the same 
length as the input list of patches.  Thus all the booleans relating to a 
split-up patch would be ORed together and would be false if any of them failed. 
 This should be relatively simple, and doesn't change the return types (all 
hell breaks loose if we need to change the API types for this library).

The downside is that one can't distinguish between partial failure and complete 
failure.  Don't know if that's important.  Tell me about your use case.

Original comment by neil.fra...@gmail.com on 20 Jan 2012 at 12:22

  • Changed state: Accepted
Thanks for the reply. I don't really need to know which patch failed for my
application so it's not a big issue for me. I was only printing the failed
patch to a log file for later perusal, the app. gives up and moves on.

Shrinking the vector to match the input list sounds ideal to me as it's
what I'd expect the function to do. The splitting of a patch is an internal
implementation detail for that method so I can't see why this would be
exposed.

Original comment by fotherin...@gmail.com on 21 Jan 2012 at 1:00