java-json-tools/json-patch

Reduce `deepCopy` calls

Opened this issue · 7 comments

When there are many operations to run successively and when the document is large performing deepCopy for each operation becomes extremely expensive.

Below is an example of the change I'm talking about with only 'replace':

diff --git a/src/main/java/com/github/fge/jsonpatch/JsonPatch.java b/src/main/java/com/github/fge/jsonpatch/JsonPatch.java
index 178ab86..9f0a3d4 100644
--- a/src/main/java/com/github/fge/jsonpatch/JsonPatch.java
+++ b/src/main/java/com/github/fge/jsonpatch/JsonPatch.java
@@ -142,7 +142,7 @@ public final class JsonPatch
         throws JsonPatchException
     {
         BUNDLE.checkNotNull(node, "jsonPatch.nullInput");
-        JsonNode ret = node;
+        JsonNode ret = node.deepCopy();
         for (final JsonPatchOperation operation: operations)
             ret = operation.apply(ret);
 
diff --git a/src/main/java/com/github/fge/jsonpatch/ReplaceOperation.java b/src/main/java/com/github/fge/jsonpatch/ReplaceOperation.java
index 72405a6..fca2fab 100644
--- a/src/main/java/com/github/fge/jsonpatch/ReplaceOperation.java
+++ b/src/main/java/com/github/fge/jsonpatch/ReplaceOperation.java
@@ -69,7 +69,7 @@ public final class ReplaceOperation
         final JsonNode replacement = value.deepCopy();
         if (path.isEmpty())
             return replacement;
-        final JsonNode ret = node.deepCopy();
+        final JsonNode ret = node;
         final JsonNode parent = path.parent().get(ret);
         final String rawToken = Iterables.getLast(path).getToken().getRaw();
         if (parent.isObject())

Instead of performing the copy in each operation it is performed once at the top level. The above will fail tests since they expect ReplaceOperation to copy the node. Beyond that are there any edge cases this type of change would break? Are there other ways to reduce copying? Thanks.

jayanderson@f6ecd0d - one possibility for a fix. This keeps the apply api the same (performing a deepCopy) while adding a applyMutating which can be called internal when a copy isn't necessary.

jayanderson@f6ecd0d has a huge effect on performance. With a 440 kB patch, containing around 2,000 changes, applied to a 3.5 MB model (both minified), a test program using the unpatched version of the library takes 22 seconds to run on my computer. With the patched version of the library, the entire run takes 20 milliseconds. For comparison, the zjsonpatch library takes 37 milliseconds to run. All three runs produced identical output.

I'm down to merge something here, but I'm not sure if I understand why the deepCopy was implemented in the first place and I'm somewhat fearful of introducing a regression for existing users.

Looks like the @jayanderson suggestion could be workable.

Any chance you can submit it as a PR? (Using applyMutable)

My experience with this library is limited, so you may want to take this with a grain of salt, but the patch looks safe to me. Instead of needlessly cloning the entire tree once for every visited node (which is very computationally expensive), jayanderson/json-patch@f6ecd0d clones the tree once and then mutates this copy in-place during the patch operation.

Making a single copy of the tree in JsonPatchOperation is prudent, though, because applying a patch may fail, in which case an instance of JsonPatchOperation is thrown. Had a defensive copy of the tree not been made, the tree would have been left in an inconsistent state. A patch should, in my opinion, either be applied in its entirety or not be applied at all.

For what it's worth, we've been running this patch in production for ten days now with no ill effects.

An update on my last comment: we've now been running the patch in production for almost three months and everything works well, with substantially better performance.