automerge/automerge-repo

Lose extra diff

Opened this issue · 10 comments

We are currently generating change patches by calling Automerge.diff after each change. For an average change, this diff takes ~35ms.

While this isn't a spectacularly long period of time, the rest of the change can sometimes only take a single digit ms amount of time. If we could somehow get the patches from the patchCallback, this would likely speed up the whole transaction.

We could also get the change source at the same time.
Screenshot 2023-09-13 at 15 33 38

pvh commented

This is much much too slow! About 35ms too slow. Thanks for flagging it. Why do you suspect the patchCallback would be faster?

I have been doing some more experimenting with this and a couple of things came to light. Firstly, diff works much faster on newer documents with fewer changes. This makes sense as diff moves through all changes in the doc to create patches.

However, the patchCallback doesn't seem to take significant time at all. My understanding of this is that patches are effectively produced during each change to progress the document at the js level. They are effectively already produced, so little additional work is required. This was reflected in my tests too.

pvh commented

Okay, yeah, I seem to remember @orionz warning me about this. I'd moved to explicit diffs in anticipation of wanting to support branching / shutting back and forth on the timeline. I guess I'd underestimated the cost. We should fix this.

I'd like to get this fixed, but I could do with a little guidance.

Change events are emitted by the DocHandle state machine, comparing the latest doc state with the previous to determine if a change has occurred. If we keep this method, then the only way i can see to get patches into the change event is to store them in the state machine. Perhaps as lastPatches?

However, patches don't really feel like they belong in the state machine to me? The alternative would be to emit changes outside of the state machine, and only use xstate to track the internal state (loading, ready etc).

What do you think?

alexjg commented

Yeah patches feel like they shouldn't be part of the state machine, which is more concerned with the availability of the document. Can we have a DocHandle#lastPatches member field?

That almost feels as though it is still part of the state. To my mind patches only apply to the change they are emitted with

alexjg commented

I agree that it's awkward to hold the state on the DocHandle, but I think this is always going to be an interim solution before we get branches done. I don't think of patches as attached to a change (assuming by "change" you mean any set of changes whether generated locally or received over the network), I think of patches as associated with a tuple of (last patched heads, current heads), this captures the fact that the patch depends on what the state was before the change.

We currently don't have to track this "last patched heads" state anywhere because the onTransition function has access to the previous document to get the heads to diff from. To me then, adding lastPatches somewhere is equivalent to storing the previous doc on the context. Maybe that points to the way to implement this, as a lastPatches on the context of the state machine?

That's really what I was proposing in the first solution above. I'll knock up a prototype and see if it works for now.

pvh commented

Closing this out -- we've instead made the extra diff basically free.

Screenshot 2023-11-16 at 11 03 42
@pvh While I appreciate that the time has been wildly reduced since the initial report, diff is still taking ~10-15ms for me in most cases.

While not massive in the grand scheme of things, it is roughly 1/3rd of the time for a change which causes a render (the dropHandler in the flamechart) and significantly more (>70%) for the two small changes which follow.