yorkie-team/yorkie

Incorrect TreeChange Generation in Concurrent Editing

hackerwins opened this issue · 1 comments

Description:

Incorrect TreeChange Generation in Concurrent Editing.

The current implementation of TreeChange in Tree does not appropriately consider the actual value changes in the existing Tree when generating TreeChange for received ops. Currently, it converts the ops without considering the actual modifications to Tree.

For example:

  it('treechange in concurrent editing test', async function ({ task }) {
    await withTwoClientsAndDocuments<{ t: Tree }>(async (c1, d1, c2, d2) => {
      // 01. initialize a document with '<doc><p>abcd</p></doc>'.
      d1.update((root) => {
        root.t = new Tree({
          type: 'doc',
          children: [{ type: 'p', children: [{ type: 'text', value: 'abcd' }] }],
        });
      });
      await c1.sync();await c2.sync();await c1.sync();

      // 02. collect operations from d1 and d2.
      const actualOps1 = [];
      d1.subscribe('$.t', (event) => {
        if (event.type === 'local-change' || event.type === 'remote-change') {
          const { operations } = event.value;
          actualOps1.push(type, ...operations.filter((op) => op.type === 'tree-edit'));
        }
      });
      const actualOps2 = [];
      d2.subscribe('$.t', (event) => {
        if (event.type === 'local-change' || event.type === 'remote-change') {
          const { operations } = event.value;
          actualOps2.push(type, ...operations.filter((op) => op.type === 'tree-edit'));
        }
      });

      // 03. update document concurrently.
      // d1: <doc><p>abcd</p></doc> -> <doc></doc>
      d1.update((root) => root.t.edit(0, 6));
      // d2: <doc><p>abcd</p></doc> -> <doc><p>ab123cd</p></doc>
      d2.update((root) => root.t.edit(3, 3, { type: 'text', value: '123' }));
      
      // c1 should not receive the operation from c2, because the parent is already deleted in d1.
      // actualOps1:
      //   {
      //     type: "tree-edit",
      //     path: "$.t",
      //     from: 0, to: 0,
      //     fromPath: [0], toPath: [0],
      //     value: [{ type: "text", value: "123" }],
      //     splitLevel: 0,
      //   }
      await c1.sync();await c2.sync();await c1.sync();
    }, task.name);

In the example, while d1 is deleting <p>, and d2 is adding a TextNode "123" to the <p>, the final sync incorrectly triggers TextChange in d1 for the added TextNode, even though the corresponding parent node has been deleted.

This issue is related to the following TODO in the code:
https://github.com/yorkie-team/yorkie-js-sdk/blob/5ef1e8ff13cde39053d86fa667dc7afb857aa72b/src/document/crdt/tree.ts#L835-L849

A similar issue has been addressed for Text in the code:
https://github.com/yorkie-team/yorkie-js-sdk/blob/main/src/document/crdt/rga_tree_split.ts#L935-L971

Why:

The point is that the external editor utilizing Yorkie doesn't want to receive unacceptable TreeChanges including inserting contents as child nodes of the deleted parent node and repeatedly deleting the nodes that are already deleted.

Here are some notable cases in Tree.Edit (insertion, deletion).

  1. delete & delete

Operations

<r><p>ab</p></r>
User A: edit(0, 4) => <r></r>
User B: edit(1, 2) => <r><p>b</p></r>

Expected TreeChanges generated before sync

User A: [edit(0,4)]
User B: [edit(1,2)]

Expected TreeChanges generated after sync

User A: None
User B: [edit(0,3)]
  1. delete & insert 1

Operations

<r><p>ab</p></r>
User A: edit(1, 3) => <r><p></p></r>
User B: edit(2, 2, "c") => <r><p>acb</p></r>

Expected TreeChanges generated before sync

User A: [edit(1,3)]
User B: [edit(2,2, "c")]

Expected TreeChanges generated after sync

User A: [edit(1, 1, "c")]
User B: [edit(1,2), edit(3,4)]
  1. delete & insert 2 (when the parent node is removed)

Operations

<r><p>ab</p></r>
User A: edit(0, 4) => <r></r>
User B: edit(2, 2, "c") => <r><p>acb</p></r>

Expected TreeChanges generated before sync

User A: [edit(0,4)]
User B: [edit(2,2, "c")]

Expected TreeChanges generated after sync

User A: None
User B: [edit(0, 5)]

As you can see at the second cases, separating a deletion range is required to avoid repetitive deletion and exclude alive nodes.
Unlike the Text CRDT, the Tree CRDT has levels. It means all nodes are not linearly connected with direct links.
Therefore, I guess finding the proper previous or next node of left-most/right-most child nodes is necessary.