Optimise Rope performance for small edits

Question

Optimise Rope performance for small edits

Opened this issue 7 years ago · 3 comments

Our Rope currently offers good performance for inserting or deleting large amounts of text from random positions in the Rope. Because Rope indexing is faster than String indexing, and we don't need to push the whole rest of the text out of the way on insertions, we can do these kinds of large edits faster than on a String.

This is great, except for the elephant in the room: large edits at random indices is almost certainly not the most common case for a text editor.

While large insertions or deletions at unknown indices may occur (e.g. as a result of commands like copy/paste, find and replace, or selecting and deleting a block of text), it's most likely that text is going to be inserted one character at a time, at sequential indices.

Suppose the cursor is at position i. The editor detects a keypress for the character c and inserts it at position i. Then, the user presses the key a, which is inserted at i + 1. The next keystroke is also a character, t, which is inserted at i + 2, and so on.

Naïvely, our Rope would handle this kind of editing by indexing to position i and inserting a c, indexing to position i + 1 and inserting an a, and then indexing to position i + 3 and inserting a t. Even though these insertions are probably faster than the corresponding String insertion operations, we still spend a lot of time indexing. Also, we will check to rebalance the rope on every insertion. Since the cursor follows a predictable pattern of behaviour, we can track its position to avoid repeatedly indexing the Rope.

Answer 1 · 2017-03-03T18:47:45.000Z

Furthermore, the way our edit history stack is currently implemented complicates this issue somewhat. If we assume that every keypress corresponds to an edit() on the stack, then we end up cloning an Rc for every single character insertion, and the edit history stack will quickly become very, very long. This is...probably not ideal. Also, it means that the undo command would be granular at the level of undoing a single character, which is not the behaviour I've seen in most text editors.

Answer 2 · 2017-03-03T19:00:17.000Z

My proposed solution is to modify our Rope with the addition of a new type of leaf node containing a gap buffer rather than a String. This allows fast insertions at a cursor position.

We then create a rope Cursor structure (#40) tracking the current cursor position in the Rope. When the cursor enters a given node, we transform it into a gap buffer, which we edit in-place. After the cursor leaves a node (or possibly after a set number of edits or amount of time with no key presses?), we commit the edits into a new Rope (sharing all of the un-edited nodes with the previous version) which goes on the undo stack. We can also defer rebalancing the rope until we commit a set of edits, so that we still get the performance benefits of keeping the rope balanced, but we don't have to check balance every time a single character is added or deleted.

All the gap buffer stuff would only happen when editing the Rope using a Cursor, and we could probably implement it transparently to downstream consumers of the an-rope library, without changing any existing API.

Answer 3 · 2017-03-03T19:01:11.000Z

@cjm00, I'm tagging you on this 'cause I'd like to know if it seems reasonable to you before proceeding. If anyone else (@croyzor, @twisted-pear, @rachlmac...) has input, I'd love to hear it as well.