Optimise Rope performance for small edits
Opened this issue · 3 comments
Our Rope
currently offers good performance for inserting or deleting large amounts of text from random positions in the Rope
. Because Rope
indexing is faster than String
indexing, and we don't need to push the whole rest of the text out of the way on insertions, we can do these kinds of large edits faster than on a String.
This is great, except for the elephant in the room: large edits at random indices is almost certainly not the most common case for a text editor.
While large insertions or deletions at unknown indices may occur (e.g. as a result of commands like copy/paste, find and replace, or selecting and deleting a block of text), it's most likely that text is going to be inserted one character at a time, at sequential indices.
Suppose the cursor is at position i. The editor detects a keypress for the character c and inserts it at position i. Then, the user presses the key a, which is inserted at i + 1. The next keystroke is also a character, t, which is inserted at i + 2, and so on.
Naïvely, our Rope
would handle this kind of editing by indexing to position i and inserting a c
, indexing to position i + 1 and inserting an a
, and then indexing to position i + 3 and inserting a t
. Even though these insertions are probably faster than the corresponding String
insertion operations, we still spend a lot of time indexing. Also, we will check to rebalance the rope on every insertion. Since the cursor follows a predictable pattern of behaviour, we can track its position to avoid repeatedly indexing the Rope
.
Furthermore, the way our edit history stack is currently implemented complicates this issue somewhat. If we assume that every keypress corresponds to an edit()
on the stack, then we end up cloning an Rc
for every single character insertion, and the edit history stack will quickly become very, very long. This is...probably not ideal. Also, it means that the undo command would be granular at the level of undoing a single character, which is not the behaviour I've seen in most text editors.
My proposed solution is to modify our Rope
with the addition of a new type of leaf node containing a gap buffer rather than a String
. This allows fast insertions at a cursor position.
We then create a rope Cursor
structure (#40) tracking the current cursor position in the Rope
. When the cursor enters a given node, we transform it into a gap buffer, which we edit in-place. After the cursor leaves a node (or possibly after a set number of edits or amount of time with no key presses?), we commit the edits into a new Rope
(sharing all of the un-edited nodes with the previous version) which goes on the undo stack. We can also defer rebalancing the rope until we commit a set of edits, so that we still get the performance benefits of keeping the rope balanced, but we don't have to check balance every time a single character is added or deleted.
All the gap buffer stuff would only happen when editing the Rope
using a Cursor
, and we could probably implement it transparently to downstream consumers of the an-rope
library, without changing any existing API.
@cjm00, I'm tagging you on this 'cause I'd like to know if it seems reasonable to you before proceeding. If anyone else (@croyzor, @twisted-pear, @rachlmac...) has input, I'd love to hear it as well.