Benchmarking RON-based chronofold implementation
gritzko opened this issue · 7 comments
Hi!
Martin @ept pointed me at this benchmark. Thanks, Martin!
Currently I have a rather unpolished RON+Chronofold C++ implementation which does text versioning/synchronization. It follows a slightly different model than Y.js or Automerge (see the draft at https://arxiv.org/abs/2002.09511). In particular, appending a char to a text takes nanoseconds, so there is no point in comparing that case.
I thought this microbenchmark should be close to the worst case for the C++ code:
[B1.4] Insert N characters at random positions (time)
That is because inserting-at-an-index is an expensive operation for RON+CF: it does not keep the text as a string, so it has to recompute that kind of positional information on each edit. Also, inserting single characters at random positions breaks all chronofold optimizations. To make things comparable, I used the LineBasedText wrapper, which uses {line,col} addressing.
Long story short, the first run of the bench said it spends ~3800 ns per iteration (I use google/benchmark). That is a disappointing number... It comes out to roughly 6000 × ~4 µs for N=6000 iterations, i.e. ~24 ms in the "[B1.4]...(time)" line of the table.
It is not directly comparable, though, as it does not rebuild the entire text on each edit; only the affected line is read back. (That is the key idea of the LBT wrapper; it is made for text editors.)
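For context, the google/benchmark harness for this case is shaped roughly like the sketch below. The container here is a plain std::vector<std::string> stand-in rather than the actual RON+CF LineBasedText wrapper, so only the structure of the measurement is representative, not the numbers.

```cpp
// Sketch of the measurement harness only; the real benchmark runs against the
// RON+Chronofold LineBasedText wrapper, not this std::vector stand-in.
#include <benchmark/benchmark.h>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

static void BM_RandomCharInsert(benchmark::State& state) {
  std::mt19937 rng(42);
  std::vector<std::string> lines(60, "some pre-existing line of text");
  for (auto _ : state) {
    // Pick a random {line,col} position and insert one character there.
    std::size_t line = rng() % lines.size();
    std::size_t col = rng() % (lines[line].size() + 1);
    lines[line].insert(lines[line].begin() + col, 'x');
    benchmark::DoNotOptimize(lines[line].data());
  }
}
BENCHMARK(BM_RandomCharInsert);
BENCHMARK_MAIN();
```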
I think I can measure the size of the resulting frame too...
Hi @gritzko,
thanks for sharing your benchmark results!
I will insert a link to this thread into the readme, so please continue sharing your results. I'd be pretty interested in the document size of the RON-encoded format and the time to rebuild the document from the RON-encoded format.
Do you think it makes sense to add Swarm to this benchmark, as it is based on RON?
I think it is a fair assumption for text types that lines contain only a few characters. We could add another benchmark for this more realistic case. E.g. "Insert N characters at random positions in ⌈N/100⌉ existing lines", as sketched below.
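As a rough sketch (not actual benchmark code), the workload for that case could be generated along these lines, assuming each of the ⌈N/100⌉ lines starts out non-empty:

```cpp
// Sketch: generate N single-character insertions spread over ceil(N/100)
// pre-existing lines, each at a random column within the growing line.
#include <cstddef>
#include <random>
#include <vector>

struct Insertion {
  std::size_t line;
  std::size_t col;
  char ch;
};

std::vector<Insertion> makeWorkload(std::size_t n, std::mt19937& rng) {
  std::size_t lineCount = (n + 99) / 100;          // ceil(N/100) lines
  std::vector<std::size_t> lineLen(lineCount, 1);  // each line starts non-empty
  std::vector<Insertion> ops;
  ops.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t line = rng() % lineCount;
    std::size_t col = rng() % lineLen[line];       // random position in that line
    ops.push_back({line, col, 'a'});
    ++lineLen[line];                               // the line grows by one char
  }
  return ops;
}
```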
...the size of the resulting frame in RONt is about 100 KB. That space is mostly taken up by ids. An empty replica named "test" produces ~25 bytes/op updates in "[B1.4] Insert N characters at random positions (avgUpdateSize)". Overall, ~100 KB in "[B1.4] Insert N characters at random positions (docSize)".
Again, this is close to the worst case, as random insertions break optimizations in the format.
Actually, the absolute worst case in RONt is 60 bytes per single-character insertion op.
That is, if we max out every field: numbers, replica ids, non-BMP Unicode chars.
> Do you think it makes sense to add Swarm to this benchmark, as it is based on RON?
I don't think so; gritzko/swarm is no longer supported.
Hey @gritzko,
would you mind sharing the Chronofold results for applying the real-world dataset that @ept shared? I'd like to create a separate section that compares different implementations based on a real-world dataset as a baseline.
You can find the dataset here: https://github.com/dmonad/crdt-benchmarks/blob/master/benchmarks/b4-editing-trace.js (it is copied with permission from the original repository).
I'm especially interested in the size of the final document (docSize), and the time to parse the encoded document (parseTime).
If possible, please share some insights on how much memory your implementation uses.
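For reference, replaying the trace amounts to something like the sketch below. It assumes each entry follows the automerge-perf layout (a position, a delete count, and the inserted characters) and that the document type (Doc is just a placeholder here) exposes index-based insert and remove; the actual RON+Chronofold API will of course differ.

```cpp
// Sketch of replaying the editing trace against a document type. The trace
// entry layout (position, delete count, inserted characters) follows the
// automerge-perf dataset this file was copied from. Doc is a placeholder for
// the implementation under test and is assumed to expose
// insert(pos, text) and remove(pos, len).
#include <cstddef>
#include <string>
#include <vector>

struct TraceOp {
  std::size_t pos;         // index into the flat text
  std::size_t numDeleted;  // characters to delete at pos
  std::string inserted;    // characters to insert at pos (may be empty)
};

template <typename Doc>
void replayTrace(Doc& doc, const std::vector<TraceOp>& trace) {
  for (const TraceOp& op : trace) {
    if (op.numDeleted > 0) doc.remove(op.pos, op.numDeleted);
    if (!op.inserted.empty()) doc.insert(op.pos, op.inserted);
  }
  // Afterwards: encode the document to measure docSize, then decode the
  // encoded form again to measure parseTime.
}
```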
Sorry about that. Here is the correct one: https://github.com/dmonad/crdt-benchmarks/blob/main/benchmarks/b4-editing-trace.js