jlongster/crdt-example-app

For what is operator `| 0`?

Opened this issue · 6 comments

volyx commented

let key = Number((timestamp.millis() / 1000 / 60) | 0).toString(3);

This is a quick way of converting the floating-point value to an integer (the bitwise operators only work on 32-bit integers, so it causes the 64-bit float to be converted to an integer).

In this case, it ensures the base-3 encoded timestamp strings end up looking like "1211121022121110" instead of "1211121022121110.11221000121012222".

If it's helpful to anyone, I forked James' repo and have an "annotated" version that includes a bunch of code comments for stuff like this (and a NOTES.md)--basically study notes. For example: https://github.com/clintharris/crdt-example-app_annotated/blob/master/shared/merkle.js#L43

volyx commented

@clintharris Thank you!

Do you know the reason why does merkle tree use 3 nodes instead of 2?

https://github.com/clintharris/crdt-example-app_annotated/blob/master/shared/merkle.js#L36

The base-3 number system used for timestamps means the tree will be ternary.

In other words: each char in the timestamp string can only be "0", "1", or "2". So each node in the tree can have 0..3 child nodes accessible via those keys.

volyx commented

the tree will be ternary.

Yes, sure. My question is - why the tree is not a binary tree like on many merkle tree pictures?
image

https://en.wikipedia.org/wiki/Merkle_tree

volyx commented

Hi! As I see at

let key = Number((timestamp.millis() / 1000 / 60) | 0).toString(3);
Merkle path has precision in minutes, am I right?

If so, several edits that occur during one minute will have the same path in Merkle tree - 012 and 012 for example. How does a sync engine deal with a lot of edits during one minute? 🤔

I’m looking at this for the first time, so apologies if my interpretation is incorrect, but here’s how I see it working.

The Merkle tree isn’t storing the actual values that are set, only a hash. When inserting a value into the tree, the insertion process walks down the tree to the leaf node for the timestamp and XORs each existing node’s hash with the new hash along the way. This way, two trees can be compared quickly by finding the node with the earliest timestamp where the hashes differ.

So the tree doesn’t tell you exactly what is different, but it does give you a lower bound on which messages need to be sent to completely update a client with the latest state.

As for why a ternary tree instead of a binary tree, it’s probably just to reduce the depth of the tree to make traversal faster. I actually wonder if a higher base might work even better but I haven’t thought through it enough yet and it probably doesn’t matter much.

EDIT: It also simplifies traversal of the tree and somewhat reduces rebalancing concerns which could otherwise become an issue with monotonically increasing keys, since the "trie" representation means that the key is embedded in the tree structure itself. In this case, the tree starts out very unbalanced but becomes more balanced over time.