Storing CRDTs in the database

Question

Opened this issue 3 months ago · 1 comments

The idea of storing CRDTs in the triple store has come up. Specifically, what if we just stored them as blob values?

Answer 1 · 2024-10-03T17:41:13.000Z

Here are my rough thoughts on the matter.

I think we could absolutely store arbitrary stuff as serialized blobs, and I expect we'll need to regardless. It's not unreasonable to do it for an automerge document. However, there are several nuances to consider:

Every change will be a conflict: two concurrent updates will produce two different CRDT docs, so retracting the old value and writing the new one is not going to pan out.
We could potentially do compare-and-swap (CAS) operations that reject an assertion if the retraction it is paired with is outdated. This would prevent unintentional overwrites. It does not really fit eventually consistent settings, but it is probably better than not having it.
Yet another alternative could be for each actor to write its own op-log and merge at the edge, but I'm not sure whether that would fit right in or whether some of the storage would need to be altered (if I recall correctly, automerge's binary format lumped everyone's changes together).
We could also potentially support a custom merge system, e.g. if we want automerge, we could somehow express that an assertion is not a compare-and-swap but rather a merge. This is probably the most promising of the options above.
- ⚠️ However, it would likely affect sync in a not-so-pleasant way.
This does raise some interesting questions around indexing and what one would expect there. My general suggestion is to treat the triple store as an index, that is, index data locally in terms of derived triples and transact those. Storing the actual data separately (e.g. in an S3-like thing) is probably better anyway. That would provide faster (index) sync and on-demand reads from things that cache well.
- However, indexing locally is probably not going to pan out, as local replicas diverge and will end up with conflicting indexes.
- Perhaps this is where deriving the index at the DB would make more sense.
Finally, I think CRDTs natively with Datalog seem a lot more promising, but admittedly more complicated.
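
To make the difference between the compare-and-swap and custom-merge options above a bit more concrete, here is a rough sketch. Everything in it is an assumption made for illustration: `TripleStore`, `swap`, `merge`, `register_merge` and `StaleRetraction` are hypothetical names, not an existing API, and a JSON-encoded grow-only set stands in for a serialized automerge document so the example stays self-contained.

```python
import json
from typing import Callable, Dict, Optional, Set, Tuple

Key = Tuple[str, str]  # (entity, attribute)
Merge = Callable[[bytes, bytes], bytes]


class StaleRetraction(Exception):
    """The value being retracted is no longer the stored one."""


class TripleStore:
    """Hypothetical toy store holding one blob per (entity, attribute)."""

    def __init__(self) -> None:
        self._values: Dict[Key, bytes] = {}
        self._mergers: Dict[str, Merge] = {}

    def get(self, entity: str, attribute: str) -> Optional[bytes]:
        return self._values.get((entity, attribute))

    def swap(self, entity: str, attribute: str,
             expected: Optional[bytes], new: bytes) -> None:
        # Compare-and-swap: retract `expected`, assert `new`.
        # Rejected when the stored blob differs from `expected`.
        if self.get(entity, attribute) != expected:
            raise StaleRetraction(f"{entity} {attribute} changed concurrently")
        self._values[(entity, attribute)] = new

    def register_merge(self, attribute: str, merge: Merge) -> None:
        # Declare that assertions on this attribute merge instead of swapping.
        self._mergers[attribute] = merge

    def merge(self, entity: str, attribute: str, incoming: bytes) -> None:
        current = self.get(entity, attribute)
        merged = incoming if current is None else self._mergers[attribute](current, incoming)
        self._values[(entity, attribute)] = merged


# Stand-in CRDT (assumption): a grow-only set serialized as sorted JSON.
def encode(items: Set[str]) -> bytes:
    return json.dumps(sorted(items)).encode()


def decode(blob: bytes) -> Set[str]:
    return set(json.loads(blob))


def merge_gset(a: bytes, b: bytes) -> bytes:
    return encode(decode(a) | decode(b))


store = TripleStore()
store.register_merge("doc/state", merge_gset)

base = encode({"a"})
store.swap("doc1", "doc/state", None, base)

# Two actors edit concurrently, both starting from `base`.
alice = encode({"a", "b"})
bob = encode({"a", "c"})

store.swap("doc1", "doc/state", base, alice)  # first writer wins
try:
    store.swap("doc1", "doc/state", base, bob)  # stale retraction
    rejected = False
except StaleRetraction:
    rejected = True

# With merge semantics, bob's state is folded in rather than rejected.
store.merge("doc1", "doc/state", bob)
merged = decode(store.get("doc1", "doc/state"))
```

With CAS, bob's write is rejected and he has to re-read, merge locally and retry. With a registered merge, the store combines both states itself, which is the part that would likely complicate sync, since every replica needs the same merge function for the attribute.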