y-crdt/ypy

Async API


Since CRDTs can be CPU-intensive, I'm wondering if we could run all the Rust code in a separate thread, and have an async Python API that would not block while waiting for the CRDT operations to complete.
For instance:

async with doc.begin_transaction() as txn:
    await ytext.extend(txn, "foo")
    ...

delta = await encode_state_as_update(doc)
await apply_update(other_doc, delta)

I know that Ypy doesn't support multi-threading, but here all the Rust code would still run in a single thread, just not the main Python thread.
I think this would be a nice performance gain on multi-core CPUs. On single-core CPUs the non-async API would be a better choice.

Couldn't the same be done with Python threads? Just put the Ypy doc in a background thread and submit requests through a queue.
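Something like this rough sketch (the op encoding and the worker loop are just illustrative, not an existing Ypy API):

import queue
from threading import Thread

import y_py as Y

ops = queue.Queue()

def worker():
    # The background thread owns the document and applies submitted operations.
    doc = Y.YDoc()
    text = doc.get_text("text")
    while True:
        op, arg = ops.get()
        with doc.begin_transaction() as txn:
            if op == "extend":
                text.extend(txn, arg)
            elif op == "delete_range":
                text.delete_range(txn, *arg)

Thread(target=worker, daemon=True).start()

# Callers never touch the doc directly; they only enqueue requests.
ops.put(("extend", "foo"))
ops.put(("delete_range", (0, 3)))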

Except that Ypy holds the GIL, which prevents the thread from running in parallel.
For instance, the following code never uses more than 100% CPU (a single core):

from threading import Thread

import y_py as Y

def main():
    doc = Y.YDoc()
    text = doc.get_text("text")

    # Each transaction acquires the GIL, so this thread cannot run
    # in parallel with the main thread below.
    while True:
        with doc.begin_transaction() as txn:
            text.extend(txn, "foo")

        with doc.begin_transaction() as txn:
            text.delete_range(txn, 0, 3)

t = Thread(target=main)
t.start()

# Busy-loop in the main thread: total CPU usage stays around 100%
# instead of the ~200% that two truly parallel threads would show.
while True:
    pass

Btw, I was thinking that this would be a pain in the ass to implement for every available method. However, if we limited ourselves to a subset of operations, i.e. sync messages passed over the network and registering for changes, we could create something like an Archive that stores documents and performs operations on them directly using a Rust thread pool.

The purpose of such an archive would be to serve as a hub on the server side, with its own multi-threaded document dispatch and update broadcast. It could even implement something like an LRU cache: load docs from disk on demand when they are touched and unload the least recently used ones, releasing memory, when resources are getting thin.

It could be implemented behind a feature flag in the y-sync crate and pulled from there. That crate already provides utility methods for managing update broadcasts.
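As a rough illustration only (in Python for the sake of the example, whereas the real proposal targets the y-sync Rust crate; DocArchive and its helpers are hypothetical):

from collections import OrderedDict

import y_py as Y

class DocArchive:
    """Keeps at most `capacity` docs in memory, evicting the least recently used."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.docs = OrderedDict()

    def get(self, name):
        if name in self.docs:
            self.docs.move_to_end(name)  # mark as most recently used
        else:
            # Load on demand when the doc is first touched.
            self.docs[name] = self._load_from_disk(name)
            if len(self.docs) > self.capacity:
                # Evict the least recently used doc, releasing memory.
                self.docs.popitem(last=False)
        return self.docs[name]

    def apply_update(self, name, update):
        Y.apply_update(self.get(name), update)

    def encode_state(self, name):
        # Encode the current state so it can be broadcast to connected clients.
        return Y.encode_state_as_update(self.get(name))

    def _load_from_disk(self, name):
        # Placeholder: a real hub would deserialize a persisted document here.
        return Y.YDoc()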

For now I'd just like to achieve parallelism in Rust-only code, since Yrs should allow it, but I'm facing issues. I opened https://github.com/davidbrochart/pycrdt/pull/6 to illustrate the problem; it would be great if you could take a look at it.

@davidbrochart It would be terrific to know whether there has been any progress on this. 🙂

I'm working on pycrdt now. I'm thinking about an async API, but not for performance, just to provide better integration with async frameworks.

Regarding the issue mentioned in the pycrdt repo: I'll be fixing this part; we already talked about it with Lucio and Sebastian last week. The core issue is that the y-types use wrappers around raw pointers underneath, which Rust considers unsafe in many contexts. I'm going to replace them with atomic reference-counted pointers, which don't have this issue.