Multithreaded Environment Support
Closed this issue · 8 comments
Problem
My team and I were hoping to use py-crdt in a server-side context with multiple workers. We ran into the issue of workers switching context and the rust binary panicking when resources were dropped by the original threads, complaining that the structs are "unsendable".
I skimmed through the source code and found the Rust definitions for the pycrdt structs that were causing these issues, and noticed that they were simple wrappers around yrs structs which do implement Send and Sync, but that the wrappers around them are explicitly marked as unsendable, which I thought might be related.
I did also see in related threads in other repositories that an async API system was being considered, and wanted to ask if there were any updates on that front, and if there was some other considerations I am unaware of that required structs to be unsendable? And if there was any proposed solution to this issue that is in the works?
Any information anyone could provide would be very helpful, thanks!
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋
Welcome to the Jupyter community! 🎉
There is a comment pointing to what should be done to allow multithreading, but I haven't gone any further.
I'm still interested though, and it would be great if you wanted to open a PR.
I opened #114.
Thank you for the prompt response! I took a look at the PR, and tried cloning + building but taking a closer look it looks as though some of the underlying yrs structs don't implement send and at some point we may need some Mutexing to make this thread-safe?
Performance issues aside, would wrapping each of the underlying yrs structs in the Python interface layer in an Arc<Mutex<T>>
solve the issue? If so, is this something that would be helpful to implement and then provide some library interface that allows user to decide if they'd like to use a mutexed version of the library?
Good questions, I honestly don't know. Should we start by just having it work, using mutexes?
I think a Mutex implementation would be very nice to have, so long as it's kept as an optional configuration - applications that are single-threaded shouldn't incur the overhead of locking, etc.
Would this configuration consist of having two separate builds, one with mutexes and one without, and a runtime parameter for choosing one?
Note that even when not sending objects across threads, just creating an object in another thread is problematic since it can be garbage collected from a different thread, which will panic and lead to memory leaks. We can see that in pytest
for instance, see here.
So it might be safer to always use mutexes.