DuckLogic/zerogc

Thread Safety

Techcable opened this issue · 2 comments

Right now the simple collector is "thread-safe" in that everything is !Send. It's not possible to send garbage collected pointers across threads, so no synchronization is ever needed.

As it stands, the API is pretty agnostic to whether or not the underlying implementation is thread-safe. It exects thread safety to be marked by the appropriate Send and Sync implementations.

The main issue with allowing multiple threads to use the collector is that a collection would have to pause all threads. Once we decide its time to collect, we'd have to block until all other collectors reach a `safepoint!'. This is done cooperatively (like all of zerogc's safepoints), meaning that a single thread could hold up collection by waiting too long to invoke a safepoint.

Internally other VMs, they just don't have an issue with uncooperative threads, since the VM automatically inserts safepoints. The JVM's JIT inserts a safepoint on every single loop iteration (implemented as a test against a guard page).

This is almost analogous to the tradeoff between cooperative and preemptive multitasking. ZeroGC is designed as a low-cost abstraction. We already rely on the user to insert regular safepoints in the single-threaded case, so it's reasonable to expect them to be responsible in multithreaded case as well.

Currently, ZeroGC safepoints are expected to return very quickly if there's not a collection (usually just checking memory usage).
The ZeroGC API should be updated to allow a user to "freeze" a GcContext at a safepoint. This would prevent future allocations (until unfrozen), ensuring that the context's set of roots is known. This is essentially a long-term safepoint, that would other threads to collect while the first set is still working.

The simple collector will probably need a little bit of internal synchronization for this (we'll add a feature-flag to control threading support). We'll need to support thread-local caches for the small-object-arenas. A good experiment would be attempting to reimplement the binary-trees example using multiple threads.

This is basically a requirement for DuckLogic. Although Python is essentially single-threaded (due to the GIL), it's still technically possible for python code to be invoked from different threads.

Our current implementation seems to result in a performance regression for single-threaded code.

For the single-threaded binary_trees (n=21):
As of 7741065 we have 34 s + 385 MB
As of 0c190e5 we have 47 s + 385 MB

Thread safety is incomplete and the current implementation deadlocks. I have put no effort into optimization yet, and I bet there's probably plenty of low-hanging fruit.

It turns out that using roots with frozen collectors was unsound. I suspect this is why the parallel implementation was crashing with optimizations. See 9ad529d for details.

As of dc30ec0 we should be thread-safe, aside from the frozen roots (which I removed next commit).

I'll have to implement an Object Handle interface as a replacement (#9). This was needed for DuckLogic anyways.