NVIDIA/stdexec

Provide a `tbb_sync_wait` that supports deadlock-safe reentrant calls.

As I've discussed on a few other tickets, `stdexec::sync_wait` isn't safely reentrant: `stdexec::sync_wait(schedule(sch))` blocks the calling thread, so it can deadlock if that thread is the last free thread in `sch`. I don't have a general solution, but a specific one is to implement a `tbb_sync_wait` that uses a `tbb::task_group g` in place of `stdexec::run_loop`, with a delegation scheduler that submits work through `g.run(f)`. This lets the calling thread potentially do all the work itself, blocking only while work is being done by a completely separate scheduler.

Here's a sketch of an implementation: https://godbolt.org/z/dWPzW86nG
That implementation lets you nest calls to `tbb_sync_wait` as deeply as you like without deadlocking, and it parallelizes `bulk` using `tbb::parallel_for`.
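
To illustrate why waiting by doing the work makes nested waits safe, here is a minimal standard-library sketch. It is not the linked implementation and uses neither TBB nor stdexec: `helping_group` is a hypothetical stand-in that only mimics the `run(f)` / `wait()` shape of a task group, and `nested_sum` is a made-up workload that waits at every recursion level. There are no worker threads at all, so a blocking wait would hang immediately.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a task group (NOT the tbb::task_group API):
// run() enqueues a task, wait() drains the queue on the calling thread
// instead of blocking until someone else does the work.
class helping_group {
public:
    void run(std::function<void()> f) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(f));
    }

    // Execute queued tasks on the caller until none remain.
    void wait() {
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            // Run outside the lock so the task may itself call run()/wait().
            task();
        }
    }

private:
    std::mutex mutex_;  // keeps run() safe if other threads also submit work
    std::deque<std::function<void()>> tasks_;
};

// Hypothetical workload: every level creates a group, submits two child
// tasks, and waits for them. Each wait makes progress on its own, so the
// nesting depth does not matter.
int nested_sum(int depth) {
    if (depth == 0) return 1;
    helping_group g;
    int left = 0;
    int right = 0;
    g.run([&] { left = nested_sum(depth - 1); });
    g.run([&] { right = nested_sum(depth - 1); });
    g.wait();
    return left + right;
}
```

Calling `nested_sum(10)` performs ten levels of nested waits on a single thread and returns 1024. A real task group adds worker threads that can steal from the same queue, which is where the parallelism would come from; this sketch only shows the reentrancy property.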

Ideally we'd have a synchronously-cancelable scheduler, which would give us this sort of behavior more generally by letting more than one scheduler attack the same queue of work even when one scheduler is completely occupied. But until then, this is the best I can suggest. It:

  1. Lets you call tbb_sync_wait without fear of deadlock.
  2. Lets simple parallel algorithms using bulk Just Work.