NVIDIA/stdexec

replace the lock in `exec::__nest_rcvr::__complete` with a CAS loop

ericniebler opened this issue · 0 comments

`async_scope::spawn` is much slower than `start_detached`, and I suspect the cause is the locking and unlocking in `exec::__nest_rcvr::__complete`. The implementation of `split` has similar logic for walking a linked list and notifying each element, but there it is done more efficiently with a compare-and-swap loop. Maybe the two can share logic.
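
For illustration, here is a minimal sketch of the general technique: a lock-free intrusive list whose completion step detaches all waiters with one atomic exchange and then notifies each of them. It is not the stdexec code. `waiter`, `counting_waiter`, `waiter_list`, `try_push`, and `complete` are hypothetical names, and using the list's own address as the "already completed" marker is an assumption of this sketch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive node; not a stdexec type.
struct waiter {
  waiter* next = nullptr;
  void (*notify)(waiter*) = nullptr;
};

// Hypothetical helper that counts how often it was notified.
struct counting_waiter : waiter {
  int hits = 0;
  counting_waiter() {
    notify = [](waiter* w) { ++static_cast<counting_waiter*>(w)->hits; };
  }
};

struct waiter_list {
  // nullptr: empty; `this`: completed (sentinel); otherwise: first waiter.
  std::atomic<void*> head{nullptr};

  // CAS loop: link w in front of the current head unless the list has
  // already completed. Returns false if the caller arrived too late.
  bool try_push(waiter* w) noexcept {
    assert(w != nullptr);
    void* const done = this;
    void* old = head.load(std::memory_order_acquire);
    do {
      if (old == done) return false;
      w->next = static_cast<waiter*>(old);
      // On failure, compare_exchange_weak reloads `old` and we retry.
    } while (!head.compare_exchange_weak(
        old, w, std::memory_order_acq_rel, std::memory_order_acquire));
    return true;
  }

  // Swap in the sentinel, then walk the detached list without any lock.
  // Returns the number of waiters notified.
  int complete() noexcept {
    void* const done = this;
    void* old = head.exchange(done, std::memory_order_acq_rel);
    if (old == done) return 0;  // someone else already completed
    int n = 0;
    for (waiter* w = static_cast<waiter*>(old); w != nullptr; ++n) {
      waiter* next = w->next;   // read first: notify may destroy *w
      w->notify(w);
      w = next;
    }
    return n;
  }
};
```

Because the exchange hands the whole list to exactly one thread, the walk needs no mutex, and a waiter that shows up after completion learns this from `try_push` returning `false` and can complete inline. Whether this shape fits the invariants of `async_scope` is something the real change would have to check.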