Enqueue doesn't interrupt event loop run (race condition)
I was fixing MT codegen in Crystal (see #14748) and noticed it had a pitfall (starving threads because of a blocked fiber) that could be solved by using execution contexts.
I thus went to try it out, and I quickly identified an issue in the execution contexts: the process eventually comes to a halt 🤨
The DEFAULT thread (ST), whose main fiber is sending to a channel, is waiting on the event loop, while the CODEGEN threads wait to read from an empty channel (with no pending sender). A debug session showed that the main fiber was properly enqueued in the global queue of the DEFAULT thread, but the thread was still waiting on the event loop.
That MUSTN'T happen. There's a race condition that either fails to interrupt a thread blocked on the event loop or fails to prevent it from blocking there in the first place.
Now that I know, it's obvious that there are a bunch of races between "stop spinning" and "start blocking":
Scenario A:
- thread A: checks global queue (empty)
- thread B: enqueues Fiber to global queue
- thread B: checks for spinning threads, finds one => skips wakeup
- thread A: stops spinning
- thread A: starts blocking
- thread A: waits on the EventLoop (oops)
Scenario B:
- thread A: checks global queue (empty)
- thread B: enqueues Fiber to global queue
- thread A: stops spinning
- thread B: checks for spinning threads: none
- thread B: checks for blocked threads: none => skips wakeup
- thread A: starts blocking
- thread A: waits on the EventLoop (oops)
It's so obvious that it's embarrassing 😩
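
To make the two interleavings easier to check, here is a minimal, deterministic sketch that replays them step by step. It is written in Python rather than Crystal and is only a toy model: `State`, the `a_*`/`b_*` step functions and `lost_wakeup` are hypothetical names invented for this illustration, not the actual execution context code. It assumes thread A decides to park based on its earlier (stale) view of the global queue, which is what both scenarios describe.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Toy model of the shared scheduler state (hypothetical names)."""
    queue: list = field(default_factory=list)  # global run queue
    spinning: int = 1        # thread A starts out spinning
    blocked: int = 0         # threads that announced they are blocking
    wakeups: int = 0         # event loop interrupts sent by thread B
    a_saw_empty: bool = False
    b_skipped: bool = False
    a_parked: bool = False


# --- steps taken by thread A (the scheduler about to sleep) ---
def a_check_queue(s):
    s.a_saw_empty = len(s.queue) == 0

def a_stop_spinning(s):
    s.spinning -= 1

def a_start_blocking(s):
    s.blocked += 1

def a_wait(s):
    # A trusts its earlier observation of the queue and never rechecks it.
    s.a_parked = s.a_saw_empty and s.wakeups == 0


# --- steps taken by thread B (the enqueuer) ---
def b_enqueue(s):
    s.queue.append("fiber")

def b_check_spinning(s):
    # A spinning thread is expected to find the fiber: skip the wakeup.
    s.b_skipped = s.spinning > 0

def b_check_blocked(s):
    # Only interrupt the event loop if a thread is known to be blocked.
    if not s.b_skipped and s.blocked > 0:
        s.wakeups += 1


SCENARIO_A = [a_check_queue, b_enqueue, b_check_spinning,
              a_stop_spinning, a_start_blocking, a_wait]
SCENARIO_B = [a_check_queue, b_enqueue, a_stop_spinning, b_check_spinning,
              b_check_blocked, a_start_blocking, a_wait]
# For contrast: A is fully parked before B enqueues, so B sees it and wakes it.
BENIGN = [a_check_queue, a_stop_spinning, a_start_blocking, a_wait,
          b_enqueue, b_check_spinning, b_check_blocked]


def run(steps):
    """Replay one interleaving from a fresh state and return the final state."""
    s = State()
    for step in steps:
        step(s)
    return s


def lost_wakeup(s):
    """True when A sleeps on the event loop although a fiber is queued."""
    return s.a_parked and bool(s.queue) and s.wakeups == 0


if __name__ == "__main__":
    for name, steps in [("A", SCENARIO_A), ("B", SCENARIO_B), ("benign", BENIGN)]:
        print(f"scenario {name}: lost wakeup = {lost_wakeup(run(steps))}")
```

In this model both scenarios end with thread A parked, a fiber sitting in the queue and no interrupt sent, while the benign ordering does not. The common thread is that A's queue check and its transition to "blocked" are not atomic with respect to B's enqueue and its spinning/blocked checks.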