Notification handlers can deadlock with the dispatch loop
creachadair opened this issue · 0 comments
This issue tracks the bug reported by @appilon in #26.
In hashicorp/terraform-ls#258 they observed a deadlock in the server, which was traced to a notification handler attempting to cancel another request in-flight.
Hypothesis: If a handler responding to a notification attempts to cancel another request, it can deadlock with the dispatcher. Before dispatching the next batch, the dispatcher waits for that same notification to complete, and it holds the server lock while it waits.
I was able to build a repro for this hypothesis. Specifically, here's the problematic sequence:
1. A notification (N) arrives and is dispatched to its handler.
2. While N is busy doing other work, the dispatcher locks to wait for notifications to clear.
3. N invokes `jrpc2.CancelRequest`.
Step (3) attempts to acquire the server lock, and deadlocks with the dispatcher. The key factor is a notification handler that attempts to cancel other requests in flight. This problem was made possible by #24. The solution is for the dispatcher to yield the lock while it waits for previous notifications to settle.