creachadair/jrpc2

Notification handlers can deadlock with the dispatch loop

creachadair opened this issue · 0 comments

This issue tracks the bug reported by @appilon in #26.

In hashicorp/terraform-ls#258, a deadlock was observed in the server and traced to a notification handler attempting to cancel another in-flight request.

Hypothesis: If a handler responding to a notification attempts to cancel another request, it can deadlock with the dispatcher for the next batch, which holds the server lock while it waits for that same notification to complete.

I was able to build a repro for this hypothesis. Specifically, here's the problematic sequence:

  1. A notification (N) arrives and is dispatched to its handler.
  2. While N is busy doing other work, the dispatcher acquires the server lock and waits for outstanding notifications to clear.
  3. N invokes jrpc2.CancelRequest.

Step (3) attempts to acquire the server lock and deadlocks with the dispatcher: the dispatcher holds the lock until N completes, and N cannot complete until it acquires the lock. The key factor is a notification handler that attempts to cancel other requests in flight. This problem was made possible by #24.

The solution is for the dispatcher to yield the lock while it waits for previous notifications to settle.