Stuck requests when receiveSignal is released after Dispose()
Closed this issue · 5 comments
Consider following stack trace:
Sally7.Sally7Exception: Couldn't signal receive done.
at Sally7.Sally7Exception.ThrowFailedToSignalReceiveDone()
at Sally7.RequestExecutor.ConcurrentRequestExecutor.PerformRequest(ReadOnlyMemory`1 request, Memory`1 response, CancellationToken cancellationToken)
at Sally7.S7Connection.ReadAsync(IDataItem[] dataItems, CancellationToken cancellationToken)
at VLC.PLC.Sally7.Sally7PlcConnection.ReadIntermediateAsync(IEnumerable`1 readContainers)
at VLC.PLC.PlcConnection.ReadAsync(IEnumerable`1 readContainers, Boolean triggerChanges)
at VLC.PLC.Polling.PlcPoller.TimerCallback(Object state)
The following must have happened:
- Request has entered
PerformRequest
- Request was sent
- Response was received
ConcurrentRequestExecutor
is disposedreceiveSignal.TryRelease()
returnsfalse
Sally7Exception.ThrowFailedToSignalReceiveDone()
is invoked
If there was only one request, this results in the caller receiving the exception from point 6.
Now assume there was already a caller waiting:
- Request A was sent
- Request B was sent
- Caller A receives response B
rec.Complete
is called forRequest
B- Caller A awaits
Request
A (no cancellation) - Caller B receives response A
- Caller B exceptions due to
receiveSignal.Dispose()
- Caller B receives the exception
Request
A is never completed- Caller A is stuck
Possible solution:
- On
JobPool.Dispose()
Dispose()
allRequest
s
a.Request.Dispose() { _disposed = true; _continuation.Invoke(); }
b.Request.GetResult() { ObjectDisposedException.ThrowIf(_disposed, this); return ... }
- Add
ObjectDisposedException.ThrowIf
toConcurrentRequestExecutor
throw paths?
Disadvantages?
@gfoidl Would you care to reflect on the possible solution? I think I might as well need to flush the JobPool
on Dispose()
(do Reader.TryRead()
until it returns false
) and perhaps check for disposed/throw in RentJobIdAsync
as well.
The most annoying issue here is that even though there's a default request timeout of 5s in read/write, that's implemented using CancellationToken
and the final await jobPool.GetRequest(jobId)
isn't affected by the cancellation, so I'm actually hitting this bug in production.
Would it help to remove Dispose
, but introduce DisposeAsync
where the async dispose can await outstanding request (w/ timeout).
I think the synchronous dispose makes a lot of troubles (as we see through various issues) -- and it gets sync-over-async here.
My thinking is that async disposal could get rid of these troubles.
For the async disposal when a timout is hit, then proceed as you suggested above (and drain the JobPool
as well).
BTW: are you able to produce a minimal repro of the problem or is it timing-relavant (race condition)?
Would it help to remove
Dispose
, but introduceDisposeAsync
where the async dispose can await outstanding request (w/ timeout).
I think this will move complexity, not sure if it really helps though. For one it would complicate my use of the library, because right now I don't have an async path to call DisposeAsync
.
I think the synchronous dispose makes a lot of troubles (as we see through various issues) -- and it gets sync-over-async here. My thinking is that async disposal could get rid of these troubles.
I actually think it's fine. Async is nothing more than postponed Action
s as far as we're concerned here, which means I can just invoke the postponed Action
s on Dispose
as suggested above.
For the async disposal when a timout is hit, then proceed as you suggested above (and drain the
JobPool
as well).
As you say, this would still require the sync Dispose
after the timeout. That does mean we could implement both though, the DisposeAsync()
with timeout and Dispose()
that just kills any requests right away. I don't really care about waiting, I'm mostly only closing the connections on errors anyway, as was the case when this bug showed up (but apparently there was at least data in the buffer to process a single request).
BTW: are you able to produce a minimal repro of the problem or is it timing-relavant (race condition)?
In context to ConcurrentRequestExecutor
it's timing relevant, but a test around JobPool
could expose the underlying problem.
because right now I don't have an async path
Ah, then that doesn't make sense to have DisposeAsync.
The sync Dispose I had in mind to just use internally, and don't expose it.
Then your suggestions from above seem fine.
Thanks for reviewing! I think it would be fine to expose both Dispose
and DisposeAsync
, the latter needs some thought to implement though. I guess I'll be working on this tonight.