Fix race between job completion and heartbeats
jonhoo opened this issue · 2 comments
Currently, there exists a race between the thread that performs heartbeats and the worker thread that completes jobs. Specifically, when the worker waits for the OK
following an ACK
or a FAIL
, it might receive the reply intended for the heartbeat, and vice-versa.
This is normally not an issue, since both heartbeats and workers generally receive OK
responses. However, it can misbehave when a non-OK response is given (e.g., an error or a QUIET
). There is also a write race here, where technically an outgoing message could be large enough that it ends up being flushed across multiple system calls, at which point a BEAT
and an ACK
/FAIL
could be mixed.
Fixing this will require a bit of engineering. The easiest is probably to spin up a third RPC thread that is responsible for sending messages and waiting for replies in a single atomic step.
Actually, it seems like I lied. That's not at all an easy way. In particular, it would mean that Client
only contains an mpsc::Sender
on which you send a FaktoryCommand
and another mpsc::Sender
on which the response (or error) will be sent. Seems well and good, until you get around to connect
and reconnect
, which now become all sorts of thorny.
Another (perhaps simpler) way to implement this would be to give Client
an Arc<Mutex<Read + Write>>
directly, which it would then lock, do write, do read, and unlock, to guarantee atomicity. It's not particularly pretty, but should work.
Actually this is a non-issue, because all the methods require &mut self
, so they already provide this exclusion. Consumer
places the entire Client
in a lock, which is what you'd want.