Questions about log probe
kikimo opened this issue · 3 comments
Look at the following piece of code: the test at the linked line makes raft send up to maxMsgSize bytes of entries when the follower's progress is in probe state, i.e. pr.State == tracker.StateProbe:
Lines 565 to 585 in 3e6cb62
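A simplified model of the behaviour described above follows. It assumes the referenced check has the shape `pr.State != tracker.StateReplicate || !pr.Inflights.Full()`, and that entries are fetched with a size limit that always returns at least one entry. `StateType`, `Entry`, `limitSize` and `entriesToSend` are hypothetical stand-ins written for this sketch, not the etcd raft code at 3e6cb62:

```go
package main

import "fmt"

// StateType is a hypothetical stand-in for tracker.StateType in etcd raft;
// only the two states needed for this sketch are declared.
type StateType int

const (
	StateProbe StateType = iota
	StateReplicate
)

// Entry is a hypothetical stand-in for a raft log entry; only its size matters here.
type Entry struct{ Size uint64 }

// limitSize returns the longest prefix of ents whose total size does not
// exceed maxSize, but never drops the first entry, so a non-empty input
// always yields a non-empty result.
func limitSize(ents []Entry, maxSize uint64) []Entry {
	if len(ents) == 0 {
		return ents
	}
	size := ents[0].Size
	n := 1
	for ; n < len(ents); n++ {
		size += ents[n].Size
		if size > maxSize {
			break
		}
	}
	return ents[:n]
}

// entriesToSend models the assumed check: unless the follower is in a
// throttled StateReplicate (full inflight window), the leader attaches up to
// maxMsgSize bytes of entries. StateProbe therefore gets the same size
// budget as an unthrottled StateReplicate.
func entriesToSend(state StateType, inflightsFull bool, ents []Entry, maxMsgSize uint64) []Entry {
	if state != StateReplicate || !inflightsFull {
		return limitSize(ents, maxMsgSize)
	}
	// Throttled StateReplicate: send an empty append.
	return nil
}

func main() {
	ents := []Entry{{Size: 100}, {Size: 100}, {Size: 100}}
	fmt.Println(len(entriesToSend(StateProbe, false, ents, 250)))    // 2
	fmt.Println(len(entriesToSend(StateReplicate, true, ents, 250))) // 0
}
```

Under this model a probe carries as many entries as a normal append, which is the behaviour the first question below asks about.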
I have two questions:
- Is this expected behaviour (sending up to maxMsgSize bytes of entries when the follower is in probe state)?
- If it is, why don't we send just one entry, or none, when pr.State == tracker.StateProbe? Since in probe state an append message is very likely to be rejected, sending one entry or none might speed up the probe process.
@ahrtr @pavelkalinnikov
> Is this expected behaviour (sending up to maxMsgSize bytes of entries when the follower is in probe state)?
Yes. This behaviour is "correct" either way, but, as you point out, there are options to save some bandwidth, depending on assumptions. I think the current strategy optimistically assumes that the first probe will succeed, in which case it saves one round trip of latency.
I can think of two cases in which this probing happens:
1. A stale follower went offline for a while, and since then a few leadership changes and log suffix overwrites happened. In this case the first append message in probe state is unlikely to succeed, and there will be a few round trips before the appends stabilize.
2. During a stable leadership term there was a network hiccup, and one append message got lost. The leader will eventually get a rejection, but it will probably recover the flow of appends with a single probe message.
> If this is expected behaviour, why don't we send just one entry, or none, when pr.State == tracker.StateProbe? Since in probe state an append message is very likely to be rejected, sending one entry or none might speed up the probe process.
What you're suggesting would be best for case (1). The current strategy is better for case (2) in terms of replication latency. Neither option is always better; it's a trade-off.
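The trade-off can be made concrete with a toy model. The sketch below is hypothetical and simplified: the follower's log matches the leader's up to index `match`, the leader starts probing at `next`, and on each rejection it steps back by one index (etcd raft uses the rejection hint in the follower's response to skip back faster; the model ignores that). `perProbe` is the number of entries attached to each probe, with 0 meaning an empty probe, and the leader is assumed to always have that many entries available:

```go
package main

import "fmt"

// probeResult summarizes one simulated probe sequence.
type probeResult struct {
	roundTrips    int // append round trips until new entries land on the follower
	wastedEntries int // entries carried by probes that the follower rejected
}

// simulateProbe is a hypothetical toy model, not etcd raft code.
func simulateProbe(match, next, perProbe int) probeResult {
	var r probeResult
	for {
		r.roundTrips++
		if prevIndex := next - 1; prevIndex > match {
			// The follower has no matching entry at prevIndex, so it rejects
			// the probe and any attached entries are wasted.
			r.wastedEntries += perProbe
			next--
			continue
		}
		// Accepted. An empty probe needs one more round trip to ship entries.
		if perProbe == 0 {
			r.roundTrips++
		}
		return r
	}
}

func main() {
	// Case (2): one lost append, so the first probe already matches.
	fmt.Println(simulateProbe(99, 100, 64)) // {1 0}
	fmt.Println(simulateProbe(99, 100, 0))  // {2 0}
	// Case (1): stale follower whose log diverges after index 90.
	fmt.Println(simulateProbe(90, 100, 64)) // {10 576}
	fmt.Println(simulateProbe(90, 100, 0))  // {11 0}
}
```

In this model an empty probe never saves round trips; it costs one extra in both cases. What it saves in case (1) is the 576 entries shipped in rejected messages. On a bandwidth-constrained link that also means less transmission time, which the model does not account for.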
It's also hard (though maybe not impossible) for the leader to distinguish between cases (1) and (2), which would be needed to make this decision dynamic. So we stick with the optimistic approach, I guess. I don't have data, though, to support the argument that the optimistic approach is best on average; I think it largely depends on the deployment.
Btw, see the related, broader-scope issue #64: this and similar user-, workload-, and deployment-dependent flow control aspects could be delegated to the upper layer rather than hardcoded in the raft package.
Thanks for your reply; closing this issue.