mediocregopher/radix

Make use of the address info in MOVED error

Closed this issue · 2 comments

During slot migration in a large cluster (or due to a failover), when the slot ownership changes, not every node in the cluster will learn about this change instantly - especially when the cluster has 10s-100s of nodes or when the other nodes are busy running commands. So, when we get a MOVED error, it is better to

  1. Send a topology refresh request to the node that responded with the MOVED error as it has learnt about the topology change.
  2. Resend the request to the address in the MOVED response.

Motivation: This can prevent repeated MOVED errors during slot migration or failover.

I see that other clients make use of the address in the MOVED response

But radix discards the address in the MOVED error and refreshes topology by contacting a random node

if serr := c.Sync(); serr != nil {

Is this something that can be improved?

Hi @srgsanky , thanks for submitting the issue. I do think this is something which is worth improving, but I'm not sure if I will have the time to do it myself. If you wanted to submit a PR I'd be happy to review and merge it. It should be pretty simple change, basically just taking what Sync() does internally and inlining it into doInner, with the call to pool() accepting an address instead of "".

I came to know that radix actually does (2) i.e. resend the request to the address in the MOVED response. I missed reading the rest of the code in

radix/cluster.go

Lines 742 to 753 in 3d8bf81

msgParts := strings.Split(msg, " ")
if len(msgParts) < 3 {
return fmt.Errorf("malformed MOVED/ASK error %q", msg)
}
ogAddr, addr := addr, msgParts[2]
c.traceRedirected(ogAddr, key, moved, ask, doAttempts-attempts+1, attempts <= 1)
if attempts--; attempts <= 0 {
return errors.New("cluster action redirected too many times")
}
return c.doInner(a, addr, key, ask, attempts)
which does this 🤦

I will close this issue. Thanks for your support!