sewenew/redis-plus-plus

[QUESTION] Application side actions when cluster changes

srgorti opened this issue · 8 comments

Before Asking A Question
This is regarding MOVED redirection in Redis cluster mode when using pipelining. Per the description here, Redis-plus-plus handles MOVED redirection. I have gone through the description and am testing MOVED redirection by recycling nodes in the cluster. I have a question in this context.

Describe the problem
The behavior that I am observing is the following:

  1. One of the cluster nodes is recycled (on the Redis server side).
  2. The pipeline's exec() receives a "Server closed connection" error.
  3. The app catches the error and, after noting it, lets the pipeline object be destroyed.
  4. At this point, the app needs to retry by creating a new pipeline object.

The question is: should this new pipeline be created with new_connection = true (by default, our application optimizes connection overhead by setting new_connection = false when creating pipelines)? Does the app need to take any other actions to ensure that no replies are lost during the retry? Another question: how does one confirm that Redis-plus-plus received MOVED messages and handled them successfully (other than by creating custom tests)?

Also, in some test runs, I observed that Redis-plus-plus fails at the eval API with a "Connection timed out" error. The above question about the pipeline creation option applies here as well.

Environment:
Using a supported environment

Additional context
n/a

thanks,

Thanks for your questions! They helped me find a possible bug: if you only use a pipeline with RedisCluster, i.e. no other commands are used, the underlying node-slot mapping might not be updated even if it receives a MOVED error. I need to take a deeper look into it to confirm whether it's a bug.

should this new pipeline be created with new_connection = true (by default, our application optimizes connection overhead by setting new_connection = false when creating pipelines)?

So far, both methods behave the same; this has nothing to do with the MOVED message. With new_connection = false, the performance is better. However, if no other commands are sent to the cluster's broken node, the underlying node-slot mapping won't be updated, and the pipeline will fail again when you retry.
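For reference, a minimal sketch of creating a pipeline with new_connection = false (the address, hash tag, and keys below are illustrative):

```cpp
#include <sw/redis++/redis++.h>

using namespace sw::redis;

int main() {
    // Connect to one node of the cluster; redis-plus-plus discovers the rest.
    RedisCluster cluster("tcp://127.0.0.1:7000");

    // new_connection = false reuses a pooled connection instead of creating
    // a dedicated one for this pipeline.
    auto pipe = cluster.pipeline("hash-tag", false);

    // All keys share the same hash tag, so they map to the same slot.
    pipe.set("{hash-tag}key1", "v1").get("{hash-tag}key1");
    auto replies = pipe.exec();

    return 0;
}
```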

Does the app need to take any other actions to ensure that no replies are lost during the retry?

Because of the possible bug I mentioned above, you might need to send a command, e.g. GET, to the cluster with the same hash_tag as the pipeline you used, so that redis-plus-plus will update its mapping once the mapping has been updated on the server side.
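For example, assuming the cluster object from the sketch above, the probe could look like this (the key name is hypothetical; any key carrying the pipeline's hash tag will do):

```cpp
// Hitting the slot owned by the pipeline's hash tag: if the node-slot
// mapping is stale, the server replies with MOVED, which redis-plus-plus
// handles by refreshing its mapping.
try {
    auto val = cluster.get("{hash-tag}probe-key");
    (void)val;  // The reply itself does not matter, only the round trip.
} catch (const Error &) {
    // Node not reachable yet; retry the probe after a short delay.
}
```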

Another question: how does one confirm that Redis-plus-plus received MOVED messages and handled them successfully

You can try the MONITOR command on the Redis server side to see which commands are sent to the server. However, MONITOR is a slow command, so you'd better run the test in a test environment.

with a "Connection timed out" error.

That means the connection timed out; you need to set a larger ConnectionOptions::socket_timeout.
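For instance (the values below are illustrative):

```cpp
#include <chrono>
#include <sw/redis++/redis++.h>

using namespace sw::redis;

ConnectionOptions opts;
opts.host = "127.0.0.1";
opts.port = 7000;
// Allow slow replies more time before they surface as
// "Connection timed out" errors.
opts.socket_timeout = std::chrono::milliseconds(500);

RedisCluster cluster(opts);
```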

Regards

They helped me find a possible bug: if you only use a pipeline with RedisCluster, i.e. no other commands are used, the underlying node-slot mapping might not be updated even if it receives a MOVED error.

Thanks for the response and providing guidance on possible limitation.

Because of the possible bug I mentioned above, you might need to send a command, e.g. GET, to the cluster with the same hash_tag as the pipeline you used, so that redis-plus-plus will update its mapping once the mapping has been updated on the server side.

If I understood correctly, the suggestion is the following:

  1. Upon detecting "Server closed connection", the app should just let the current pipeline be destroyed (I observed that pipeline.discard() fails because the pipeline has already been invalidated, so pipeline.discard() does not seem to be the right option).
  2. Probe node liveness with a GET command until it succeeds (which implies that Redis-plus-plus has learned the modified cluster topology).
  3. Then retry any of the previously pending commands.

Is that the recommendation?

thanks,

Yes, you can follow the three steps you mentioned in the comments. Also, you should always destroy the pipeline object once you get an exception.
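Putting those three steps together, a minimal sketch might look like this (the hash tag, keys, and retry policy are all illustrative):

```cpp
#include <sw/redis++/redis++.h>

#include <chrono>
#include <thread>

using namespace sw::redis;

// Run the pipelined commands, retrying after a node failure.
void run_with_retry(RedisCluster &cluster) {
    while (true) {
        try {
            // Scoped so the pipeline is destroyed whenever exec() throws.
            auto pipe = cluster.pipeline("hash-tag", false);
            pipe.set("{hash-tag}key1", "v1").get("{hash-tag}key1");
            auto replies = pipe.exec();
            return;  // Success: nothing left to retry.
        } catch (const Error &) {
            // Step 1: the failed pipeline goes out of scope and is destroyed.
            // Step 2: probe with a key-based command (EXISTS works as well
            // as GET) until the node-slot mapping has been refreshed.
            for (;;) {
                try {
                    cluster.exists("{hash-tag}probe-key");
                    break;
                } catch (const Error &) {
                    std::this_thread::sleep_for(std::chrono::milliseconds(100));
                }
            }
            // Step 3: loop back and retry the previously pending commands.
        }
    }
}
```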

Regards

Thanks for the confirmation. I am trying this out.

... you might need to send a command, e.g. GET, to the cluster with the same hash_tag as the pipeline you used.

Does it have to be GET, or would any other key-based command work?

Thanks, a quick test with the approach you suggested worked. (I used the EXISTS command instead of GET.)

regards,

Yes, any key-based command should work.

Regards

The problem has been fixed. You don't need to manually call a key-based command.

If you still have problems with it, feel free to let me know.

Regards