valkey-io/valkey-glide

Rust core can't connect to a replica

Closed this issue · 4 comments

Describe the bug

python3 utils/cluster_manager.py start -p 6380 6381

starts 2 standalone nodes (not a cluster!).
The servers assign the master/replica roles automatically by communicating with each other.
Then, when a connection request is sent through UDS with only one server address and that server is a replica, the Rust core lib fails to connect.

Expected Behavior

Connection should succeed

Current Behavior

2024-01-22T23:37:26.982230Z DEBUG logger_core: connection - new socket listener initiated
2024-01-22T23:37:27.527979Z  INFO logger_core: Connection configuration -
Addresses: localhost:6380
TLS mode: No TLS
Standalone mode
Read from Replica mode: Only primary
Protocol: RESP3
2024-01-22T23:37:27.536844Z DEBUG logger_core: connection creation - Attempting connection to host: "localhost" port: 6380
2024-01-22T23:37:27.540984Z DEBUG logger_core: connection creation - Connection to localhost:6380 created
2024-01-22T23:37:27.541108Z ERROR logger_core: ClientCreationError - ConnectionError - ConnectionError(Standalone(Received errors:
))
2024-01-22T23:37:27.541126Z ERROR logger_core: client creation - Connection error: Standalone(Received errors:
)

Reproduction Steps

python3 utils/cluster_manager.py start -p 6380 6381

Then

var regularClient =
	RedisClient.CreateClient(
		RedisClientConfiguration.builder()
			.address(NodeAddress.builder().port(6380).build())
			.build())
	.get(10, TimeUnit.SECONDS);

Note: this is flaky, because the master/replica election is a random process: on one run port 6380 may be taken by the master, and on another run by the replica.

See the different responses to the HELLO message in the tcpdump/Wireshark network dump:
on failure: (image)
on success: (image)
Attachment: tcpdump.zip
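The screenshots are not reproduced here, but one field of the HELLO reply that differs between a master and a replica is `role`. Below is a minimal sketch of checking it, assuming the reply has already been decoded into a dict by some RESP3 client. The helper name `is_replica` and the sample replies are hypothetical and are not taken from the attached dump.

```python
def is_replica(hello_reply: dict) -> bool:
    """Return True if a decoded HELLO reply reports the node as a replica."""
    role = hello_reply.get("role", "")
    # Some clients hand back raw bytes instead of str.
    if isinstance(role, bytes):
        role = role.decode()
    # HELLO reports "master" or "replica"; "slave" (the value used by
    # INFO replication) is accepted here only as a defensive assumption.
    return role in ("replica", "slave")


# Illustrative replies only, not captured from a real server.
print(is_replica({"server": "redis", "proto": 3, "role": "replica"}))
print(is_replica({"server": "redis", "proto": 3, "role": "master"}))
```

A check like this could be run against port 6380 before the test to see which role the node ended up with on a given run.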

Possible Solution

Workarounds

  1. Once #848 is fixed, start the server with only one node and connect to it
  2. Pass both ports to the Rust core lib in the connection request

Additional Information/Context

No response

Client version used

N/A

Redis Version

6.0.16

OS

Linux

Language

Python

Language Version

N/A

Cluster information

No response

Logs

No response

Other information

No response

This is intentional behavior: we don't want to leave the client in a state that doesn't allow the user to use some of its functionality.

A user may intentionally want to connect to a replica node to get/update node configuration, stats, or whatever. Why not?

"Why not?"

"we don't want to leave the client in a state that doesn't allow the user to use some of its functionality."

This is why not. The user might not be aware that they're connecting to a replica. In a more complex scenario, the user may try to connect to several nodes, where some connections fail and only those to replicas succeed. The user will then have a client that is unable to perform actions, without being aware of it.

That is what the connection response is for. It could contain something more descriptive, like:

  • connected to all nodes
  • connected to several nodes
  • connected to replica only
  • etc
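A rough sketch of what such a response could look like, assuming the client knows how many addresses were requested and the role of each node it managed to reach. `ConnectionStatus` and `summarize` are hypothetical names for illustration, not part of the Rust core or any wrapper.

```python
from enum import Enum


class ConnectionStatus(Enum):
    ALL_NODES = "connected to all nodes"
    SOME_NODES = "connected to several nodes"
    REPLICA_ONLY = "connected to replica only"
    NONE = "no connection"


def summarize(requested: int, connected_roles: list[str]) -> ConnectionStatus:
    """Classify a connection attempt from the roles of the nodes reached."""
    if not connected_roles:
        return ConnectionStatus.NONE
    # Without a master the client cannot serve writes, so report that
    # first, even if every requested node was reached.
    if "master" not in connected_roles:
        return ConnectionStatus.REPLICA_ONLY
    if len(connected_roles) == requested:
        return ConnectionStatus.ALL_NODES
    return ConnectionStatus.SOME_NODES


# The scenario from this issue: one address requested, and it is a replica.
print(summarize(1, ["replica"]).value)
```

With a status like this the caller can decide whether a replica-only client is acceptable, instead of the connection failing outright.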