Isolated Cassandra node results in `{:cluster, :not_connected}`
I'm getting `{:cluster, :not_connected}` when one of three Cassandra nodes is isolated.
Current setup:
3 Cassandra nodes; one gets isolated (simulated via an iptables DROP of port 7000, the inter-node port).
iex(1)> {:ok, pid} = Xandra.Cluster.start_link(authentication: {Xandra.Authenticator.Password, username: "<user>", password: "<pw>"}, nodes: ["ip1:9042", "ip2:9042", "ip3:9042"], pool_size: 10)
{:ok, #PID<0.877.0>}
iex(2)> :sys.get_state(pid)
%Xandra.Cluster{
  autodiscovered_nodes_port: 9042,
  autodiscovery: true,
  load_balancing: :random,
  node_refs: [
    {#Reference<0.2492756426.2133852163.51413>, {ip1}},
    {#Reference<0.2492756426.2133852163.51415>, {ip2}},
    {#Reference<0.2492756426.2133852163.51417>, {ip3}}
  ],
  options: [
    protocol_module: Xandra.Protocol.V3,
    idle_interval: 30000,
    protocol_version: :v3,
    authentication: {Xandra.Authenticator.Password,
     [username: "<user>", password: "<pw>"]},
    pool_size: 10
  ],
  pool_supervisor: #PID<0.878.0>,
  pools: %{
    {ip1} => #PID<0.885.0>,
    {ip2} => #PID<0.909.0>,
    {ip3} => #PID<0.897.0>
  }
}
Now simulate node isolation on one Cassandra node:
$ iptables -I INPUT -p tcp --dport 7000 -j DROP; iptables -I OUTPUT -p tcp --dport 7000 -j DROP;
After a few seconds:
iex(3)> :sys.get_state(pid)
%Xandra.Cluster{
  autodiscovered_nodes_port: 9042,
  autodiscovery: true,
  load_balancing: :random,
  node_refs: [
    {#Reference<0.2492756426.2133852163.51413>, {ip1}},
    {#Reference<0.2492756426.2133852163.51415>, {ip2}},
    {#Reference<0.2492756426.2133852163.51417>, {ip3}}
  ],
  options: [
    protocol_module: Xandra.Protocol.V3,
    idle_interval: 30000,
    protocol_version: :v3,
    authentication: {Xandra.Authenticator.Password,
     [username: "<user>", password: "<pw>"]},
    pool_size: 10
  ],
  pool_supervisor: #PID<0.878.0>,
  pools: %{}
}
Note the `pools: %{}` in the `Xandra.Cluster` state, which is what results in `{:cluster, :not_connected}`.
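For reference, this is roughly how the error surfaces when querying the cluster in that state; a minimal sketch, assuming the reason comes back wrapped in a `Xandra.ConnectionError` and using an arbitrary example statement:

```elixir
# Any query against the empty pool map fails with the reported reason.
case Xandra.Cluster.execute(pid, "SELECT release_version FROM system.local") do
  {:ok, page} ->
    page

  {:error, %Xandra.ConnectionError{reason: {:cluster, :not_connected}}} ->
    # No pool is available, so the checkout from the cluster fails.
    :no_nodes_available
end
```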
I also tried 0.14.0 but, apart from issue #262, it's still broken.
I also tried the current master, and there it works.
When do you plan a new release?
I debugged this a bit deeper and found more details:
Let's say we have 3 Cassandra nodes. The driver (`Xandra.Cluster`) opens a control connection to every node. Now we "isolate" node1 via `iptables --dport 7000 -j DROP`. All three control connections keep working, because we are only blocking port 7000 (inter-node traffic), so they continue reporting cluster events.
node2 and node3 report `StatusChanged{reason: "DOWN", node: "node1"}`, which is correct from their point of view.
BUT: node1 reports `StatusChanged{reason: "DOWN", node: "node2"}` and `StatusChanged{reason: "DOWN", node: "node3"}`, which is also correct from its point of view.
So the driver thinks all nodes are down, which results in `{:cluster, :not_connected}`.
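To make the failure mode concrete, here is a minimal sketch of that event flow; the event tuples and pool values are made up for illustration and are not Xandra's internal representation. The union of the three viewpoints removes every pool:

```elixir
# DOWN events as seen across the three control connections:
events = [
  # reported by node2's and node3's control connections:
  {:status_changed, :down, "node1"},
  # reported by node1's control connection (it can't reach its peers):
  {:status_changed, :down, "node2"},
  {:status_changed, :down, "node3"}
]

# Pool map as in the first :sys.get_state/1 snapshot (values faked):
pools = %{"node1" => :pool1, "node2" => :pool2, "node3" => :pool3}

# Applying every DOWN event empties the map, even though two of the
# nodes are still perfectly reachable from the client on port 9042:
Enum.reduce(events, pools, fn {:status_changed, :down, node}, acc ->
  Map.delete(acc, node)
end)
#=> %{}
```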
@franke1276 thanks for the report! Does this happen on the `main` branch too?
Yes, it also happens there.
@franke1276 I've made some changes to the `main` branch. Now we only open a single control connection, to one of the nodes in the cluster. In your case, for example, if we open the connection to node1, then we'll indeed see node2 and node3 as down. In my opinion, that's probably the correct behavior: we're trusting what the cluster says.
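Continuing the hypothetical sketch from above (same made-up event tuples and pool map), trusting a single node's viewpoint means at least the connected node's pool survives:

```elixir
# Only node1's control connection reports events now:
events_from_node1 = [
  {:status_changed, :down, "node2"},
  {:status_changed, :down, "node3"}
]

pools = %{"node1" => :pool1, "node2" => :pool2, "node3" => :pool3}

# node1's viewpoint never marks node1 itself as down, so one pool survives:
Enum.reduce(events_from_node1, pools, fn {:status_changed, :down, node}, acc ->
  Map.delete(acc, node)
end)
#=> %{"node1" => :pool1}
```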
Could you give the new `main` a try and see what happens?
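For anyone who wants to test this before a release, a `mix.exs` dependency entry along these lines should track `main`; the GitHub path is my assumption, so adjust it to wherever you consume Xandra from:

```elixir
defp deps do
  [
    # Track the unreleased main branch instead of a Hex version:
    {:xandra, github: "whatyouhide/xandra", branch: "main"}
  ]
end
```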
I’m closing this one since I don't believe it's valid anymore after the big round of changes that happened in the last couple of weeks. If this is still an issue, we can open a new issue and look into it!
Thanks for the original report @franke1276 💟