Supervisor of `Xandra.Cluster` receives unexpected messages
Closed this issue · 8 comments
When starting Xandra.Cluster with a supervisor, the supervisor receives bunch of unexpected messages:
Starting Cassandra connection pool with: [name: :"CassandraConnection", keyspace: :test, username: "cassandra", password: "REDACTED", nodes: ["localhost"], pool_size: 10]
2023-08-16 14:00:10.239 module=supervisor pid=<0.629.0> [error]: Supervisor received unexpected message: {#Reference<0.390389104.540540930.64231>, {:ok, #PID<0.635.0>}}
2023-08-16 14:00:10.250 module=supervisor pid=<0.629.0> [error]: Supervisor received unexpected message: {#Reference<0.390389104.540540930.64231>, {:ok, #PID<0.635.0>}}
2023-08-16 14:00:10.250 module=supervisor pid=<0.629.0> [error]: Supervisor received unexpected message: {#Reference<0.390389104.540540930.64231>, {:ok, #PID<0.635.0>}}
2023-08-16 14:00:10.250 module=supervisor pid=<0.629.0> [error]: Supervisor received unexpected message: {#Reference<0.390389104.540540930.64231>, {:ok, #PID<0.635.0>}}
2023-08-16 14:00:10.250 module=supervisor pid=<0.629.0> [error]: Supervisor received unexpected message: {#Reference<0.390389104.540540930.64231>, {:ok, #PID<0.635.0>}}
2023-08-16 14:00:10.250 module=supervisor pid=<0.629.0> [error]: Supervisor received unexpected message: {#Reference<0.390389104.540540930.64231>, {:ok, #PID<0.635.0>}}
2023-08-16 14:00:10.250 module=supervisor pid=<0.629.0> [error]: Supervisor received unexpected message: {#Reference<0.390389104.540540930.64231>, {:ok, #PID<0.635.0>}}
2023-08-16 14:00:10.250 module=supervisor pid=<0.629.0> [error]: Supervisor received unexpected message: {#Reference<0.390389104.540540930.64231>, {:ok, #PID<0.635.0>}}
2023-08-16 14:00:10.250 module=supervisor pid=<0.629.0> [error]: Supervisor received unexpected message: {#Reference<0.390389104.540540930.64231>, {:ok, #PID<0.635.0>}}
I checked the process info of {:ok, #PID<0.635.0>}
, and it looks like it is the DBConnection.ConnectionPool
:
Process.info(pid(0, 635, 0))
[
current_function: {:gen_server, :loop, 7},
initial_call: {:proc_lib, :init_p, 5},
status: :waiting,
message_queue_len: 0,
links: [#PID<0.631.0>, #PID<0.611.0>],
dictionary: [
"$ancestors": [#PID<0.631.0>,
...,
#PID<0.622.0>],
"$initial_call": {DBConnection.ConnectionPool, :init, 1},
connection_module: Xandra.Connection
],
trap_exit: false,
error_handler: :error_handler,
priority: :normal,
group_leader: #PID<0.621.0>,
total_heap_size: 2828,
heap_size: 1598,
stack_size: 11,
reductions: 23743,
garbage_collection: [
max_heap_size: %{error_logger: true, kill: true, size: 0},
min_bin_vheap_size: 46422,
min_heap_size: 233,
fullsweep_after: 65535,
minor_gcs: 56
],
suspending: []
]
state while starting Xandra.Pool
:
%Xandra.Cluster.Pool{
pool_options: [
protocol_version: :v5,
authentication: {Xandra.Authenticator.Password,
[username: "cassandra", password: "cassandra"]},
after_connect: #Function<0.98838439/1 in Conn.maybe_set_keyspace/1>,
attempt_connection: true,
default_consistency: :local_one,
atom_keys: true,
pool_size: 10
],
contact_nodes: [{'localhost', 9042}],
autodiscovered_nodes_port: 9042,
pool_supervisor: #PID<0.631.0>,
control_connection: nil,
refresh_topology_interval: 300000,
load_balancing_module: Xandra.Cluster.LoadBalancingPolicy.DCAwareRoundRobin,
load_balancing_state: %Xandra.Cluster.LoadBalancingPolicy.DCAwareRoundRobin{
local_dc: nil,
local_hosts: [],
remote_hosts: []
},
target_pools: 1,
sync_connect_ref: nil,
name: {:local, :"BatterySignalsWriter.CassandraConnection"},
peers: %{},
reqs_before_connecting: %{max_size: 100, queue: {[], []}},
xandra_mod: Xandra,
control_conn_mod: Xandra.Cluster.ControlConnection
}
@harunzengin is this in main
?
@whatyouhide yes, 20b62d1
@harunzengin I can't seem to reproduce this. Do you have steps for this? Also, which supervisor is pid=<0.629.0>
?
@whatyouhide I created a new mix project to have a minimal reproducible example, however, I also wasn't able to reproduce it there.
However, now trying with 96303f0 , I couldn't even get our application tree started. (It did start when I wiped deps
and _build
after a long compilation, any try after that didn't work):
** (MatchError) no match of right hand side value: {:error, {{:EXIT, {:noproc, {:gen_statem, :call, [#PID<0.1690.0>, :checkout, :infinity]}}}, {:child, :undefined, Conn, {Conn, :start_link, [[name: :"CassandraConnection", password: "cassandra", username: "cassandra", keyspace: :test, nodes: ["localhost"], pool_size: 5]]}, :permanent, false, 5000, :worker, [Conn]}}}
I think there's a race condition there where something isn't ready but we try to call gen_statem.
@harunzengin ok, yes, that makes sense. You are registering the cluster under a name, and maybe the process isn't ready when you call it through the name. Any chance you're using :sync_connect
? That would explain this, for example.
@whatyouhide No, there's no :sync_connect
.
I also tried with removing the given name to the process, didn't work. Something inside Xandra
has a race condition, which I'm trying to figure out.
PS: Nothing in our process tree changed, only the Xandra version.
Immediately after starting the Xandra.Cluster
, we call an attempt_connection function that looks like this:
def attempt_connection(conn, options, tries \\ 10) do
Process.sleep(50)
Xandra.Cluster.execute!(conn, "SELECT key FROM system.local LIMIT 1", [], queue: false)
this causes the exit above, since we somehow get the connection pid, but it doesn't work when we try to query it (after 50ms).
Okay, :sync_connect
explains this actually, sorry about the confusion. I guess we need to attempt the connection after the process tree has started now on our side. And I wasn't able to reproduce the unexpected messages in the supervisor as well. So I'm closing this.