bitwalker/libcluster

Unable to establish cluster in Kubernetes using Kubernetes.DNS strategy.

dkushner opened this issue · 3 comments

I'm currently deploying my Distillery application to a Kubernetes cluster (v1.9.7). I have set up the headless service, configured my release with the appropriate vm.args, and configured my application to set up the topology and start the cluster supervisor. I have confirmed that each pod created by my deployment can ping every other pod in the deployment. I have also confirmed that the DNS record for the headless service resolves correctly (and libcluster appears to resolve the node IPs correctly as well). However, I am still getting the following error:

09:48:33.964 [warn] [libcluster:kubernetes] unable to connect to :"flywheel@172.17.0.11"
09:48:33.965 [warn] [libcluster:kubernetes] unable to connect to :"flywheel@172.17.0.14"
09:48:33.966 [error] GenServer #PID<0.1888.0> terminating
** (FunctionClauseError) no function clause matching in Cluster.Strategy.Kubernetes.DNS.load/1
    (libcluster) lib/strategy/kubernetes_dns.ex:55: Cluster.Strategy.Kubernetes.DNS.load({:noreply, %Cluster.Strategy.State{config: [service: "contrasting-lambkin-flywheel-remoting.flywheel.svc.cluster.local", application_name: "flywheel", polling_interval: 20000], connect: {:net_kernel, :connect_node, []}, disconnect: {:erlang, :disconnect_node, []}, list_nodes: {:erlang, :nodes, [:connected]}, meta: #MapSet<[:"flywheel@172.17.0.15"]>, topology: :kubernetes}})
    (libcluster) lib/strategy/kubernetes_dns.ex:48: Cluster.Strategy.Kubernetes.DNS.handle_info/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: :timeout
09:48:33.971 [info] Application flywheel exited: shutdown
{"Kernel pid terminated",application_controller,"{application_terminated,flywheel,shutdown}"}
Kernel pid terminated (application_controller) ({application_terminated,flywheel,shutdown})

It would appear that, for some reason, the VM is unable to connect to any of its peers. As I mentioned, I've verified that network traffic is unimpeded, so I figure it must be some configuration mistake I've made, but I cannot for the life of me determine what it is.
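For context, the strategy state printed in the trace corresponds to a topology along the following lines. This is a reconstruction from the logged config: and topology: values only, not a copy of the actual config file; the file name and the use of import Config are assumptions.

```elixir
# config/prod.exs (file name assumed)
# Older Elixir releases use `use Mix.Config` here instead of `import Config`.
import Config

config :libcluster,
  topologies: [
    # Topology name taken from `topology: :kubernetes` in the logged state.
    kubernetes: [
      strategy: Cluster.Strategy.Kubernetes.DNS,
      # These three values are copied from the `config:` list in the trace.
      config: [
        service: "contrasting-lambkin-flywheel-remoting.flywheel.svc.cluster.local",
        application_name: "flywheel",
        polling_interval: 20_000
      ]
    ]
  ]
```

With this strategy, the node names libcluster tries to connect to take the form application_name@pod-ip, which matches the flywheel@172.17.0.x names in the warnings above.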

Hey @dkushner,
I don't think anything is wrong with your environment or nodes. There is just a bug in the library, which is what
** (FunctionClauseError) no function clause matching in Cluster.Strategy.Kubernetes.DNS.load/1 is telling you:
load/1 is being called with {:noreply, %State{}} when it only accepts a bare %State{}, so it should return just %State{}, not {:noreply, %State{}}.
Could you check whether it works with my PR #76?
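
The failure mode described here can be reproduced in isolation. The following is a minimal sketch, not the library's source: the Sketch.State, Sketch.Buggy, and Sketch.Fixed modules are hypothetical stand-ins, and the real load/1 also performs the DNS lookup and node connections, which are omitted.

```elixir
defmodule Sketch.State do
  # Hypothetical stand-in for Cluster.Strategy.State.
  defstruct meta: MapSet.new()
end

defmodule Sketch.Buggy do
  alias Sketch.State

  # Accepts only a bare struct, but returns a GenServer-style tuple.
  # If the caller stores that return value as the new state, the next
  # polling cycle calls load/1 with the tuple and no clause matches.
  def load(%State{} = state), do: {:noreply, state}
end

defmodule Sketch.Fixed do
  alias Sketch.State

  # Returns the bare struct, so the result can be fed back in.
  def load(%State{} = state), do: state
end

state = %Sketch.State{}

# The first cycle succeeds but yields a wrapped state.
wrapped = Sketch.Buggy.load(state)

# The second cycle receives the tuple and raises FunctionClauseError.
try do
  Sketch.Buggy.load(wrapped)
rescue
  e in FunctionClauseError -> IO.puts("raised: " <> Exception.message(e))
end

# The fixed version survives repeated cycles.
state |> Sketch.Fixed.load() |> Sketch.Fixed.load()
```

This also explains why the crash appears only after the connection warnings: the first pass through load/1 runs to completion, and the error surfaces on the next message (here, :timeout).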

@flowerett: Aha! Yeah, the error message was pretty straightforward but I was hesitant to try and correct it directly just because I lack familiarity with the design decisions of the library.

I've just checked your fix and it works flawlessly. Thank you so much!

I'll push a new release with the fix today. Sorry for the delay, lots on my plate unfortunately :(