VerneMQ cluster not working in IPv6-only environment on Kubernetes
avinakollu opened this issue · 11 comments
Hi,
We have two environments where we are trying to deploy VerneMQ using the Helm chart: one is dual-stack and the other IPv6-only. The dual-stack environment works fine with the latest chart version; the issue, however, is the IPv6-only one.
Firstly, vmq-admin does not work, and I see the same issue with vernemq ping.
~ $ vmq-admin
Node 'VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local' not responding to pings.
~ $ vernemq ping
Node 'VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local' not responding to pings.
~ $ netstat -tunlup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:4369 0.0.0.0:* LISTEN 191/epmd
tcp 0 0 127.0.0.1:1883 0.0.0.0:* LISTEN 131/beam.smp
tcp 0 0 :::9100 :::* LISTEN 131/beam.smp
tcp 0 0 2600:1f14:22b:5502:dd4a::1:8080 :::* LISTEN 131/beam.smp
tcp 0 0 :::4369 :::* LISTEN 191/epmd
tcp 0 0 2600:1f14:22b:5502:dd4a::1:44053 :::* LISTEN 131/beam.smp
tcp 0 0 ::1:8888 :::* LISTEN 131/beam.smp
tcp 0 0 2600:1f14:22b:5502:dd4a::1:8888 :::* LISTEN 131/beam.smp
tcp 0 0 ::1:1883 :::* LISTEN 131/beam.smp
tcp 0 0 2600:1f14:22b:5502:dd4a::1:1883 :::* LISTEN 131/beam.smp
Here is the same on the dual stack cluster:
~ $ vmq-admin
Usage: vmq-admin <sub-command>
Administrate the cluster.
Sub-commands:
node Manage this node
cluster Manage this node's cluster membership
session Retrieve session information
retain Show and filter MQTT retained messages
plugin Manage plugin system
listener Manage listener interfaces
metrics Retrieve System Metrics
api-key Manage API keys for the HTTP management interface
trace Trace various aspects of VerneMQ
Use --help after a sub-command for more details.
~ $ vernemq ping
pong
~ $ netstat -tunlup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:8888 0.0.0.0:* LISTEN 136/beam.smp
tcp 0 0 7.7.25.193:8888 0.0.0.0:* LISTEN 136/beam.smp
tcp 0 0 127.0.0.1:1883 0.0.0.0:* LISTEN 136/beam.smp
tcp 0 0 7.7.25.193:1883 0.0.0.0:* LISTEN 136/beam.smp
tcp 0 0 0.0.0.0:9100 0.0.0.0:* LISTEN 136/beam.smp
tcp 0 0 7.7.25.193:8080 0.0.0.0:* LISTEN 136/beam.smp
tcp 0 0 0.0.0.0:4369 0.0.0.0:* LISTEN 194/epmd
tcp 0 0 7.7.25.193:44053 0.0.0.0:* LISTEN 136/beam.smp
tcp 0 0 :::4369 :::* LISTEN 194/epmd
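(A further check that might help narrow things down on the IPv6 node; this is a diagnostic sketch, not output from the report above. epmd -names lists the node names registered with the local EPMD, and the erl call attempts a remote shell over IPv6 distribution, using the cookie from vm.args and the bundled erl under /vernemq/erts-*/bin if erl is not on PATH.)
~ $ epmd -names
~ $ erl -proto_dist inet6_tcp -setcookie <cookie-from-vm.args> \
      -name check@vernemq-0.vernemq-headless.messaging.svc.cluster.local \
      -remsh VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local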
I feel I'm missing some listener for the IPv6 setup, but I have exhausted my options and tried most of the things I could find. Please help me figure out what I might be missing.
I can provide more logs/configs if required.
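(For reference, IPv6 listeners like the ones in the netstat above typically come from vernemq.conf entries roughly like the following; the address placeholders are illustrative only, and the exact IPv6 address syntax accepted by the config parser should be checked against the VerneMQ listener docs.)
listener.tcp.default = <pod-ipv6-address>:1883
listener.http.default = <pod-ipv6-address>:8888
listener.vmq.clustering = <pod-ipv6-address>:44053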
There's an open issue: vernemq/vernemq#1664
The vmq-admin scripts do not connect when the cluster communication is configured to use IPv6 (which means having -proto_dist inet6_tcp enabled in vm.args, so that Erlang cluster communication uses IPv6).
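(Concretely, that means a vm.args containing something along these lines; a minimal sketch, with the node name taken from the error messages above and the cookie shown as a placeholder for whatever your deployment sets.)
-name VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local
-setcookie vmq
-proto_dist inet6_tcp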
I'm not sure here whether your nodes still cluster. Can you access the status page on port 8888 for one of the nodes, and check whether it shows a full cluster?
We need to find a way to fix the script issue.
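(From inside a pod that should work with something like the following, given the ::1:8888 listener in the netstat above; /status is the default status page path.)
~ $ curl http://[::1]:8888/status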
👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.
@ioolkos Thanks for your response.
So, I have two scenarios:
- When I enable -proto_dist inet6_tcp in vm.args, only vernemq-0 comes up. The second one fails with the following trace:
[rocky@ip-10-220-150-249 ~]$ kubectl logs vernemq-1 -n messaging
Permissions ok: Our pod vernemq-1 belongs to StatefulSet vernemq with 2 replicas
Will join an existing Kubernetes cluster with discovery node at vernemq-0.vernemq-headless.messaging.svc.cluster.local
Did I previously leave the cluster? If so, purging old state.
Cluster doesn't know about me, this means I've left previously. Purging old state...
Password:
Reenter password:
config is OK
-config /vernemq/data/generated.configs/app.2023.02.27.23.56.21.config -args_file /vernemq/bin/../etc/vm.args -vm_args /vernemq/bin/../etc/vm.args
Exec: /vernemq/bin/../erts-12.3.2.5/bin/erlexec -boot /vernemq/bin/../releases/1.12.6.2/vernemq -config /vernemq/data/generated.configs/app.2023.02.27.23.56.21.config -args_file /vernemq/bin/../etc/vm.args -vm_args /vernemq/bin/../etc/vm.args -pa /vernemq/bin/../lib/erlio-patches -- console -noshell -noinput
Root: /vernemq/bin/..
Protocol 'inet6_tcp-eval': not supported
Protocol 'vmq_server_cmd:node_join('VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local')': not supported
- Without the flag, though, all 3 replicas do come up, but I do not see the nodes in the cluster status page. Here is the status page:
The reason is that the node tries to cluster automatically, and vmq_server_cmd:node_join/1 is a wrapper for a vmq-admin call, which brings us back to the mentioned incompatibility.
https://github.com/vernemq/vernemq/blob/5c14718469cc861241caa2b920ef5bca25283d71/apps/vmq_server/src/vmq_server_cmd.erl#L28
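(For reference, per the linked source, the call that fails here is roughly the programmatic equivalent of the documented CLI join, i.e. something like:)
~ $ vmq-admin cluster join discovery-node=VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local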
I don't see what's wrong with scenario 2. Any logs from the nodes?
👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.
Note to self: find a way to inject -proto_dist inet6_tcp into the noderunner escript. Maybe we need a second IPv6-enabled version of the script and then make vmq-admin choose via a flag.
EDIT: we can add %%! -proto_dist inet6_tcp as the second line to noderunner.
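(So the top of the generated noderunner would look roughly like this; the shebang line is illustrative, while the %%! line is the standard escript emulator-args mechanism.)
#!/usr/bin/env escript
%%! -proto_dist inet6_tcp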
👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.
Yeah I do have logs.
Node 1:
$ kubectl logs vernemq-0 -n messaging
Permissions ok: Our pod vernemq-0 belongs to StatefulSet vernemq with 1 replicas
Password:
Reenter password:
config is OK
-config /vernemq/data/generated.configs/app.2023.03.01.00.02.10.config -args_file /vernemq/bin/../etc/vm.args -vm_args /vernemq/bin/../etc/vm.args
Exec: /vernemq/bin/../erts-12.3.2.5/bin/erlexec -boot /vernemq/bin/../releases/1.12.6.2/vernemq -config /vernemq/data/generated.configs/app.2023.03.01.00.02.10.config -args_file /vernemq/bin/../etc/vm.args -vm_args /vernemq/bin/../etc/vm.args -pa /vernemq/bin/../lib/erlio-patches -- console -noshell -noinput
Root: /vernemq/bin/..
00:02:12.624 [info] alarm_handler: {set,{system_memory_high_watermark,[]}}
00:02:12.728 [info] writing (updated) old actor <<217,63,70,135,63,206,115,43,49,140,165,14,237,32,235,220,239,75,136,229>> to disk
00:02:12.736 [info] writing state {[{[{actor,<<217,63,70,135,63,206,115,43,49,140,165,14,237,32,235,220,239,75,136,229>>}],1}],{dict,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[['VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local',{[{actor,<<217,63,70,135,63,206,115,43,49,140,165,14,237,32,235,220,239,75,136,229>>}],1}]],[],[],[]}}},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}} to disk <<75,2,131,80,0,0,1,36,120,1,203,96,206,97,96,96,96,204,96,130,82,41,12,172,137,201,37,249,69,185,64,81,145,155,246,110,237,246,231,138,181,13,123,150,242,189,85,120,125,231,189,119,199,211,172,68,198,172,12,206,20,6,150,148,204,228,146,68,198,68,1,32,228,72,12,72,52,200,16,200,66,3,25,140,168,98,96,43,64,4,83,10,131,93,88,106,81,94,170,111,160,67,25,136,206,45,212,53,208,131,177,50,82,19,83,114,82,139,139,245,114,129,68,98,122,102,94,186,94,113,89,178,94,114,78,105,113,73,106,145,94,78,126,114,98,14,105,238,5,185,11,225,102,6,82,220,12,210,10,0,163,254,97,243>>
00:02:12.757 [info] Opening LevelDB SWC database at "./data/swc_meta/meta1"
00:02:12.781 [info] Opening LevelDB SWC database at "./data/swc_meta/meta2"
00:02:12.791 [info] Opening LevelDB SWC database at "./data/swc_meta/meta3"
00:02:12.800 [info] Opening LevelDB SWC database at "./data/swc_meta/meta4"
00:02:12.810 [info] Opening LevelDB SWC database at "./data/swc_meta/meta5"
00:02:12.819 [info] Opening LevelDB SWC database at "./data/swc_meta/meta6"
00:02:12.828 [info] Opening LevelDB SWC database at "./data/swc_meta/meta7"
00:02:12.837 [info] Opening LevelDB SWC database at "./data/swc_meta/meta8"
00:02:12.847 [info] Opening LevelDB SWC database at "./data/swc_meta/meta9"
00:02:12.858 [info] Opening LevelDB SWC database at "./data/swc_meta/meta10"
00:02:12.910 [info] Try to start vmq_swc: ok
00:02:12.956 [info] Opening LevelDB database at "./data/msgstore/1"
00:02:12.971 [info] Opening LevelDB database at "./data/msgstore/2"
00:02:12.985 [info] Opening LevelDB database at "./data/msgstore/3"
00:02:12.994 [info] Opening LevelDB database at "./data/msgstore/4"
00:02:13.001 [info] Opening LevelDB database at "./data/msgstore/5"
00:02:13.010 [info] Opening LevelDB database at "./data/msgstore/6"
00:02:13.019 [info] Opening LevelDB database at "./data/msgstore/7"
00:02:13.028 [info] Opening LevelDB database at "./data/msgstore/8"
00:02:13.036 [info] Opening LevelDB database at "./data/msgstore/9"
00:02:13.044 [info] Opening LevelDB database at "./data/msgstore/10"
00:02:13.053 [info] Opening LevelDB database at "./data/msgstore/11"
00:02:13.062 [info] Opening LevelDB database at "./data/msgstore/12"
00:02:13.122 [info] Try to start vmq_generic_msg_store: ok
00:02:13.230 [info] loaded 0 subscriptions into vmq_reg_trie
00:02:13.249 [info] cluster event handler 'vmq_cluster' registered
Node 2:
$ kubectl logs vernemq-1 -n messaging
Permissions ok: Our pod vernemq-1 belongs to StatefulSet vernemq with 2 replicas
Will join an existing Kubernetes cluster with discovery node at vernemq-0.vernemq-headless.messaging.svc.cluster.local
Did I previously leave the cluster? If so, purging old state.
Cluster doesn't know about me, this means I've left previously. Purging old state...
Password:
Reenter password:
config is OK
-config /vernemq/data/generated.configs/app.2023.03.01.00.04.01.config -args_file /vernemq/bin/../etc/vm.args -vm_args /vernemq/bin/../etc/vm.args
Exec: /vernemq/bin/../erts-12.3.2.5/bin/erlexec -boot /vernemq/bin/../releases/1.12.6.2/vernemq -config /vernemq/data/generated.configs/app.2023.03.01.00.04.01.config -args_file /vernemq/bin/../etc/vm.args -vm_args /vernemq/bin/../etc/vm.args -pa /vernemq/bin/../lib/erlio-patches -- console -noshell -noinput
Root: /vernemq/bin/..
00:04:03.212 [info] alarm_handler: {set,{system_memory_high_watermark,[]}}
00:04:03.314 [info] writing (updated) old actor <<165,158,8,12,24,41,0,246,32,145,173,99,202,109,217,6,192,216,199,63>> to disk
00:04:03.322 [info] writing state {[{[{actor,<<165,158,8,12,24,41,0,246,32,145,173,99,202,109,217,6,192,216,199,63>>}],1}],{dict,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[['VerneMQ@vernemq-1.vernemq-headless.messaging.svc.cluster.local',{[{actor,<<165,158,8,12,24,41,0,246,32,145,173,99,202,109,217,6,192,216,199,63>>}],1}]],[],[],[]}}},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}} to disk <<75,2,131,80,0,0,1,36,120,1,203,96,206,97,96,96,96,204,96,130,82,41,12,172,137,201,37,249,69,185,64,81,145,165,243,56,120,36,52,25,190,41,76,92,155,124,42,247,38,219,129,27,199,237,179,18,25,179,50,56,83,24,88,82,50,147,75,18,25,19,5,128,144,35,49,32,209,32,67,32,11,13,100,48,162,138,129,173,0,17,76,41,12,118,97,169,69,121,169,190,129,14,101,32,58,183,80,215,80,15,198,202,72,77,76,201,73,45,46,214,203,5,18,137,233,153,121,233,122,197,101,201,122,201,57,165,197,37,169,69,122,57,249,201,137,57,164,185,23,228,46,132,155,25,72,113,51,72,43,0,185,179,95,4>>
00:04:03.346 [info] Opening LevelDB SWC database at "./data/swc_meta/meta1"
00:04:03.368 [info] Opening LevelDB SWC database at "./data/swc_meta/meta2"
00:04:03.377 [info] Opening LevelDB SWC database at "./data/swc_meta/meta3"
00:04:03.386 [info] Opening LevelDB SWC database at "./data/swc_meta/meta4"
00:04:03.395 [info] Opening LevelDB SWC database at "./data/swc_meta/meta5"
00:04:03.404 [info] Opening LevelDB SWC database at "./data/swc_meta/meta6"
00:04:03.417 [info] Opening LevelDB SWC database at "./data/swc_meta/meta7"
00:04:03.425 [info] Opening LevelDB SWC database at "./data/swc_meta/meta8"
00:04:03.434 [info] Opening LevelDB SWC database at "./data/swc_meta/meta9"
00:04:03.444 [info] Opening LevelDB SWC database at "./data/swc_meta/meta10"
00:04:03.493 [info] Try to start vmq_swc: ok
00:04:03.530 [info] Opening LevelDB database at "./data/msgstore/1"
00:04:03.539 [info] Opening LevelDB database at "./data/msgstore/2"
00:04:03.547 [info] Opening LevelDB database at "./data/msgstore/3"
00:04:03.556 [info] Opening LevelDB database at "./data/msgstore/4"
00:04:03.564 [info] Opening LevelDB database at "./data/msgstore/5"
00:04:03.572 [info] Opening LevelDB database at "./data/msgstore/6"
00:04:03.581 [info] Opening LevelDB database at "./data/msgstore/7"
00:04:03.589 [info] Opening LevelDB database at "./data/msgstore/8"
00:04:03.598 [info] Opening LevelDB database at "./data/msgstore/9"
00:04:03.608 [info] Opening LevelDB database at "./data/msgstore/10"
00:04:03.616 [info] Opening LevelDB database at "./data/msgstore/11"
00:04:03.624 [info] Opening LevelDB database at "./data/msgstore/12"
00:04:03.658 [info] Try to start vmq_generic_msg_store: ok
00:04:03.763 [info] loaded 0 subscriptions into vmq_reg_trie
00:04:03.771 [info] cluster event handler 'vmq_cluster' registered
00:04:04.610 [info] Sent join request to: 'VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local'
00:04:04.615 [info] Unable to connect to 'VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local'
00:04:04.615 [info] Unable to connect to 'VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local'
There is no connectivity on the Erlang distribution level, judging from that log line: net_kernel:connect_node/1 fails. Whether this comes from one of the configs, or some Kubernetes configuration (maybe done previously), I don't know.
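(If you can get any Erlang shell on vernemq-1 with the same cookie and -proto_dist setting, the failure can be reproduced directly; a sketch, not from the logs above. Both calls return pang/false while distribution connectivity is broken, and pong/true once it works.)
net_adm:ping('VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local').
net_kernel:connect_node('VerneMQ@vernemq-0.vernemq-headless.messaging.svc.cluster.local').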
👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.
@ioolkos Can you please tell me which endpoint this tries to connect to? I will verify whether the connection is successful. As I said, this is an IPv6-only cluster where I had to change the listeners to get it to this point.
I see; then all listeners are IPv6, but the Erlang distribution is not enabled for IPv6.
For an idea of the ports involved (IPv4), see: https://docs.vernemq.com/vernemq-clustering/communication
But in any case, we really need to enable IPv6 in the noderunner script (see my remark above); otherwise the join command will not work.
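(Basic reachability of those ports from the joining pod can still be checked over IPv6, assuming a netcat with IPv6 support is available in the image; 4369 is EPMD and 44053 is the VerneMQ cluster listener from the netstat above. The Erlang distribution port itself is assigned dynamically and must be reachable as well.)
~ $ nc -6 -z -v vernemq-0.vernemq-headless.messaging.svc.cluster.local 4369
~ $ nc -6 -z -v vernemq-0.vernemq-headless.messaging.svc.cluster.local 44053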
👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.
@ioolkos Does this mean that we will have to wait for vernemq/vernemq#1664 to be fixed first?
Is there an expected release date for this so we can plan accordingly?
Thanks for your help once again.
@avinakollu yes, that's the context. I count on having this fixed in the next release, but I have no ETA.
Do you plan on supporting the VerneMQ project?
👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.
@avinakollu here's the PR that builds the nodetool/vmq-admin script dynamically to adapt for IPv4 or IPv6: vernemq/vernemq#2134
Once a new release is out, we'll need to ensure this works in Docker too (it should, but you never know). For a normal build it works perfectly.
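(Once the fix lands in a released Docker image, the quickest end-to-end check from inside a pod is probably just the cluster overview:)
~ $ vmq-admin cluster show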
👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.