Kubernetes Stateful Clustering not working properly
Robbilie opened this issue · 22 comments
The Helm chart sets up a StatefulSet, and while that should guarantee a certain level of stability, it apparently isn't enough to properly cluster in all cases. I am currently running two nodes, and when I connect with a subscribing client and then start to publish using another, I get only 50% of the published messages on the first client. If I kill one of the two nodes using kubectl, it seems to fix it; I am unsure if permanently, because I don't recall recreating the nodes, and right now it happens again, but they may have been rescheduled.
To me it seems the nodes don't always properly form a cluster.
rschuh@leonis:~$ kubectl exec -ti -n vernemq vernemq-0 vmq-admin cluster show
+--------------------------------------------------------------+---------+
| Node | Running |
+--------------------------------------------------------------+---------+
| VerneMQ@vernemq-0.vernemq-headless.vernemq.svc.cluster.local | true |
+--------------------------------------------------------------+---------+
rschuh@leonis:~$ kubectl exec -ti -n vernemq vernemq-1 vmq-admin cluster show
+--------------------------------------------------------------+---------+
| Node | Running |
+--------------------------------------------------------------+---------+
| VerneMQ@vernemq-1.vernemq-headless.vernemq.svc.cluster.local | true |
+--------------------------------------------------------------+---------+
Actually, killing the pods does not work at all right now…
log output of vernemq-1 says though:
Will join an existing Kubernetes cluster with discovery node at vernemq-0.vernemq-headless.vernemq.svc.cluster.local
I'll be looking into the Helm stuff within the coming days.
lovely, if you need any assistance or want me to test something, let me know :)
I was having the same issue and narrowed it down to the image I was building, so this may not apply if you are using the public image.
Take a look at the /etc/vernemq/vm.args file in your running container and make sure it looks correct and has the join command in it. I was missing a line break in my base image vm.args file and so the join command was never executing correctly.
Should look something like this:
+P 256000
-env ERL_MAX_ETS_TABLES 256000
-env ERL_CRASH_DUMP /erl_crash.dump
-env ERL_FULLSWEEP_AFTER 0
-env ERL_MAX_PORTS 256000
+A 64
-setcookie vmq
-name VerneMQ@vernemq-0.vernemq-headless.default.svc.cluster.local
+K true
+W w
-smp enable
+zdbbl 32768
-eval "vmq_server_cmd:node_join('VerneMQ@vernemq-1.vernemq-headless.default.svc.cluster.local')"
Hope that helps.
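For anyone who wants to check for this specific failure mode: a small sketch of a check that the node_join directive sits on its own line in vm.args. The function name and the default path are my own choices (the public image writes vm.args under /etc/vernemq/; adjust if yours differs).

```shell
#!/usr/bin/env bash
# Check that the node_join -eval directive starts its own line in vm.args.
# If the preceding line break is missing, the directive ends up glued onto
# the previous flag and never executes -- the bug described above.
check_join_line() {
  local vm_args=${1:-/etc/vernemq/vm.args}
  if grep -q '^-eval "vmq_server_cmd:node_join' "$vm_args"; then
    echo "join line present"
  else
    echo "join line missing or malformed" >&2
    return 1
  fi
}
```

Run it inside the container (e.g. via kubectl exec) against the generated vm.args.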
the join command is part of the start script:
https://github.com/vernemq/docker-vernemq/blob/master/bin/vernemq.sh
I have the same issue. 2 nodes in the K8s cluster behave as if they were separate clusters:
node0:
bash-4.4$ vmq-admin cluster show
+--------------------------------------------------------------------------------------------------+---------+
| Node | Running |
+--------------------------------------------------------------------------------------------------+---------+
| VerneMQ@vernemq-development-0.vernemq-development-headless.vernemq-development.svc.cluster.local | true |
+--------------------------------------------------------------------------------------------------+---------+
node1:
bash-4.4$ vmq-admin cluster show
+--------------------------------------------------------------------------------------------------+---------+
| Node | Running |
+--------------------------------------------------------------------------------------------------+---------+
| VerneMQ@vernemq-development-1.vernemq-development-headless.vernemq-development.svc.cluster.local | true |
+--------------------------------------------------------------------------------------------------+---------+
It is very strange because I have the same helm chart with the same values deployed in another namespace and it works fine - I have 2 nodes in the cluster.
Here is values.yaml file:
replicaCount: 2

service:
  type: LoadBalancer
  mqtt:
    enabled: false
  mqtts:
    enabled: true

resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi

secretMounts:
  - name: tls-config
    secretName: vernemq-certificates-secret
    path: /etc/vernemq/ssl

persistentVolume:
  enabled: true
  accessModes:
    - ReadWriteOnce
  size: 1Gi

additionalEnv:
  - name: DOCKER_VERNEMQ_ALLOW_REGISTER_DURING_NETSPLIT
    value: "on"
  - name: DOCKER_VERNEMQ_ALLOW_PUBLISH_DURING_NETSPLIT
    value: "on"
  - name: DOCKER_VERNEMQ_ALLOW_SUBSCRIBE_DURING_NETSPLIT
    value: "on"
  - name: DOCKER_VERNEMQ_ALLOW_UNSUBSCRIBE_DURING_NETSPLIT
    value: "on"
  # VerneMQ: accept EULA (required after helm-1.5.0)
  - name: DOCKER_VERNEMQ_ACCEPT_EULA
    value: "yes"
  # VerneMQ log level
  - name: DOCKER_VERNEMQ_LOG__CONSOLE
    value: console
  - name: DOCKER_VERNEMQ_LOG__CONSOLE__LEVEL
    value: info
  # VerneMQ config
  - name: DOCKER_VERNEMQ_LISTENER__MAX_CONNECTIONS
    value: "96000"
  - name: DOCKER_VERNEMQ_LISTENER__NR_OF_ACCEPTORS
    value: "100"
  - name: DOCKER_VERNEMQ_MAX_CLIENT_ID_SIZE
    value: "128"
  - name: DOCKER_VERNEMQ_MAX_MESSAGE_SIZE
    value: "524288"
  - name: DOCKER_VERNEMQ_PERSISTENT_CLIENT_EXPIRATION
    value: "1h"
  - name: DOCKER_VERNEMQ_ALLOW_ANONYMOUS
    value: "off"
  - name: DOCKER_VERNEMQ_PLUGINS__VMQ_ACL
    value: "off"
  - name: DOCKER_VERNEMQ_PLUGINS__VMQ_PASSWD
    value: "off"
  - name: DOCKER_VERNEMQ_PLUGINS__VMQ_DIVERSITY
    value: "off"
  - name: DOCKER_VERNEMQ_PLUGINS__VMQ_WEBHOOKS
    value: "on"
  # TLS config
  - name: DOCKER_VERNEMQ_LISTENER__SSL__CAFILE
    value: "/etc/vernemq/ssl/ca.crt"
  - name: DOCKER_VERNEMQ_LISTENER__SSL__CERTFILE
    value: "/etc/vernemq/ssl/tls.crt"
  - name: DOCKER_VERNEMQ_LISTENER__SSL__KEYFILE
    value: "/etc/vernemq/ssl/tls.key"
  - name: DOCKER_VERNEMQ_LISTENER__SSL__DEFAULT
    value: "0.0.0.0:8883"
  # - name: DOCKER_VERNEMQ_LISTENER__SSL__TLS_VERSION
  #   value: "tlsv1.2"
  # - name: DOCKER_VERNEMQ_LISTENER__SSL__DEPTH
  #   value: "3"
  - name: DOCKER_VERNEMQ_LISTENER__SSL__REQUIRE_CERTIFICATE
    value: "on"
  - name: DOCKER_VERNEMQ_LISTENER__SSL__USE_IDENTITY_AS_USERNAME
    value: "on"
In the pod log I see that it was going to join the cluster:
Will join an existing Kubernetes cluster with discovery node at vernemq-development-1.vernemq-development-headless.vernemq-development.svc.cluster.local
config is OK
It is strange... I upgraded VerneMQ to the latest version (1.12.3) using Helm chart 1.6.12 and my cluster is showing 2 nodes... but clients can't connect. So I downgraded again to v1.11.0 (Helm chart 1.6.6) and now everything is fine - there are 2 nodes in the cluster and the clients can connect. I have no idea what happened here...
Another workaround, guys: I left only one running node in the cluster, and for the other one I deleted the persistent volume (by deleting the persistent volume claim for that node) and then started node 2 again. It successfully connected to the cluster and synced all the data. Now the cluster is up and running correctly with 2 nodes, as desired. So it looks like the issue was that the nodes were not able to sync their data.
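For reference, the workaround above boils down to something like the following. All names here are hypothetical examples (the PVC name comes from the chart's volumeClaimTemplates, so verify yours with kubectl get pvc first); with DRY_RUN=1 set, the commands are only printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch of the "reset one node's state" workaround described above.
# Names are illustrative; check `kubectl get pvc -n <namespace>` for yours.
reset_node_state() {
  local ns=$1 pod=$2 pvc=${3:-data-$2}
  local run="kubectl -n $ns"
  if [ -n "${DRY_RUN:-}" ]; then run="echo kubectl -n $ns"; fi
  # The PVC stays in Terminating until the pod releases the volume, so
  # request its deletion without waiting, then delete the pod. The
  # StatefulSet recreates both, and the fresh node rejoins the cluster
  # and syncs state from the surviving replica.
  $run delete pvc "$pvc" --wait=false
  $run delete pod "$pod"
}

# Example (do a DRY_RUN first):
#   DRY_RUN=1 reset_node_state vernemq vernemq-1
```

Only do this on one node at a time, while at least one healthy replica holds the cluster state.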
If I'm reading this right, the default PV mode is ReadWriteOnce, which means it can only be mounted on a single node; if the broker pods are scheduled on different nodes, then they won't both be able to access the volume...
I am currently running into this issue using 1.13.0 in EKS. Spinning up a cluster with more than one replica prints out a message in each pod indicating that it will join the cluster, but vmq-admin cluster show only shows the pod itself - is there a workaround or fix?
Oddly enough, if I exec into the pod, extract the discovery node IP from the end of the vm.args file and run the command vmq-admin cluster join discovery-node=<NodeIP> manually, it does successfully join. I have a feeling there is a race condition or something going on?
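The manual workaround just described can be sketched as follows. The function name is my own, and the final vmq-admin call assumes you are exec'ed into the pod; the parsing targets the node_join -eval line that the start script appends to vm.args.

```shell
#!/usr/bin/env bash
# Pull the discovery node name out of the node_join -eval line in vm.args,
# so it can be handed to `vmq-admin cluster join` by hand.
extract_discovery_node() {
  local vm_args=${1:-/etc/vernemq/vm.args}
  # The -eval line looks like: -eval "vmq_server_cmd:node_join('VerneMQ@...')"
  sed -n "s/.*node_join('\([^']*\)').*/\1/p" "$vm_args" | tail -n 1
}

# Inside the pod (illustrative):
#   vmq-admin cluster join discovery-node="$(extract_discovery_node)"
```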
@mgagliardo91 hm, thanks. I don't see why this happens. Either this is a connectivity issue (though the fact that a manual cluster join works argues against that), or no join is issued at all.
You use the Helm charts, right? And what is the exact log line?
Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.
@ioolkos thanks for the response.
Permissions ok: Our pod nio-mqtt-vernemq-1 belongs to StatefulSet nio-mqtt-vernemq with 2 replicas
Will join an existing Kubernetes cluster with discovery node at nio-mqtt-vernemq-0.nio-mqtt-vernemq-headless.mqtt-staging.svc.cluster.local
Did I previously leave the cluster? If so, purging old state.
Cluster doesn't know about me, this means I've left previously. Purging old state...
config is OK
-config /vernemq/data/generated.configs/app.2023.08.30.15.43.41.config -args_file /vernemq/bin/../etc/vm.args -vm_args /vernemq/bin/../etc/vm.args
Exec: /vernemq/bin/../erts-13.2.2.2/bin/erlexec -boot /vernemq/bin/../releases/1.13.0/vernemq -config /vernemq/data/generated.configs/app.2023.08.30.15.43.41.config -args_file /vernemq/bin/../etc/vm.args -vm_args /vernemq/bin/../etc/vm.args -pa /vernemq/bin/../lib/erlio-patches -- console -noshell -noinput
Root: /vernemq/bin/..
15:43:43.707 [info] alarm_handler: {set,{system_memory_high_watermark,[]}}
15:43:43.823 [info] writing (updated) old actor <<64,116,63,142,70,3,75,141,179,251,252,63,152,11,95,249,73,138,56,75>> to disk
15:43:43.829 [info] writing state {[{[{actor,<<64,116,63,142,70,3,75,141,179,251,252,63,152,11,95,249,73,138,56,75>>}],1}],{dict,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[['VerneMQ@nio-mqtt-vernemq-1.nio-mqtt-vernemq-headless.mqtt-staging.svc.cluster.local',{[{actor,<<64,116,63,142,70,3,75,141,179,251,252,63,152,11,95,249,73,138,56,75>>}],1}]],[],[],[]}}},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}} to disk <<75,2,131,80,0,0,1,57,120,1,203,96,206,97,96,96,96,204,96,130,82,41,12,172,137,201,37,249,69,185,64,81,17,135,18,251,62,55,102,239,222,205,191,255,216,207,224,142,255,233,217,101,225,157,149,200,152,149,193,153,194,192,146,146,153,92,146,200,152,40,0,132,28,137,1,137,6,25,2,89,104,32,131,17,85,12,108,5,136,96,74,97,8,14,75,45,202,75,245,13,116,200,203,204,215,205,45,44,41,209,45,3,9,228,22,234,26,234,97,8,101,164,38,166,228,164,22,23,235,129,21,22,151,36,166,103,230,165,235,21,151,37,235,37,231,148,22,151,164,22,233,229,228,39,39,230,144,230,9,144,99,17,30,97,32,197,35,32,173,0,42,4,102,185>>
15:43:43.851 [info] Opening LevelDB SWC database at "./data/swc_meta/meta1"
15:43:43.876 [info] Opening LevelDB SWC database at "./data/swc_meta/meta2"
15:43:43.889 [info] Opening LevelDB SWC database at "./data/swc_meta/meta3"
15:43:43.905 [info] Opening LevelDB SWC database at "./data/swc_meta/meta4"
15:43:43.917 [info] Opening LevelDB SWC database at "./data/swc_meta/meta5"
15:43:43.929 [info] Opening LevelDB SWC database at "./data/swc_meta/meta6"
15:43:43.941 [info] Opening LevelDB SWC database at "./data/swc_meta/meta7"
15:43:43.953 [info] Opening LevelDB SWC database at "./data/swc_meta/meta8"
15:43:43.964 [info] Opening LevelDB SWC database at "./data/swc_meta/meta9"
15:43:43.981 [info] Opening LevelDB SWC database at "./data/swc_meta/meta10"
15:43:44.031 [info] Try to start vmq_swc: ok
15:43:44.074 [info] Opening LevelDB database at "./data/msgstore/1"
15:43:44.086 [info] Opening LevelDB database at "./data/msgstore/2"
Then after the manual join request:
15:51:55.383 [info] Sent join request to: 'VerneMQ@nio-mqtt-vernemq-0.nio-mqtt-vernemq-headless.mqtt-staging.svc.cluster.local'
15:51:55.429 [info] successfully connected to cluster node 'VerneMQ@nio-mqtt-vernemq-0.nio-mqtt-vernemq-headless.mqtt-staging.svc.cluster.local'
@mgagliardo91 Thanks, strange. The interesting part is the first few lines; they come from the vernemq.sh script. Did you do anything special here, like restarting a node?
It seems that node does not even try to cluster. I will need to take a look at the script; let us know your findings here in the meantime.
@ioolkos I am using the default vernemq.sh script, but I have been playing around, and it is in fact a timing issue. My first assumption is that it's related to the nodes needing to acquire a PVC (which can take a few seconds in EKS), and that the join_cluster logic does not retry.
I added a second script that pulls a lot of logic from the main script and retries the join until it succeeds. I updated the vernemq.sh script to kick it off detached and then start up normally. This appears to work every time, and the logs show it can take up to 10 attempts before the join succeeds, so it's definitely caused by things not being ready by the time the join command evaluates.
join_cluster.sh:
#!/usr/bin/env bash

SECRETS_KUBERNETES_DIR="/var/run/secrets/kubernetes.io/serviceaccount"
DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME=${DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME:-cluster.local}

if [ -d "${SECRETS_KUBERNETES_DIR}" ] ; then
    # Let's get the namespace if it isn't set
    DOCKER_VERNEMQ_KUBERNETES_NAMESPACE=${DOCKER_VERNEMQ_KUBERNETES_NAMESPACE:-$(cat "${SECRETS_KUBERNETES_DIR}/namespace")}
fi

insecure=""
if env | grep "DOCKER_VERNEMQ_KUBERNETES_INSECURE" -q; then
    echo "Using curl with \"--insecure\" argument to access kubernetes API without matching SSL certificate"
    insecure="--insecure"
fi

function k8sCurlGet () {
    local urlPath=$1
    local hostname="kubernetes.default.svc.${DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME}"
    local certsFile="${SECRETS_KUBERNETES_DIR}/ca.crt"
    local token=$(cat ${SECRETS_KUBERNETES_DIR}/token)
    local header="Authorization: Bearer ${token}"
    local url="https://${hostname}/${urlPath}"

    curl -sS ${insecure} --cacert ${certsFile} -H "${header}" ${url} \
        || ( echo "### Error on accessing URL ${url}" )
}

try_join() {
    local exit_code=0
    if env | grep "DOCKER_VERNEMQ_DISCOVERY_KUBERNETES" -q; then
        # Let's set our nodename correctly
        # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#list-pod-v1-core
        podList=$(k8sCurlGet "api/v1/namespaces/${DOCKER_VERNEMQ_KUBERNETES_NAMESPACE}/pods?labelSelector=${DOCKER_VERNEMQ_KUBERNETES_LABEL_SELECTOR}")
        kube_pod_names=$(echo ${podList} | jq '.items[].spec.hostname' | sed 's/"//g' | tr '\n' ' ')
        VERNEMQ_KUBERNETES_SUBDOMAIN=${DOCKER_VERNEMQ_KUBERNETES_SUBDOMAIN:-$(echo ${podList} | jq '.items[0].spec.subdomain' | tr '\n' '"' | sed 's/"//g')}

        for kube_pod_name in $kube_pod_names; do
            if [[ $kube_pod_name == "null" ]]; then
                echo "Kubernetes discovery selected, but no pods found. Maybe we're the first?"
                echo "Anyway, we won't attempt to join any cluster."
                exit 0
            fi

            if [[ $kube_pod_name != "$MY_POD_NAME" ]]; then
                discoveryHostname="${kube_pod_name}.${VERNEMQ_KUBERNETES_SUBDOMAIN}.${DOCKER_VERNEMQ_KUBERNETES_NAMESPACE}.svc.${DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME}"
                echo "Will join an existing Kubernetes cluster with discovery node at ${discoveryHostname}"
                vmq-admin cluster show | grep "VerneMQ@${discoveryHostname}" > /dev/null || exit_code=$?
                if [ $exit_code -eq 0 ]; then
                    echo "We have already joined the cluster - no extra work required."
                    exit 0
                else
                    echo "We have yet to join the cluster - attempting manual join..."
                    vmq-admin cluster join discovery-node="VerneMQ@${discoveryHostname}"
                    sleep 2
                fi
                break
            fi
        done
    else
        exit 0
    fi
}

while true
do
    try_join
done
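One possible refinement to the retry loop above, sketched under two assumptions of mine: try_join is reworked to return 1 on failure (instead of relying on exit inside the function), and the MAX_ATTEMPTS/RETRY_DELAY caps are additions not present in the original script. Bounding the retries avoids leaving a detached process spinning forever when the discovery node is permanently unreachable.

```shell
#!/usr/bin/env bash
# Bounded variant of the retry loop. Assumes try_join returns 1 on failure
# rather than exiting the script; the caps are not part of the original.
join_with_retry() {
  local max=${MAX_ATTEMPTS:-30} delay=${RETRY_DELAY:-2} attempt=1
  until try_join; do
    if [ "$attempt" -ge "$max" ]; then
      echo "Cluster join failed after ${max} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "Cluster join succeeded after ${attempt} attempt(s)"
}
```

With the defaults this gives roughly a minute of retries, which comfortably covers the ~20 seconds of PVC provisioning observed below.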
@mgagliardo91 oh, that's brilliant. When you say 10 re-tries: how much was that in seconds?
Would you want to open a PR for this?
Up to 20 seconds for us. I can add it, sure.
@mgagliardo91 oh, ok, so that's 20 seconds to get a PVC.
Looking forward to your PR, thanks!