Kubernetes Stateful Clustering not working properly
Robbilie opened this issue · 22 comments
The Helm chart sets up a StatefulSet, and while that should guarantee a certain level of stability, it apparently isn't enough to properly cluster in all cases. I am currently running two nodes, and when I connect with a subscribing client and then start to publish using another, I get only 50% of the published messages on the first client. If I kill one of the two nodes using kubectl, it seems to fix it; I am unsure if permanently, because I don't recall recreating the nodes, and right now it happens again, but they may have been rescheduled.
To me it seems the nodes don't always properly form a cluster.
rschuh@leonis:~$ kubectl exec -ti -n vernemq vernemq-0 vmq-admin cluster show
+--------------------------------------------------------------+---------+
| Node | Running |
+--------------------------------------------------------------+---------+
| VerneMQ@vernemq-0.vernemq-headless.vernemq.svc.cluster.local | true |
+--------------------------------------------------------------+---------+
rschuh@leonis:~$ kubectl exec -ti -n vernemq vernemq-1 vmq-admin cluster show
+--------------------------------------------------------------+---------+
| Node | Running |
+--------------------------------------------------------------+---------+
| VerneMQ@vernemq-1.vernemq-headless.vernemq.svc.cluster.local | true |
+--------------------------------------------------------------+---------+
Actually, killing the pods does not work at all right now…
log output of vernemq-1 says though:
Will join an existing Kubernetes cluster with discovery node at vernemq-0.vernemq-headless.vernemq.svc.cluster.local
I'll be looking into the Helm stuff within the coming days.
lovely, if you need any assistance or want me to test something, let me know :)
I was having the same issue and narrowed it down to the image I was building, so this may not apply if you are using the public image.
Take a look at the /etc/vernemq/vm.args file in your running container and make sure it looks correct and has the join command in it. I was missing a line break in my base image vm.args file and so the join command was never executing correctly.
Should look something like this:
+P 256000
-env ERL_MAX_ETS_TABLES 256000
-env ERL_CRASH_DUMP /erl_crash.dump
-env ERL_FULLSWEEP_AFTER 0
-env ERL_MAX_PORTS 256000
+A 64
-setcookie vmq
-name VerneMQ@vernemq-0.vernemq-headless.default.svc.cluster.local
+K true
+W w
-smp enable
+zdbbl 32768
-eval "vmq_server_cmd:node_join('VerneMQ@vernemq-1.vernemq-headless.default.svc.cluster.local')"
Hope that helps.
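For anyone who wants to check for this specific failure mode: a small sketch of a check that the node_join directive sits on its own line in vm.args. The function name and the default path are my own choices (the public image writes vm.args under /etc/vernemq/; adjust if yours differs).

```shell
#!/usr/bin/env bash
# Check that the node_join -eval directive starts its own line in vm.args.
# If the preceding line break is missing, the directive ends up glued onto
# the previous flag and never executes -- the bug described above.
check_join_line() {
  local vm_args=${1:-/etc/vernemq/vm.args}
  if grep -q '^-eval "vmq_server_cmd:node_join' "$vm_args"; then
    echo "join line present"
  else
    echo "join line missing or malformed" >&2
    return 1
  fi
}
```

Run it inside the container (e.g. via kubectl exec) against the generated vm.args.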
the join command is part of the start script:
https://github.com/vernemq/docker-vernemq/blob/master/bin/vernemq.sh
I have the same issue. 2 nodes in the K8s cluster behave as if they were separate clusters:
node0:
bash-4.4$ vmq-admin cluster show
+--------------------------------------------------------------------------------------------------+---------+
| Node | Running |
+--------------------------------------------------------------------------------------------------+---------+
| VerneMQ@vernemq-development-0.vernemq-development-headless.vernemq-development.svc.cluster.local | true |
+--------------------------------------------------------------------------------------------------+---------+
node1:
bash-4.4$ vmq-admin cluster show
+--------------------------------------------------------------------------------------------------+---------+
| Node | Running |
+--------------------------------------------------------------------------------------------------+---------+
| VerneMQ@vernemq-development-1.vernemq-development-headless.vernemq-development.svc.cluster.local | true |
+--------------------------------------------------------------------------------------------------+---------+
It is very strange because I have the same helm chart with the same values deployed in another namespace and it works fine - I have 2 nodes in the cluster.
Here is values.yaml file:
replicaCount: 2

service:
  type: LoadBalancer
  mqtt:
    enabled: false
  mqtts:
    enabled: true

resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi

secretMounts:
  - name: tls-config
    secretName: vernemq-certificates-secret
    path: /etc/vernemq/ssl

persistentVolume:
  enabled: true
  accessModes:
    - ReadWriteOnce
  size: 1Gi

additionalEnv:
  - name: DOCKER_VERNEMQ_ALLOW_REGISTER_DURING_NETSPLIT
    value: "on"
  - name: DOCKER_VERNEMQ_ALLOW_PUBLISH_DURING_NETSPLIT
    value: "on"
  - name: DOCKER_VERNEMQ_ALLOW_SUBSCRIBE_DURING_NETSPLIT
    value: "on"
  - name: DOCKER_VERNEMQ_ALLOW_UNSUBSCRIBE_DURING_NETSPLIT
    value: "on"
  # VerneMQ: accept EULA (required after helm-1.5.0)
  - name: DOCKER_VERNEMQ_ACCEPT_EULA
    value: "yes"
  # VerneMQ log level
  - name: DOCKER_VERNEMQ_LOG__CONSOLE
    value: console
  - name: DOCKER_VERNEMQ_LOG__CONSOLE__LEVEL
    value: info
  # VerneMQ config
  - name: DOCKER_VERNEMQ_LISTENER__MAX_CONNECTIONS
    value: "96000"
  - name: DOCKER_VERNEMQ_LISTENER__NR_OF_ACCEPTORS
    value: "100"
  - name: DOCKER_VERNEMQ_MAX_CLIENT_ID_SIZE
    value: "128"
  - name: DOCKER_VERNEMQ_MAX_MESSAGE_SIZE
    value: "524288"
  - name: DOCKER_VERNEMQ_PERSISTENT_CLIENT_EXPIRATION
    value: "1h"
  - name: DOCKER_VERNEMQ_ALLOW_ANONYMOUS
    value: "off"
  - name: DOCKER_VERNEMQ_PLUGINS__VMQ_ACL
    value: "off"
  - name: DOCKER_VERNEMQ_PLUGINS__VMQ_PASSWD
    value: "off"
  - name: DOCKER_VERNEMQ_PLUGINS__VMQ_DIVERSITY
    value: "off"
  - name: DOCKER_VERNEMQ_PLUGINS__VMQ_WEBHOOKS
    value: "on"
  # TLS config
  - name: DOCKER_VERNEMQ_LISTENER__SSL__CAFILE
    value: "/etc/vernemq/ssl/ca.crt"
  - name: DOCKER_VERNEMQ_LISTENER__SSL__CERTFILE
    value: "/etc/vernemq/ssl/tls.crt"
  - name: DOCKER_VERNEMQ_LISTENER__SSL__KEYFILE
    value: "/etc/vernemq/ssl/tls.key"
  - name: DOCKER_VERNEMQ_LISTENER__SSL__DEFAULT
    value: "0.0.0.0:8883"
  # - name: DOCKER_VERNEMQ_LISTENER__SSL__TLS_VERSION
  #   value: "tlsv1.2"
  # - name: DOCKER_VERNEMQ_LISTENER__SSL__DEPTH
  #   value: "3"
  - name: DOCKER_VERNEMQ_LISTENER__SSL__REQUIRE_CERTIFICATE
    value: "on"
  - name: DOCKER_VERNEMQ_LISTENER__SSL__USE_IDENTITY_AS_USERNAME
    value: "on"
In the pod log I see that it was going to join the cluster:
Will join an existing Kubernetes cluster with discovery node at vernemq-development-1.vernemq-development-headless.vernemq-development.svc.cluster.local
config is OK
It is strange... I upgraded VerneMQ to the latest version (1.12.3) using Helm chart 1.6.12 and my cluster is showing 2 nodes... but clients can't connect. So I downgraded again to v1.11.0 (Helm chart 1.6.6) and now everything is fine - there are 2 nodes in the cluster and the clients can connect. I have no idea what happened here...
Another workaround, guys: I left only one running node in the cluster, and for the other one I deleted the persistent volume (by deleting the persistent volume claim for that node) and then started node 2 again. It successfully connected to the cluster and synced all the data. Now the cluster is up and running correctly with 2 nodes, as desired. So it looks like the issue was that the nodes were not able to sync their data.
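For reference, the workaround above boils down to something like the following. All names here are hypothetical examples (the PVC name comes from the chart's volumeClaimTemplates, so verify yours with kubectl get pvc first); with DRY_RUN=1 set, the commands are only printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch of the "reset one node's state" workaround described above.
# Names are illustrative; check `kubectl get pvc -n <namespace>` for yours.
reset_node_state() {
  local ns=$1 pod=$2 pvc=${3:-data-$2}
  local run="kubectl -n $ns"
  if [ -n "${DRY_RUN:-}" ]; then run="echo kubectl -n $ns"; fi
  # The PVC stays in Terminating until the pod releases the volume, so
  # request its deletion without waiting, then delete the pod. The
  # StatefulSet recreates both, and the fresh node rejoins the cluster
  # and syncs state from the surviving replica.
  $run delete pvc "$pvc" --wait=false
  $run delete pod "$pod"
}

# Example (do a DRY_RUN first):
#   DRY_RUN=1 reset_node_state vernemq vernemq-1
```

Only do this on one node at a time, while at least one healthy replica holds the cluster state.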
If I'm reading this right, the default PV mode is ReadWriteOnce, which means it can only be mounted on a single node; if the broker pods are scheduled on different nodes, then they won't both be able to access the volume...
I am currently running into this issue using 1.13.0 in EKS. Spinning up a cluster with more than one replica prints out a message in each pod indicating that it will join the cluster, but vmq-admin cluster show only shows the pod itself - is there a workaround or fix?
Oddly enough, if I exec into the pod, extract the discovery node IP from the end of the vm.args file and run the command vmq-admin cluster join discovery-node=<NodeIP> manually, it does successfully join. I have a feeling there is a race condition or something going on?
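The manual workaround just described can be sketched as follows. The function name is my own, and the final vmq-admin call assumes you are exec'ed into the pod; the parsing targets the node_join -eval line that the start script appends to vm.args.

```shell
#!/usr/bin/env bash
# Pull the discovery node name out of the node_join -eval line in vm.args,
# so it can be handed to `vmq-admin cluster join` by hand.
extract_discovery_node() {
  local vm_args=${1:-/etc/vernemq/vm.args}
  # The -eval line looks like: -eval "vmq_server_cmd:node_join('VerneMQ@...')"
  sed -n "s/.*node_join('\([^']*\)').*/\1/p" "$vm_args" | tail -n 1
}

# Inside the pod (illustrative):
#   vmq-admin cluster join discovery-node="$(extract_discovery_node)"
```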
@mgagliardo91 hm, thanks. I don't see why this happens. Either this is a connectivity issue (though the fact that a manual cluster join works argues against that), or no join is issued at all.
You use the Helm charts, right? And what is the exact log line?
Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.
@ioolkos thanks for the response.
Permissions ok: Our pod nio-mqtt-vernemq-1 belongs to StatefulSet nio-mqtt-vernemq with 2 replicas
Will join an existing Kubernetes cluster with discovery node at nio-mqtt-vernemq-0.nio-mqtt-vernemq-headless.mqtt-staging.svc.cluster.local
Did I previously leave the cluster? If so, purging old state.
Cluster doesn't know about me, this means I've left previously. Purging old state...
config is OK
-config /vernemq/data/generated.configs/app.2023.08.30.15.43.41.config -args_file /vernemq/bin/../etc/vm.args -vm_args /vernemq/bin/../etc/vm.args
Exec: /vernemq/bin/../erts-13.2.2.2/bin/erlexec -boot /vernemq/bin/../releases/1.13.0/vernemq -config /vernemq/data/generated.configs/app.2023.08.30.15.43.41.config -args_file /vernemq/bin/../etc/vm.args -vm_args /vernemq/bin/../etc/vm.args -pa /vernemq/bin/../lib/erlio-patches -- console -noshell -noinput
Root: /vernemq/bin/..
15:43:43.707 [info] alarm_handler: {set,{system_memory_high_watermark,[]}}
15:43:43.823 [info] writing (updated) old actor <<64,116,63,142,70,3,75,141,179,251,252,63,152,11,95,249,73,138,56,75>> to disk
15:43:43.829 [info] writing state {[{[{actor,<<64,116,63,142,70,3,75,141,179,251,252,63,152,11,95,249,73,138,56,75>>}],1}],{dict,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[['VerneMQ@nio-mqtt-vernemq-1.nio-mqtt-vernemq-headless.mqtt-staging.svc.cluster.local',{[{actor,<<64,116,63,142,70,3,75,141,179,251,252,63,152,11,95,249,73,138,56,75>>}],1}]],[],[],[]}}},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}} to disk <<75,2,131,80,0,0,1,57,120,1,203,96,206,97,96,96,96,204,96,130,82,41,12,172,137,201,37,249,69,185,64,81,17,135,18,251,62,55,102,239,222,205,191,255,216,207,224,142,255,233,217,101,225,157,149,200,152,149,193,153,194,192,146,146,153,92,146,200,152,40,0,132,28,137,1,137,6,25,2,89,104,32,131,17,85,12,108,5,136,96,74,97,8,14,75,45,202,75,245,13,116,200,203,204,215,205,45,44,41,209,45,3,9,228,22,234,26,234,97,8,101,164,38,166,228,164,22,23,235,129,21,22,151,36,166,103,230,165,235,21,151,37,235,37,231,148,22,151,164,22,233,229,228,39,39,230,144,230,9,144,99,17,30,97,32,197,35,32,173,0,42,4,102,185>>
15:43:43.851 [info] Opening LevelDB SWC database at "./data/swc_meta/meta1"
15:43:43.876 [info] Opening LevelDB SWC database at "./data/swc_meta/meta2"
15:43:43.889 [info] Opening LevelDB SWC database at "./data/swc_meta/meta3"
15:43:43.905 [info] Opening LevelDB SWC database at "./data/swc_meta/meta4"
15:43:43.917 [info] Opening LevelDB SWC database at "./data/swc_meta/meta5"
15:43:43.929 [info] Opening LevelDB SWC database at "./data/swc_meta/meta6"
15:43:43.941 [info] Opening LevelDB SWC database at "./data/swc_meta/meta7"
15:43:43.953 [info] Opening LevelDB SWC database at "./data/swc_meta/meta8"
15:43:43.964 [info] Opening LevelDB SWC database at "./data/swc_meta/meta9"
15:43:43.981 [info] Opening LevelDB SWC database at "./data/swc_meta/meta10"
15:43:44.031 [info] Try to start vmq_swc: ok
15:43:44.074 [info] Opening LevelDB database at "./data/msgstore/1"
15:43:44.086 [info] Opening LevelDB database at "./data/msgstore/2"
Then after the manual join request:
15:51:55.383 [info] Sent join request to: 'VerneMQ@nio-mqtt-vernemq-0.nio-mqtt-vernemq-headless.mqtt-staging.svc.cluster.local'
15:51:55.429 [info] successfully connected to cluster node 'VerneMQ@nio-mqtt-vernemq-0.nio-mqtt-vernemq-headless.mqtt-staging.svc.cluster.local'
@mgagliardo91 Thanks, strange. The interesting part is the first few lines; they come from the vernemq.sh script. Did you do anything special here, like restarting a node?
It seems that node does not even try to cluster. I will need to take a look at the script; let us know your findings here in the meantime.
@ioolkos I am using the default vernemq.sh script, but I have been playing around, and it is in fact a timing issue. My first assumption is that it's related to the nodes needing to acquire a PVC (which can take a few seconds in EKS), and that the join_cluster logic does not retry.
I added a second script that pulls a lot of logic from the main script and retries the join until it succeeds. I updated the vernemq.sh script to kick it off detached and then start up normally. This appears to work every time, and the logs show it can take up to 10 attempts before the join succeeds, so it's definitely caused by things not being ready by the time the join command evaluates.
join_cluster.sh:
#!/usr/bin/env bash

SECRETS_KUBERNETES_DIR="/var/run/secrets/kubernetes.io/serviceaccount"
DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME=${DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME:-cluster.local}

if [ -d "${SECRETS_KUBERNETES_DIR}" ] ; then
    # Let's get the namespace if it isn't set
    DOCKER_VERNEMQ_KUBERNETES_NAMESPACE=${DOCKER_VERNEMQ_KUBERNETES_NAMESPACE:-$(cat "${SECRETS_KUBERNETES_DIR}/namespace")}
fi

insecure=""
if env | grep "DOCKER_VERNEMQ_KUBERNETES_INSECURE" -q; then
    echo "Using curl with \"--insecure\" argument to access kubernetes API without matching SSL certificate"
    insecure="--insecure"
fi

function k8sCurlGet () {
    local urlPath=$1
    local hostname="kubernetes.default.svc.${DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME}"
    local certsFile="${SECRETS_KUBERNETES_DIR}/ca.crt"
    local token=$(cat ${SECRETS_KUBERNETES_DIR}/token)
    local header="Authorization: Bearer ${token}"
    local url="https://${hostname}/${urlPath}"

    curl -sS ${insecure} --cacert ${certsFile} -H "${header}" ${url} \
        || ( echo "### Error on accessing URL ${url}" )
}

try_join() {
    local exit_code=0
    if env | grep "DOCKER_VERNEMQ_DISCOVERY_KUBERNETES" -q; then
        # Let's set our nodename correctly
        # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#list-pod-v1-core
        podList=$(k8sCurlGet "api/v1/namespaces/${DOCKER_VERNEMQ_KUBERNETES_NAMESPACE}/pods?labelSelector=${DOCKER_VERNEMQ_KUBERNETES_LABEL_SELECTOR}")
        kube_pod_names=$(echo ${podList} | jq '.items[].spec.hostname' | sed 's/"//g' | tr '\n' ' ')
        VERNEMQ_KUBERNETES_SUBDOMAIN=${DOCKER_VERNEMQ_KUBERNETES_SUBDOMAIN:-$(echo ${podList} | jq '.items[0].spec.subdomain' | tr '\n' '"' | sed 's/"//g')}

        for kube_pod_name in $kube_pod_names; do
            if [[ $kube_pod_name == "null" ]]; then
                echo "Kubernetes discovery selected, but no pods found. Maybe we're the first?"
                echo "Anyway, we won't attempt to join any cluster."
                exit 0
            fi

            if [[ $kube_pod_name != "$MY_POD_NAME" ]]; then
                discoveryHostname="${kube_pod_name}.${VERNEMQ_KUBERNETES_SUBDOMAIN}.${DOCKER_VERNEMQ_KUBERNETES_NAMESPACE}.svc.${DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME}"
                echo "Will join an existing Kubernetes cluster with discovery node at ${discoveryHostname}"
                vmq-admin cluster show | grep "VerneMQ@${discoveryHostname}" > /dev/null || exit_code=$?
                if [ $exit_code -eq 0 ]; then
                    echo "We have already joined the cluster - no extra work required."
                    exit 0
                else
                    echo "We have yet to join the cluster - attempting manual join..."
                    vmq-admin cluster join discovery-node="VerneMQ@${discoveryHostname}"
                    sleep 2
                fi
                break
            fi
        done
    else
        exit 0
    fi
}

while true
do
    try_join
done
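One possible refinement to the retry loop above, sketched under two assumptions of mine: try_join is reworked to return 1 on failure (instead of relying on exit inside the function), and the MAX_ATTEMPTS/RETRY_DELAY caps are additions not present in the original script. Bounding the retries avoids leaving a detached process spinning forever when the discovery node is permanently unreachable.

```shell
#!/usr/bin/env bash
# Bounded variant of the retry loop. Assumes try_join returns 1 on failure
# rather than exiting the script; the caps are not part of the original.
join_with_retry() {
  local max=${MAX_ATTEMPTS:-30} delay=${RETRY_DELAY:-2} attempt=1
  until try_join; do
    if [ "$attempt" -ge "$max" ]; then
      echo "Cluster join failed after ${max} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "Cluster join succeeded after ${attempt} attempt(s)"
}
```

With the defaults this gives roughly a minute of retries, which comfortably covers the ~20 seconds of PVC provisioning observed below.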
@mgagliardo91 oh, that's brilliant. When you say 10 re-tries: how much was that in seconds?
Would you want to open a PR for this?
Up to 20 seconds for us. I can add it, sure.
@mgagliardo91 oh, ok, so that's 20 seconds to get a PVC.
Looking forward to your PR, thanks!