emqx/emqx-operator

EMQX-Cluster not working in IPV6 only network

axkng opened this issue · 28 comments

axkng commented

Describe the bug
After following the getting-started page to set up the emqx-operator, I provisioned an emqx cluster.
The pods start and are running, but the status commands return errors:

kubectl exec -n emqx -it emqx-0 -c emqx -- emqx_ctl status
Node 'emqx@emqx-0.emqx-headless.emqx.svc.cluster.local' not responding to pings.
/opt/emqx/bin/emqx: line 46: die: command not found
command terminated with exit code 127

To Reproduce
Steps to reproduce the behavior:

  1. Deploy the operator to an EKS cluster with Kubernetes 1.22.9
  2. Deploy a simple broker (can be without persistence, I tested that.)
  3. Check the output of the status commands and get errors.

Expected behavior
Not to get errors on the status commands after provisioning a simple broker with no config.

Anything else we need to know?:

Environment details:

  • Kubernetes version: 1.22.9
  • Cloud-provider/provisioner: AWS EKS
  • emqx-operator version: 1.2.4
  • Install method: helm, emqx deployed as crd
    emqx-manifest:
---
apiVersion: apps.emqx.io/v1beta3
kind: EmqxBroker
metadata:
  name: emqx
  labels:
    app: emqx
    environment: dev
spec:
  persistent:
    accessModes:
      - ReadWriteOnce
    storageClassName: ebs-gp3
    resources:
      requests:
        storage: 1Gi
  emqxTemplate:
    image: emqx/emqx:4.4.6

Did I do something wrong here?

Hi @Furragen,
Could you please share the emqx-operator logs and the status of the emqx custom resource? Run the following commands:
kubectl get EmqxBroker emqx -o json | jq '.status'
kubectl logs -f -l "control-plane=controller-manager" -n emqx-operator-system -c manager --tail=100

And the emqx pod logs:
kubectl logs emqx-0 -c emqx

axkng commented

Hi @Rory-Z ,
thanks for your quick response.

kubectl get -n emqx EmqxBroker emqx -o json | jq '.status'
{
  "conditions": [
    {
      "lastTransitionTime": "2022-08-10T07:04:09Z",
      "lastUpdateTime": "2022-08-10T07:26:23Z",
      "message": "Some nodes are not ready",
      "reason": "ClusterNotReady",
      "status": "False",
      "type": "Running"
    },
    {
      "lastTransitionTime": "2022-08-10T07:03:26Z",
      "lastUpdateTime": "2022-08-10T07:03:26Z",
      "message": "All default plugins initialized",
      "reason": "PluginInitializeSuccessfully",
      "status": "True",
      "type": "PluginInitialized"
    }
  ],
  "emqxNodes": [
    {
      "node": "emqx@emqx-0.emqx-headless.emqx.svc.cluster.local",
      "node_status": "Running",
      "otp_release": "24.1.5/12.1.5",
      "version": "4.4.6"
    }
  ],
  "readyReplicas": 1,
  "replicas": 3
}

Logs of the operator (kubectl logs -f -l "control-plane=controller-manager" -n emqx -c manager --tail=100):

Logs E0810 07:03:59.570352 1 portforward.go:234] lost connection to pod E0810 07:03:59.838997 1 portforward.go:406] an error occurred forwarding 38417 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:00.358323 1 portforward.go:406] an error occurred forwarding 43423 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:00.358653 1 portforward.go:234] lost connection to pod 1.660115040377564e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "d0295269-0001-4c19-ae4d-2be9e74a7321", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:00.653062 1 portforward.go:406] an error occurred forwarding 46363 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:00.656235 1 portforward.go:234] lost connection to pod 1.6601150413087435e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "aed5651b-c774-4496-8e04-41ec215aeb76", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:01.933473 1 portforward.go:406] an error occurred forwarding 36141 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused 
IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:01.933924 1 portforward.go:234] lost connection to pod 1.6601150419643033e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "7207ec58-c48f-4f47-bb51-d156051a2e78", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:02.226753 1 portforward.go:406] an error occurred forwarding 36151 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:02.227081 1 portforward.go:234] lost connection to pod E0810 07:04:02.639584 1 portforward.go:406] an error occurred forwarding 39885 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: 
connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:02.639792 1 portforward.go:234] lost connection to pod 1.6601150426838543e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "4ef8f472-14f7-4da7-849b-5220115b9dbc", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:02.948658 1 portforward.go:406] an error occurred forwarding 40737 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:02.948869 1 portforward.go:234] lost connection to pod E0810 07:04:03.425248 1 portforward.go:406] an error occurred forwarding 34645 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", 
IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:03.425551 1 portforward.go:234] lost connection to pod 1.660115043447316e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "7d86ef9a-e0f3-465b-b1e0-32123f7d2377", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:03.772788 1 portforward.go:406] an error occurred forwarding 37549 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:03.773081 1 portforward.go:234] lost connection to pod 1.6601150441139083e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "300bc970-0ebd-4d1a-a101-6b073ec449e0", "error": "failed to update StatefulSet emqx: Operation cannot be fulfilled on statefulsets.apps \"emqx\": the 
object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:04.424648 1 portforward.go:406] an error occurred forwarding 32813 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:04.425127 1 portforward.go:234] lost connection to pod E0810 07:04:04.909241 1 portforward.go:406] an error occurred forwarding 41449 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:04.909421 1 portforward.go:234] lost connection to pod 1.660115044925919e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "e6941001-14b7-4462-8b90-80e6ab8feac4", "error": "Operation cannot be fulfilled on 
emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:05.216967 1 portforward.go:406] an error occurred forwarding 34665 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:05.725675 1 portforward.go:406] an error occurred forwarding 41193 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:05.726157 1 portforward.go:234] lost connection to pod 1.6601150457562108e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "aaa3c815-7409-4e44-b752-09cff2b0531e", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the 
object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:06.034595 1 portforward.go:406] an error occurred forwarding 35435 -> 8081: error forwarding port 8081 to pod 862a1a59ff6fdc75b1c8a7520a2ed57d2720c341f7015556443b4063771ccdd4, uid : failed to execute portforward in network namespace "/var/run/netns/cni-4d298c69-db9b-1c7e-ebac-314710d61826": failed to connect to localhost:8081 inside namespace "862a1a59ff6fdc75b1c8a7520a2ed57d2720c341f7015556443b4063771ccdd4", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:06.034845 1 portforward.go:234] lost connection to pod E0810 07:04:06.440777 1 portforward.go:406] an error occurred forwarding 33389 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:06.441162 1 portforward.go:234] lost connection to pod E0810 07:04:06.735334 1 portforward.go:406] an error occurred forwarding 35219 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to 
localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:06.735726 1 portforward.go:234] lost connection to pod E0810 07:04:07.043941 1 portforward.go:406] an error occurred forwarding 34727 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:07.044309 1 portforward.go:234] lost connection to pod E0810 07:04:07.450824 1 portforward.go:406] an error occurred forwarding 34031 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:07.451169 1 portforward.go:234] lost connection to pod E0810 07:04:07.790990 1 portforward.go:406] an error occurred forwarding 43441 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 
[::1]:8081: connect: connection refused E0810 07:04:07.791210 1 portforward.go:234] lost connection to pod E0810 07:04:08.188870 1 portforward.go:406] an error occurred forwarding 32839 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:08.189418 1 portforward.go:234] lost connection to pod 1.6601150482063682e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "bbf92a96-d65e-4792-a130-bc0d0f594557", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 E0810 07:04:08.443142 1 portforward.go:406] an error occurred forwarding 35395 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: 
connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:08.907719 1 portforward.go:406] an error occurred forwarding 33055 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:08.907842 1 portforward.go:234] lost connection to pod E0810 07:04:09.192174 1 portforward.go:406] an error occurred forwarding 34689 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused E0810 07:04:09.192605 1 portforward.go:234] lost connection to pod E0810 07:04:09.616450 1 portforward.go:406] an error occurred forwarding 42301 -> 8081: error forwarding port 8081 to pod 0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435, uid : failed to execute portforward in network namespace "/var/run/netns/cni-29540245-853e-246f-ed7f-0443a91d3642": failed to connect to localhost:8081 inside namespace "0ea370e97c58078f78a2ac82dd6cac94e03da6368ac639ad25df3f72e04af435", IPv4: dial tcp4 127.0.0.1:8081: connect: connection refused IPv6 dial tcp6 [::1]:8081: connect: connection refused 1.6601150500930722e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": 
{"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "371cf80b-1e05-4126-ab31-639b04c5d478", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234 1.6601150507833595e+09 ERROR Reconciler error {"controller": "emqxbroker", "controllerGroup": "apps.emqx.io", "controllerKind": "EmqxBroker", "emqxBroker": {"name":"emqx","namespace":"emqx"}, "namespace": "emqx", "name": "emqx", "reconcileID": "2c813143-c2f5-4398-b6b7-bf3e92bd4350", "error": "Operation cannot be fulfilled on emqxbrokers.apps.emqx.io \"emqx\": the object has been modified; please apply your changes to the latest version and try again"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234

Logs of the first node:
kubectl -n emqx logs emqx-0 -c emqx

hostname: emqx-0: Host not found
Starting emqx on node emqx@emqx-0.emqx-headless.emqx.svc.cluster.local
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:09.807451+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:11.816562+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:11.816736+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:17.569069+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:17.569265+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:25.334693+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:25.334864+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:32.698077+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:32.698264+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:38.495689+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:38.495870+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms

The logs just stay the same after that.
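
The operator log above mixes two recurring errors: port-forward attempts to port 8081 that end in "connection refused", and update conflicts ("the object has been modified"). A minimal shell sketch for counting each kind in a saved copy of the log follows. The file name operator.log and the function name summarize_operator_log are hypothetical, and the counts alone say nothing about the root cause.

```shell
#!/usr/bin/env bash
# Count the two recurring error kinds in a saved operator log.
# Occurrences are counted rather than lines, because pasted logs may
# contain several entries on one physical line.
summarize_operator_log() {
  local file="$1"
  local pf conflicts
  # Port-forward failures towards the EMQX management listener (8081).
  pf=$(grep -o 'error forwarding port 8081' "$file" | wc -l | tr -d ' ')
  # Update conflicts reported by the reconciler.
  conflicts=$(grep -o 'the object has been modified' "$file" | wc -l | tr -d ' ')
  echo "portforward_refused=${pf} update_conflicts=${conflicts}"
}
```

To use it, save the log first, for example with the command from this comment minus the -f flag (kubectl logs -l "control-plane=controller-manager" -n emqx -c manager --tail=100 > operator.log), then run summarize_operator_log operator.log.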

Is this the first deployment? Have you deployed emqx before and deleted it?

axkng commented

This is the first deployment of that broker.
But yes, I tried to deploy other ones before.

Could you please show the logs for emqx-1 and emqx-2?

axkng commented

Sure thing.

kubectl -n exo-emqx logs emqx-1 -c emqx

hostname: emqx-1: Host not found
Starting emqx on node emqx@emqx-1.emqx-headless.emqx.svc.cluster.local
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:11.429818+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:12.467104+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:12.467272+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:17.639297+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:17.639472+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:24.940386+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:24.940561+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:30.877727+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:30.877912+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:38.386440+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-2.emqx-headless.emqx.svc.cluster.local']

kubectl -n emqx logs emqx-2 -c emqx

hostname: emqx-2: Host not found
Starting emqx on node emqx@emqx-2.emqx-headless.emqx.svc.cluster.local
Start mqtt:tcp:internal listener on 127.0.0.1:11883 successfully.
Start mqtt:tcp:external listener on 0.0.0.0:1883 successfully.
Start mqtt:ws:external listener on 0.0.0.0:8083 successfully.
Start mqtt:ssl:external listener on 0.0.0.0:8883 successfully.
Start mqtt:wss:external listener on 0.0.0.0:8084 successfully.
Start http:management listener on 8081 successfully.
2022-08-10T07:04:21.079133+00:00 [warning] [Dashboard] Using default password for dashboard 'admin' user. Please use './bin/emqx_ctl admins' command to change it. NOTE: the default password in config file is only used to initialise the database record, changing the config file after database is initialised has no effect.
Start http:dashboard listener on 18083 successfully.
EMQ X Broker 4.4.6 is running now!
2022-08-10T07:04:24.909316+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:24.909510+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:32.263225+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:32.263384+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:37.785043+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']
2022-08-10T07:04:37.785206+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2022-08-10T07:04:43.694825+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local','emqx@emqx-1.emqx-headless.emqx.svc.cluster.local']

Again, the logs just stay the same.

@qzhuyan Do you have any idea?

After talking to @Rory-Z, we think it relates to the publishNotReadyAddresses flag in the k8s service.

@Rory-Z will release a fix for it.

@Furragen you could try to manually set publishNotReadyAddresses to true and delete all the pods to verify it or wait for the new release of emqx operator.
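
One way to try this by hand is a merge patch against the headless Service. This is only a sketch: it assumes the Service is named emqx-headless in the emqx namespace (as the node names in the logs suggest) and that the operator does not revert the change on its next reconcile. The function name and the KUBECTL variable are made up for illustration.

```shell
# Hypothetical helper. KUBECTL can be overridden, e.g. with echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Set publishNotReadyAddresses=true on a Service via a merge patch.
# $1 = namespace, $2 = Service name (both assumed; adjust to your cluster).
set_publish_not_ready() {
  $KUBECTL patch service "$2" -n "$1" --type merge \
    -p '{"spec":{"publishNotReadyAddresses":true}}'
}

# Usage (assumed names), followed by deleting the pods so they restart:
#   set_publish_not_ready emqx emqx-headless
#   kubectl -n emqx delete pod emqx-0 emqx-1 emqx-2
```

Running it with KUBECTL=echo first prints the arguments without touching the cluster.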

axkng commented

Hi @qzhuyan ,
I tested this, but sadly the error stays the same.

Hi @Furragen, EMQX Operator 1.2.5 is released. Please try again and let me know if it works.

axkng commented

Hi @Rory-Z ,
thank you for the new release, but sadly the error was not fixed.

@Furragen Sounds frustrating. Is the EMQX pod log still the same?

Hi @Furragen, could you please check the pod network? Run the following command in an EMQX pod:

nslookup -type=srv $(headless service name).$(namespace).svc.cluster.local

You should get output like this:

emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-0.emqx-headless.default.svc.cluster.local
emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-1.emqx-headless.default.svc.cluster.local
emqx-headless.default.svc.cluster.local service = 0 33 8081 emqx-2.emqx-headless.default.svc.cluster.local

Then check network connectivity:

nc -zv emqx-2.emqx-headless.default.svc.cluster.local 8081

Output like this means the check succeeded:

emqx-2.emqx-headless.default.svc.cluster.local (172.17.0.8:8081) open

axkng commented

So the lookup worked fine. My cluster uses IPv6 btw. Could that be a problem?

Network ping did not work.

"Network ping did not work." I think that is the reason.

Could you please check whether pinging another EMQX pod by its IP, from inside an EMQX pod, works?

In a StatefulSet, each pod should have a stable network ID: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id. EMQX nodes use this network ID to discover each other, so if the network doesn't work, the EMQX cluster will fail.

Because this is a k8s feature, you may need to check AWS EKS.
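
For reference, the stable network IDs follow the pattern <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>. A small sketch that prints the expected names so each one can be checked in turn; the function name is made up, and cluster.local is assumed as the cluster domain unless CLUSTER_DOMAIN is set:

```shell
# Print the stable DNS name of each pod in a StatefulSet.
# $1 = StatefulSet name, $2 = headless Service name,
# $3 = namespace, $4 = replica count.
# Assumption: the cluster domain is cluster.local unless CLUSTER_DOMAIN is set.
statefulset_pod_hosts() {
  i=0
  while [ "$i" -lt "$4" ]; do
    echo "$1-$i.$2.$3.svc.${CLUSTER_DOMAIN:-cluster.local}"
    i=$((i + 1))
  done
}

# Usage inside an EMQX pod (names and port taken from this thread):
#   for h in $(statefulset_pod_hosts emqx emqx-headless emqx 3); do
#     nc -zv "$h" 8081
#   done
```

If a name resolves but nc still fails, the problem is more likely in the pod network or the listener address than in DNS.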

axkng commented

The direct way via the IP of the pod also did not work.
And I think I know why:
EMQX only listens on IPv4.

netstat -tulpen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:11883         0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8081            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:4370            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8883            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8083            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:8084            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:5369            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:1883            0.0.0.0:*               LISTEN      1/emqx
tcp        0      0 0.0.0.0:18083           0.0.0.0:*               LISTEN      1/emqx

This was from inside the emqx-0 pod.
Like I said, the cluster uses IPv6, so this can not work.
Is there any way to make EMQX listen to IPv6?
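
To make the IPv4-only listeners easier to spot in output like the above, a small filter can label each listening port by address family. This is a rough sketch, assuming the column layout shown here (local address in column 4, state in column 6) and that IPv6 wildcard listeners would show up as :::PORT; the function name is made up.

```shell
# Read `netstat -tulpen`-style output on stdin and print "<port> <family>"
# for every listening TCP socket.
# Assumption: column 4 is the local address and column 6 is the state.
listener_families() {
  awk '$1 ~ /^tcp/ && $6 == "LISTEN" {
    n = split($4, parts, ":")      # the port is the last ":"-separated field
    port = parts[n]
    family = ($4 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:/) ? "ipv4" : "ipv6"
    print port, family
  }'
}

# Usage inside the pod:
#   netstat -tulpen | listener_families
```

On the output above, every port would be labeled ipv4, which matches the observation that nothing is listening on an IPv6 address.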

@Furragen You can deploy EMQX like this:

apiVersion: apps.emqx.io/v1beta3
kind: EmqxBroker
metadata:
  name: emqx
spec:
  emqxTemplate:
    image: emqx/emqx:4.4.6
    config:
      listener.tcp.external: :::1883
      management.listener.http: :::8081
      dashboard.listener.http: :::18083

Sorry, I don't have an IPv6 cluster, so I need you to try this.

axkng commented

Absolutely no problem.
I redeployed the broker and we got a little further.
The logs and the error stay the same:

emqx_ctl cluster_status
Node 'emqx@emqx-0.emqx-headless.exo-emqx.svc.cluster.local' not responding to pings.
/opt/emqx/bin/emqx: line 46: die: command not found

But: doing the ping by hand with nc now succeeds.
So the connection works, but something is still broken.
Could there be more listeners that I need to switch to v6?

Cooool, you can change all the listeners you care about to IPv6 format, see https://www.emqx.io/docs/en/v4.4/configuration/configuration.html#listener-tcp-external

Could you please run the following command in an EMQX pod:

emqx eval "net_adm:ping('emqx@emqx-0.emqx-headless.default.svc.cluster.local')."

Here emqx@emqx-0.emqx-headless.default.svc.cluster.local is the node name of another EMQX node.

axkng commented

So, I tried this and the command you mentioned did not succeed.
The error is:

Node 'emqx@emqx-0.emqx-headless.emqx.svc.cluster.local' not responding to pings.
/usr/local/bin/emqx: line 46: die: command not found

This error always appears when running the emqx-command.

Also, I have tested around with setting listeners to IPv6:

apiVersion: apps.emqx.io/v1beta3
kind: EmqxBroker
metadata:
  name: emqx
  labels:
    app: emqx
    environment: dev
spec:
  persistent:
    accessModes:
      - ReadWriteOnce
    storageClassName: ebs-gp3
    resources:
      requests:
        storage: 1Gi
  emqxTemplate:
    image: emqx/emqx:4.4.6
    config:
      listener.tcp.external: :::1883
      listener.ssl.external: :::8883
      management.listener.http: :::8081
      dashboard.listener.http: :::18083
      listener.tcp.internal: :::11883
      listener.ws.external: :::8083
      listener.wss.external: :::8084

The pods start, but the dashboard-plugin seems to be unhappy:

2022-08-11T09:45:39.371399+00:00 [alert] [Plugins] Plugin emqx_dashboard load failed with {function_clause,[{emqx_plugins,apply_configs,[{error,transform_datatypes,{errorlist,[{error,{transform_type,"dashboard.listener.http"}},{error,{conversion,{":::18083",integer}}}]}}],[{file,"emqx_plugins.erl"},{line,302}]},{emqx_plugins,load_plugin,2,[{file,"emqx_plugins.erl"},{line,325}]},{lists,foreach,2,[{file,"lists.erl"},{line,1342}]},{emqx_app,start,2,[{file,"emqx_app.erl"},{line,50}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}

Looks like it cannot convert the v6-notation.

On top of that I found three other settings that would need tuning I think.
The first one is cluster.proto_dist.
The docs mention that I could set it to inet6_tcp to use IPv6. But when I do that, the pods do not start anymore.

And then there are cluster.mcast.iface and rpc.tcp_server_ip. These two settings do not seem to support IPv6 according to the docs. Is that correct?

The listeners I just mentioned and the ones in my manifest seem to be the ones EMQX starts by default, so I did not look further.

Do you know of anyone using EMQX with IPv6?

@qzhuyan @zmstone Need help

Node 'emqx@emqx-0.emqx-headless.emqx.svc.cluster.local' not responding to pings.
/usr/local/bin/emqx: line 46: die: command not found

means the peer node that we are pinging is unreachable.

axkng commented

I ran the command from the emqx-0 pod, trying to query emqx-1.
Does that not mean emqx-0 has a problem?

It's likely that EMQX's distribution and RPC libraries do not support IPv6 that well.
We'll investigate it.

axkng commented

Good to know, thank you.