CRASH REPORT with VerneMQ Helm Install
jensjohansen opened this issue · 5 comments
Crash Report:
15:23:11.431 [error] CRASH REPORT Process <0.844.0> with 0 neighbours crashed with reason: bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221
15:23:11.431 [error] Supervisor {<0.842.0>,tls_dyn_connection_sup} had child receiver started with {ssl_gen_statem,start_link,undefined} at <0.844.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:23:11.431 [error] Supervisor {<0.449.0>,ranch_acceptors_sup} had child {acceptor,<0.449.0>,1} started with ranch_acceptor:start_link({{192,168,40,176},8443}, 1, {sslsocket,nil,{#Port<0.15>,{config,#{middlebox_comp_mode => true,padding_check => true,signature_algs => ...,...},...}}}, ranch_ssl, logger) at <0.711.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
Installation: VerneMQ Helm Chart version 1.8.0
cert-manager Certificate CRD (verified a valid Letsencrypt cert is stored in mqtt-link-labs-tls-secret):
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: mqtt-link-labs-tls-certificate
namespace: mqtt-publishing
spec:
secretName: mqtt-link-labs-tls-secret
dnsNames:
- "dev-mqtt.amz-link-labs.net"
subject:
organizations:
- "Link Labs"
organizationalUnits:
- "Airfinder Asset RTLS"
issuerRef:
name: letsencrypt-production
kind: ClusterIssuer
values.yaml
additionalEnv:
- name: DOCKER_VERNEMQ_ALLOW_ANONYMOUS
value: "off"
- name: DOCKER_VERNEMQ_LISTENER__SSL__CAFILE
value: /etc/ssl/vernemq/tls.crt
- name: DOCKER_VERNEMQ_LISTENER__SSL__CERTFILE
value: /etc/ssl/vernemq/tls.crt
- name: DOCKER_VERNEMQ_LISTENER__SSL__KEYFILE
value: /etc/ssl/vernemq/tls.key
- name: DOCKER_VERNEMQ_METADATA_PLUGIN
value: vmq_plumtree
- name: DOCKER_VERNEMQ_PERSISTENT_CLIENT_EXPIRATION
value: 1d
- name: DOCKER_VERNEMQ_ACCEPT_EULA
value: "yes"
- name: DOCKER_VERNEMQ_PLUGINS__VMQ_PASSWD
value: "off"
- name: DOCKER_VERNEMQ_PLUGINS__VMQ_ACL
value: "off"
- name: DOCKER_VERNEMQ_PLUGINS__VMQ_DIVERSITY
value: "on"
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__AUTH_MYSQL__ENABLED
value: "on"
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__HOST
value: <redacted>
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__PORT
value: "3306"
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__USER
value: vmq
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__PASSWORD
value: <redacted>
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__DATABASE
value: access
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__PASSWORD_HASH_METHOD
value: sha256
- name: DOCKER_VERNEMQ_LOG__CONSOLE
value: both
- name: DOCKER_VERNEMQ_LOG__CONSOLE__LEVEL
value: debug
- name: DOCKER_VERNEMQ_TOPIC_MAX_DEPTH
value: "20"
envFrom: []
extraVolumeMounts: []
extraVolumes: []
fullnameOverride: ""
image:
pullPolicy: IfNotPresent
repository: vernemq/vernemq
tag: latest
ingress:
annotations: {}
className: ""
enabled: false
hosts: []
labels: {}
paths:
- path: /
pathType: ImplementationSpecific
tls: []
nameOverride: ""
nodeSelector: {}
pdb:
enabled: false
minAvailable: 1
persistentVolume:
accessModes:
- ReadWriteOnce
annotations: {}
enabled: false
size: 50Gi
podAntiAffinity: soft
rbac:
create: true
serviceAccount:
create: true
replicaCount: 3
resources: {}
secretMounts:
- name: vernemq-certificate
path: /etc/ssl/vernemq
secretName: mqtt-link-labs-tls-secret
securityContext:
fsGroup: 10000
runAsGroup: 10000
runAsUser: 10000
service:
annotations:
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
service.beta.kubernetes.io/aws-load-balancer-type: nlb
api:
enabled: true
nodePort: 38888
port: 8888
enabled: true
labels: {}
loadBalancerSourceRanges:
- <redacted>
mqtt:
enabled: true
nodePort: 1883
port: 1883
mqtts:
enabled: true
nodePort: 8883
port: 8883
type: LoadBalancer
ws:
enabled: true
nodePort: 8080
port: 8080
wss:
enabled: true
nodePort: 8443
port: 8443
serviceMonitor:
create: true
labels: {}
statefulset:
annotations: {}
labels: {}
lifecycle: {}
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 90
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
podAnnotations: {}
podLabels: {}
podManagementPolicy: OrderedReady
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 90
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
terminationGracePeriodSeconds: 60
updateStrategy: RollingUpdate
tolerations: []
Expected Behavior:
MQTT Explorer can connect to, and browse topics in:
- mqtt://dev-mqtt.amz-link-labs.net:1883
- mqtts://dev-mqtt.amz-link-labs.net:8883
- ws://dev-mqtt.amz-link-labs.net:8080
- wss://dev-mqtt.amz-link-labs.net:8443
API available on https://dev-mqtt.amz-link-labs.net:8888
Actual Behavior:
- All three replicas endlessly repeating the above crash report.
- Connections from MQTT Explorer timing out.
- Connections to the API timing out.
@jensjohansen You're using the same file as certfile and cafile. I'm not 100% sure this is impossible, but you might want to check whether that's an issue.
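If the mounted secret did expose the CA chain under its own key, the listener variables could point at separate files, along these lines (a sketch; the ca.crt key name is an assumption and must match whatever the secret actually contains):

additionalEnv:
  - name: DOCKER_VERNEMQ_LISTENER__SSL__CAFILE
    value: /etc/ssl/vernemq/ca.crt   # CA chain only (assumed key name)
  - name: DOCKER_VERNEMQ_LISTENER__SSL__CERTFILE
    value: /etc/ssl/vernemq/tls.crt  # server (leaf) certificate
  - name: DOCKER_VERNEMQ_LISTENER__SSL__KEYFILE
    value: /etc/ssl/vernemq/tls.key  # private key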
Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.
I will work on that. The Helm install instructions say that cert-manager inserts the CA cert into the TLS cert, and I see that, but they also say the secret is supposed to have three keys internally (ca.crt, tls.crt, and tls.key), while the current version of cert-manager seems to create a secret with all three items in one key.
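For illustration, the three-key secret layout that the Helm instructions describe would look roughly like this (a sketch; the data values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: mqtt-link-labs-tls-secret
  namespace: mqtt-publishing
type: kubernetes.io/tls
data:
  ca.crt: <base64-encoded CA chain>
  tls.crt: <base64-encoded server certificate>
  tls.key: <base64-encoded private key>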
Still an issue. Even removing these still causes the problem.
- name: DOCKER_VERNEMQ_LISTENER__SSL__CAFILE
value: /etc/ssl/vernemq/tls.crt
- name: DOCKER_VERNEMQ_LISTENER__SSL__CERTFILE
value: /etc/ssl/vernemq/tls.crt
- name: DOCKER_VERNEMQ_LISTENER__SSL__KEYFILE
value: /etc/ssl/vernemq/tls.key
In the Helm instructions, the documentation says that cert-manager inserts the ca.crt into the secret along with the tls.crt and tls.key, and it shows the expected format of the secret. However, when you create the certificate using a Certificate CRD with a Let's Encrypt issuer, the resulting secret has only two keys: tls.crt and tls.key. There are actually three certs in tls.crt - the chain up to letsencrypt-production. I have tried extracting the ca.crt out into a separate key, following the pattern suggested for using existing keys, but this still gets exactly the same results:
15:01:53.704 [error] Supervisor {<0.2120.0>,tls_dyn_connection_sup} had child receiver started with {ssl_gen_statem,start_link,undefined} at <0.2122.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:01:53.705 [error] Supervisor {<0.401.0>,ranch_acceptors_sup} had child {acceptor,<0.401.0>,2} started with ranch_acceptor:start_link({{192,168,48,156},8883}, 2, {sslsocket,nil,{#Port<0.13>,{config,#{versions => [{3,3}],keyfile => <<>>,next_protocol_selector => ...,...},...}}}, ranch_ssl, logger) at <0.2014.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:01:54.005 [debug] session normally stopped
15:01:58.198 [debug] session normally stopped
15:01:59.768 [error] CRASH REPORT Process <0.2135.0> with 0 neighbours crashed with reason: bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221
15:01:59.768 [error] Supervisor {<0.2133.0>,tls_dyn_connection_sup} had child receiver started with {ssl_gen_statem,start_link,undefined} at <0.2135.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:01:59.768 [error] Supervisor {<0.401.0>,ranch_acceptors_sup} had child {acceptor,<0.401.0>,3} started with ranch_acceptor:start_link({{192,168,48,156},8883}, 3, {sslsocket,nil,{#Port<0.13>,{config,#{versions => [{3,3}],keyfile => <<>>,next_protocol_selector => ...,...},...}}}, ranch_ssl, logger) at <0.2027.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:02:01.740 [error] CRASH REPORT Process <0.2140.0> with 0 neighbours crashed with reason: bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221
15:02:01.741 [error] Supervisor {<0.2138.0>,tls_dyn_connection_sup} had child receiver started with {ssl_gen_statem,start_link,undefined} at <0.2140.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:02:01.741 [error] Supervisor {<0.401.0>,ranch_acceptors_sup} had child {acceptor,<0.401.0>,4} started with ranch_acceptor:start_link({{192,168,48,156},8883}, 4, {sslsocket,nil,{#Port<0.13>,{config,#{versions => [{3,3}],keyfile => <<>>,next_protocol_selector => ...,...},...}}}, ranch_ssl, logger) at <0.2038.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:02:02.324 [debug] started plumtree_metadata_manager exchange with 'VerneMQ@vernemq-1.vernemq-headless.mqtt-publishing.svc.cluster.local' (<0.2143.0>)
15:02:02.327 [debug] completed metadata exchange with 'VerneMQ@vernemq-1.vernemq-headless.mqtt-publishing.svc.cluster.local'. nothing repaired
15:02:02.329 [debug] 0ms mailbox traversal, schedule next lazy broadcast in 10000ms, the min interval is 10000ms
15:02:02.478 [debug] session normally stopped
15:02:05.596 [error] CRASH REPORT Process <0.2154.0> with 0 neighbours crashed with reason: bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221
15:02:05.596 [error] Supervisor {<0.2152.0>,tls_dyn_connection_sup} had child receiver started with {ssl_gen_statem,start_link,undefined} at <0.2154.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:02:05.597 [error] Supervisor {<0.401.0>,ranch_acceptors_sup} had child {acceptor,<0.401.0>,5} started with ranch_acceptor:start_link({{192,168,48,156},8883}, 5, {sslsocket,nil,{#Port<0.13>,{config,#{versions => [{3,3}],keyfile => <<>>,next_protocol_selector => ...,...},...}}}, ranch_ssl, logger) at <0.2051.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:02:07.093 [debug] session normally stopped
15:02:08.320 [debug] session normally stopped
15:02:09.830 [error] CRASH REPORT Process <0.2166.0> with 0 neighbours crashed with reason: bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221
15:02:09.830 [error] Supervisor {<0.2164.0>,tls_dyn_connection_sup} had child receiver started with {ssl_gen_statem,start_link,undefined} at <0.2166.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:02:09.830 [error] Supervisor {<0.401.0>,ranch_acceptors_sup} had child {acceptor,<0.401.0>,6} started with ranch_acceptor:start_link({{192,168,48,156},8883}, 6, {sslsocket,nil,{#Port<0.13>,{config,#{versions => [{3,3}],keyfile => <<>>,next_protocol_selector => ...,...},...}}}, ranch_ssl, logger) at <0.2056.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:02:11.923 [error] CRASH REPORT Process <0.2171.0> with 0 neighbours crashed with reason: bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221
15:02:11.923 [error] Supervisor {<0.2169.0>,tls_dyn_connection_sup} had child receiver started with {ssl_gen_statem,start_link,undefined} at <0.2171.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:02:11.924 [error] Supervisor {<0.401.0>,ranch_acceptors_sup} had child {acceptor,<0.401.0>,7} started with ranch_acceptor:start_link({{192,168,48,156},8883}, 7, {sslsocket,nil,{#Port<0.13>,{config,#{versions => [{3,3}],keyfile => <<>>,next_protocol_selector => ...,...},...}}}, ranch_ssl, logger) at <0.2069.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:02:12.325 [debug] started plumtree_metadata_manager exchange with 'VerneMQ@vernemq-0.vernemq-headless.mqtt-publishing.svc.cluster.local' (<0.2173.0>)
15:02:12.326 [debug] completed metadata exchange with 'VerneMQ@vernemq-0.vernemq-headless.mqtt-publishing.svc.cluster.local'. nothing repaired
15:02:12.330 [debug] 0ms mailbox traversal, schedule next lazy broadcast in 10000ms, the min interval is 10000ms
15:02:13.293 [debug] session normally stopped
15:02:16.874 [error] CRASH REPORT Process <0.2184.0> with 0 neighbours crashed with reason: bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221
15:02:16.874 [error] Supervisor {<0.2182.0>,tls_dyn_connection_sup} had child receiver started with {ssl_gen_statem,start_link,undefined} at <0.2184.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:02:16.875 [error] Supervisor {<0.401.0>,ranch_acceptors_sup} had child {acceptor,<0.401.0>,8} started with ranch_acceptor:start_link({{192,168,48,156},8883}, 8, {sslsocket,nil,{#Port<0.13>,{config,#{versions => [{3,3}],keyfile => <<>>,next_protocol_selector => ...,...},...}}}, ranch_ssl, logger) at <0.2081.0> exit with reason bad argument in call to erlang:binary_to_list(undefined) in ssl_config:file_error/2 line 221 in context child_terminated
15:02:18.161 [debug] session normally stopped
15:02:18.443 [debug] session normally stopped
Even a hint as to what the line 221 error means would help. I am guessing I will have to dig into the code next to make progress.
Solution:
Contrary to the documentation, cert-manager doesn't add the CA cert to the certificate file.
To stop the crash reports with cert-manager 1.9.1 and later:
- Create a Let's Encrypt account based on an email address used by your DevOps team at https://community.letsencrypt.org/
- Create a ClusterIssuer to use this account
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-production
spec:
acme:
email: <your devops email here>
preferredChain: ''
privateKeySecretRef:
name: letsencrypt-production
server: https://acme-v02.api.letsencrypt.org/directory
solvers:
- http01:
ingress:
class: nginx-external #change this to your internet-facing ingress controller's ingressClassName
Use the DevOps email you set up for your Let's Encrypt account, and the ingress class name of your internet-accessible ingress controller.
- Create a certificate secret for your helm chart to use:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: mqtt-tls-certificate
namespace: mqtt-publishing # In case you are using a namespace for vernemq other than default
spec:
dnsNames:
- dev-mqtt.amz-link-labs.net
issuerRef:
group: cert-manager.io
kind: ClusterIssuer
name: letsencrypt-production
secretName: mqtt-tls-secret # this is the secret where the tls.crt and tls.key will be stored
usages:
- digital signature
- key encipherment
- Install VerneMQ (I used Helm Chart version 1.8.0) using something like these values (values.yaml):
additionalEnv:
# note: specifying the cafile causes the crash report, so leave it out for Let's Encrypt certs
# - name: DOCKER_VERNEMQ_LISTENER__SSL__CAFILE
# value: /etc/ssl/vernemq/tls.crt
- name: DOCKER_VERNEMQ_LISTENER__SSL__CERTFILE
value: /etc/ssl/vernemq/tls.crt
- name: DOCKER_VERNEMQ_LISTENER__SSL__KEYFILE
value: /etc/ssl/vernemq/tls.key
- name: DOCKER_VERNEMQ_ALLOW_REGISTER_DURING_NETSPLIT
value: "on"
- name: DOCKER_VERNEMQ_ALLOW_PUBLISH_DURING_NETSPLIT
value: "on"
- name: DOCKER_VERNEMQ_ALLOW_SUBSCRIBE_DURING_NETSPLIT
value: "on"
- name: DOCKER_VERNEMQ_ALLOW_UNSUBSCRIBE_DURING_NETSPLIT
value: "on"
- name: DOCKER_VERNEMQ_ALLOW_ANONYMOUS
value: "off"
- name: DOCKER_VERNEMQ_METADATA_PLUGIN
value: vmq_plumtree
- name: DOCKER_VERNEMQ_PERSISTENT_CLIENT_EXPIRATION
value: 1d
- name: DOCKER_VERNEMQ_ACCEPT_EULA
value: "yes"
- name: DOCKER_VERNEMQ_PLUGINS__VMQ_PASSWD
value: "off"
- name: DOCKER_VERNEMQ_PLUGINS__VMQ_ACL
value: "off"
- name: DOCKER_VERNEMQ_PLUGINS__VMQ_DIVERSITY
value: "on"
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__AUTH_MYSQL__ENABLED
value: "on"
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__HOST
value: <redacted>
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__PORT
value: "3306"
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__USER
value: vmq
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__PASSWORD
value: <redacted>
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__DATABASE
value: access
- name: DOCKER_VERNEMQ_VMQ_DIVERSITY__MYSQL__PASSWORD_HASH_METHOD
value: sha256
- name: DOCKER_VERNEMQ_LOG__CONSOLE
value: both
- name: DOCKER_VERNEMQ_LOG__CONSOLE__LEVEL
value: debug
- name: DOCKER_VERNEMQ_TOPIC_MAX_DEPTH
value: "20"
envFrom: []
extraVolumeMounts: []
extraVolumes: []
fullnameOverride: ""
image:
pullPolicy: IfNotPresent
repository: vernemq/vernemq
tag: 1.12.6.1-alpine
ingress:
annotations:
app.kubernetes.io/name: vernemq
# the acme.cert-manager.io annotations configure the cert-manager ACME solver that handles Let's Encrypt negotiation
# tell ACME which ingress controller is internet-facing so cert-manager can automate the Let's Encrypt challenges
acme.cert-manager.io/http01-ingress-class: nginx-external
acme.cert-manager.io/http01-edit-in-place: "true" # solve the HTTP-01 challenge on the existing ingress instead of creating a new one
alb.ingress.kubernetes.io/scheme: internet-facing
cert-manager.io/cluster-issuer: letsencrypt-production
certmanager.k8s.io/acme-challenge-type: http01 #use HTTP rather than DNS challenges
className: nginx-external #make mqtt available on an internet-facing ingress controller
enabled: true
hosts:
- dev-mqtt.amz-link-labs.net
labels: {}
paths:
- path: /
pathType: ImplementationSpecific
tls:
- hosts:
- dev-mqtt.amz-link-labs.net
secretName: mqtt-tls-secret
nameOverride: ""
nodeSelector: {}
pdb:
enabled: false
minAvailable: 1
persistentVolume:
accessModes:
- ReadWriteOnce
annotations: {}
enabled: true
size: 50Gi
podAntiAffinity: soft
rbac:
create: true
serviceAccount:
create: true
replicaCount: 3
resources: {}
secretMounts:
- name: vernemq-certificates
path: /etc/ssl/vernemq
secretName: mqtt-tls-secret # the secret created by the Certificate in the previous step
securityContext:
fsGroup: 10000
runAsGroup: 10000
runAsUser: 10000
service:
annotations: {}
api:
enabled: true
nodePort: 38888
port: 8888
enabled: true
labels: {}
mqtt:
enabled: true
nodePort: 1883
port: 1883
mqtts:
enabled: true
nodePort: 8883
port: 8883
type: ClusterIP
ws:
enabled: true
nodePort: 8080
port: 8080
wss:
enabled: true
nodePort: 8443
port: 8443
serviceMonitor:
create: true
labels: {}
statefulset:
annotations: {}
labels: {}
lifecycle: {}
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 90
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
podAnnotations: {}
podLabels: {}
podManagementPolicy: OrderedReady
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 90
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
terminationGracePeriodSeconds: 60
updateStrategy: RollingUpdate
tolerations: []
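As a side note on how these settings take effect: the Docker image's start script maps DOCKER_VERNEMQ_* variables into vernemq.conf entries (names are lowercased, double underscores become dots), so with the values above the TLS-relevant part of the generated config should end up roughly as:

listener.ssl.certfile = /etc/ssl/vernemq/tls.crt
listener.ssl.keyfile = /etc/ssl/vernemq/tls.key
# no listener.ssl.cafile entry comes from these values, since DOCKER_VERNEMQ_LISTENER__SSL__CAFILE is left unset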
@jensjohansen Thank you for documenting your solution! So, is there anything still wrong or incomplete in our documentation?
Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.