canonical/microk8s

dqlite not listening on socket after update to 1.31.3

gsnsw-felixs opened this issue · 4 comments

Summary

A 3-node cluster failed after an auto-update to 1.31.3. The dqlite service starts but is not listening on /var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379

What Should Happen Instead?

The dqlite service should either start correctly or report an error.

Reproduction Steps

All 3 nodes in the cluster have the same issue. A separate, non-HA node seemed to update OK, though.

Introspection Report

Sorry, can't post system details.

ubuntu@k8s-qa-001:~$ sudo systemctl status snap.microk8s.daemon-k8s-dqlite
● snap.microk8s.daemon-k8s-dqlite.service - Service for snap application microk8s.daemon-k8s-dqlite
Loaded: loaded (/etc/systemd/system/snap.microk8s.daemon-k8s-dqlite.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2024-12-06 08:28:40 AEDT; 1h 21min ago
Main PID: 767 (k8s-dqlite)
Tasks: 18 (limit: 37663)
Memory: 209.8M
CGroup: /system.slice/snap.microk8s.daemon-k8s-dqlite.service
└─767 /snap/microk8s/7449/bin/k8s-dqlite --storage-dir=/var/snap/microk8s/7449/var/kubernetes/backend/ --listen=unix:///var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379

Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + '[' -e /var/snap/microk8s/7449/args/k8s-dqlite-env ']'
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + . /var/snap/microk8s/7449/args/k8s-dqlite-env
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + set +a
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[2086]: ++ cat /var/snap/microk8s/7449/args/k8s-dqlite
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + declare -a 'args=(--storage-dir=${SNAP_DATA}/var/kubernetes/backend/
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: --listen=unix://${SNAP_DATA}/var/kubernetes/backend/kine.sock:12379)'
Dec 06 08:28:47 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: + exec /snap/microk8s/7449/bin/k8s-dqlite --storage-dir=/var/snap/microk8s/7449/var/kubernetes/backend/ --listen=unix:///var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379
Dec 06 08:28:56 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: time="2024-12-06T08:28:56+11:00" level=info msg="Configure dqlite failure domain" failure-domain=1
Dec 06 08:28:56 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: time="2024-12-06T08:28:56+11:00" level=info msg="Disable TLS ClientSessionCache"
Dec 06 08:28:56 k8s-qa-001 microk8s.daemon-k8s-dqlite[767]: time="2024-12-06T08:28:56+11:00" level=info msg="Enable TLS" min_tls_version=tls12
ubuntu@k8s-qa-001:~$ netstat -a --unix | grep kine.sock
ubuntu@k8s-qa-001:~$
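The empty netstat result above can be cross-checked against the filesystem. The following is a minimal sketch, assuming the socket file is created at the path that follows `unix://` in the `--listen` argument (so `:12379` would be part of the file name); `check_socket` and the `SOCK` variable are hypothetical helper names, not part of microk8s.

```shell
#!/bin/sh
# Path copied from the --listen argument in the status output above.
# Assumption: the socket file lives at exactly this path, ":12379" included.
SOCK="${SOCK:-/var/snap/microk8s/7449/var/kubernetes/backend/kine.sock:12379}"

# check_socket PATH: report whether PATH exists and is a Unix socket.
# Returns 0 when the socket is present, 1 otherwise.
check_socket() {
    if [ -S "$1" ]; then
        echo "socket present: $1"
    else
        echo "socket missing: $1"
        return 1
    fi
}

# Run the check once; "|| true" keeps the script going when it is missing.
check_socket "$SOCK" || true
```

Reading the backend directory may require root, so the script may need to be run with sudo. On hosts without netstat, `ss -xl | grep kine.sock` should give an equivalent view of listening Unix sockets.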

Can you suggest a fix?

Are you interested in contributing with a fix?

no

Thank you for reporting this @gsnsw-felixs. When was this deployment set up? Was it tracking the 1.31 release? Was the first version deployed 1.31.0 or something else?

Hello @gsnsw-felixs,
would you be able to tell us which snap revision you've updated from?

I'm pretty sure it had been set to:

tracking: 1.31/stable

So it was probably on 1.31.2.

We removed and reinstalled the snap and got it running again on 1.31.3 BTW.

The deployment had been around a long time, so it could have started as 1.22.
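For anyone hitting the same problem, the tracking channel and the previously installed revision asked about above can usually be read from snapd before the snap is removed. This is a rough sketch: `tracking_channel` is a hypothetical helper that parses the `tracking:` line of `snap info` output, and the revision listing only helps while the older revision is still retained on the node.

```shell
#!/bin/sh
# tracking_channel: read `snap info` output on stdin and print the
# value of the "tracking:" line (for example 1.31/stable).
tracking_channel() {
    awk '$1 == "tracking:" { print $2 }'
}

# Only query snapd when the snap command is available on this host.
if command -v snap >/dev/null 2>&1; then
    # Channel the installed snap is following.
    snap info microk8s 2>/dev/null | tracking_channel
    # All retained revisions; older ones are marked as disabled.
    snap list --all microk8s 2>/dev/null || true
fi
```

Removing and reinstalling the snap discards the retained revisions, so this is worth capturing before attempting that workaround.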