Cannot upgrade microk8s v1.27.16 to microk8s v1.28.12
davgia opened this issue · 2 comments
First of all, thanks for the awesome product. I use it extensively.
I have a microk8s instance (v1.27.16) on an Ubuntu VM.
This is the OS information:
Static hostname: machiavelli
Icon name: computer-vm
Chassis: vm
Machine ID: fb84d53edc164b438084094e4c1dd23c
Boot ID: 201de1f84d7c401695fdc4c72a58dfb3
Virtualization: microsoft
Operating System: Ubuntu 20.04.6 LTS
Kernel: Linux 5.15.0-107-generic
Architecture: x86-64
I want to upgrade to a newer Kubernetes version (at least 1.29). I am upgrading one minor Kubernetes release at a time: I cordon and drain the node, then run snap refresh microk8s --channel 1.xx/stable. So far I have been able to upgrade from 1.26 to 1.27. Starting with 1.28, the node behaves strangely. The node is Ready, but it reports the wrong version (I should see 1.28.12 instead of 1.27.16) and pods are not scheduled on it. I see NodeNotSchedulable in the node's events, but I don't understand what the problem is. Can you help me pinpoint the error and fix it?
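For reference, the upgrade step described above could be sketched roughly as follows. This is only an illustration, not the exact commands that were run: the drain flags, the readiness wait, and the final uncordon are assumptions, and the helper names run and upgrade_step are hypothetical. DRY_RUN defaults to 1, so the commands are printed rather than executed.

```shell
# Sketch of one minor-version upgrade step on a single-node MicroK8s install.
# Assumption: the node is drained before the refresh and uncordoned afterwards.

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# upgrade_step NODE CHANNEL  (hypothetical helper)
upgrade_step() {
  node="$1"
  channel="$2"
  run microk8s kubectl cordon "$node"
  run microk8s kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  run sudo snap refresh microk8s --channel="$channel"
  run microk8s status --wait-ready
  run microk8s kubectl uncordon "$node"
  # The VERSION column should now show the new kubelet version.
  run microk8s kubectl get nodes
}

# Example (prints the commands only; set DRY_RUN=0 to run them):
# upgrade_step machiavelli 1.28/stable
```

If the kubelet fails to start after the refresh, the node keeps reporting the old version, which would match the symptom described here.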
Here are some command outputs:
# kubectl get nodes
machiavelli Ready <none> 2y283d v1.27.16
# kubectl describe node machiavelli
Name: machiavelli
Roles: <none>
Annotations: {"":"machiavelli","":"machiavelli"} 0 192.168.***.***/24 true
CreationTimestamp: Mon, 15 Nov 2021 09:47:16 +0100
Taints: <none>
Unschedulable: false
HolderIdentity: machiavelli
AcquireTime: <unset>
RenewTime: Sat, 24 Aug 2024 13:42:40 +0200
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Sat, 24 Aug 2024 13:32:14 +0200 Sat, 24 Aug 2024 13:32:14 +0200 CalicoIsUp Calico is running on this node
MemoryPressure False Sat, 24 Aug 2024 13:41:18 +0200 Mon, 27 Dec 2021 11:54:43 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sat, 24 Aug 2024 13:41:18 +0200 Mon, 27 Dec 2021 11:54:43 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sat, 24 Aug 2024 13:41:18 +0200 Mon, 27 Dec 2021 11:54:43 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sat, 24 Aug 2024 13:41:18 +0200 Sat, 24 Aug 2024 13:31:03 +0200 KubeletReady kubelet is posting ready status. AppArmor enabled
Hostname: machiavelli
cpu: 8
ephemeral-storage: 526878168Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32059776Ki
pods: 110
cpu: 8
ephemeral-storage: 525829592Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 31957376Ki
pods: 110
System Info:
Machine ID: fb84d53edc164b438084094e4c1dd23c
System UUID: 52337ed6-99e0-334c-8e1b-3a405b4bf026
Boot ID: 3094f23e-6484-445e-bed9-bfb8c02a4393
Kernel Version: 5.4.0-90-generic
OS Image: Ubuntu 20.04.6 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.28
Kubelet Version: v1.27.16
Kube-Proxy Version: v1.27.16
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-node-nsx6s 250m (3%) 0 (0%) 0 (0%) 0 (0%) 3h8m
metallb-system speaker-5k4hr 0 (0%) 0 (0%) 0 (0%) 0 (0%) 86d
monitoring node-exporter-2cfzr 0 (0%) 0 (0%) 0 (0%) 0 (0%) 185d
monitoring promtail-sqvqm 20m (0%) 0 (0%) 64Mi (0%) 128Mi (0%) 291d
openebs openebs-cstor-csi-node-f56pn 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2y283d
openebs openebs-jiva-csi-node-jb9zf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 185d
openebs openebs-ndm-jwbxf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2y283d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 270m (3%) 0 (0%)
memory 64Mi (0%) 128Mi (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 3h9m kube-proxy
Normal NodeNotSchedulable 3h12m kubelet Node machiavelli status is now: NodeNotSchedulable
Warning InvalidDiskCapacity 3h9m kubelet invalid capacity 0 on image filesystem
Normal Starting 3h9m kubelet Starting kubelet.
Normal NodeHasNoDiskPressure 3h9m kubelet Node machiavelli status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientMemory 3h9m kubelet Node machiavelli status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 3h9m kubelet Node machiavelli status is now: NodeHasSufficientPID
Normal NodeNotReady 3h9m kubelet Node machiavelli status is now: NodeNotReady
Normal NodeAllocatableEnforced 3h9m kubelet Updated Node Allocatable limit across pods
Normal NodeReady 3h8m kubelet Node machiavelli status is now: NodeReady
Normal NodeSchedulable 3h8m kubelet Node machiavelli status is now: NodeSchedulable
Normal RegisteredNode 3h8m node-controller Node machiavelli event: Registered Node machiavelli in Controller
Normal NodeNotSchedulable 179m (x2 over 3h9m) kubelet Node machiavelli status is now: NodeNotSchedulable
What Should Happen Instead?
I should be able to upgrade the microk8s instance without running into strange problems.
Reproduction Steps
I have this problem on one VM, while I have successfully upgraded microk8s on two other VMs. They are configured similarly, so there shouldn't be much difference.
Introspection Report
# microk8s inspect
Inspecting system
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-kubelite is running
Service snap.microk8s.daemon-k8s-dqlite is running
Service snap.microk8s.daemon-apiserver-kicker is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy openSSL information to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy current linux distribution to the final report tarball
Copy asnycio usage and limits to the final report tarball
Copy inotify max_user_instances and max_user_watches to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting dqlite
Inspect dqlite
Building the report tarball
Report tarball is at /var/snap/microk8s/7018/inspection-report-20240824_163332.tar.gz
Can you suggest a fix?
I didn't understand the problem so I do not have a fix to suggest.
Are you interested in contributing with a fix?
Hi @davgia, the error I am seeing in the logs (journalctl -fu snap.microk8s.daemon-kubelite) is:
Aug 24 16:33:14 machiavelli microk8s.daemon-kubelite[262289]: Error: failed to set feature gates from initial flags-based config: unrecognized feature gate: DevicePlugins
Aug 24 16:33:14 machiavelli microk8s.daemon-kubelite[262289]: F0824 16:33:14.674155 262289 daemon.go:57] Kubelet exited failed to set feature gates from initial flags-based config: unrecognized feature gate: DevicePlugins
This is causing the services to enter a crashloop.
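A quick way to check for this class of error on another node might look like the sketch below. It only extracts gate names from log lines of the form shown above; the function name unrecognized_gates is hypothetical, and the journalctl invocation in the comment assumes access to the systemd journal.

```shell
# Read kubelite log lines on stdin and print each feature gate that the
# kubelet reported as unrecognized, one per line, without duplicates.
unrecognized_gates() {
  sed -n 's/.*unrecognized feature gate: \([A-Za-z0-9]*\).*/\1/p' | sort -u
}

# Usage on the node:
# journalctl -u snap.microk8s.daemon-kubelite --no-pager -n 500 | unrecognized_gates
```

An empty result only means that no such line appeared in the inspected part of the log, not that the kubelet is healthy.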
Hi @ktsakalozos, thank you so much. I completely missed it. I'll update the kubelet config and let you know once I have successfully updated microk8s. Thanks!
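For anyone hitting the same error: the DevicePlugins gate became generally available in Kubernetes 1.26 and was removed in 1.28, so a kubelet argument that still sets it is rejected. Below is a minimal sketch of removing it, assuming the gate is set on a --feature-gates=... line in /var/snap/microk8s/current/args/kubelet. The helper name strip_feature_gate is hypothetical, and the exact layout of the args file on a given node may differ.

```shell
# strip_feature_gate GATE FILE  (hypothetical helper)
# Removes GATE from any "--feature-gates=..." line in FILE and drops the
# line entirely if no gates remain. The original is kept as FILE.bak.
strip_feature_gate() {
  gate="$1"
  file="$2"
  cp "$file" "$file.bak"
  awk -v gate="$gate" '
    /^--feature-gates=/ {
      sub(/^--feature-gates=/, "")
      n = split($0, parts, ",")
      out = ""
      for (i = 1; i <= n; i++) {
        split(parts[i], kv, "=")
        if (kv[1] != gate) out = out (out == "" ? "" : ",") parts[i]
      }
      if (out != "") print "--feature-gates=" out
      next
    }
    { print }
  ' "$file.bak" > "$file"
}

# Usage on the node (as root), followed by a restart of the services:
# strip_feature_gate DevicePlugins /var/snap/microk8s/current/args/kubelet
# snap restart microk8s
```

Keeping the .bak copy makes it easy to compare or restore the previous arguments if the node still does not come up.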