Cannot upgrade microk8s v1.27.16 to microk8s v1.28.12
davgia opened this issue · 2 comments
Summary
First of all, thanks for the awesome product; I'm using it extensively.
I have a microk8s instance (v1.27.16) on an Ubuntu VM.
This is the OS information:
Static hostname: machiavelli
Icon name: computer-vm
Chassis: vm
Machine ID: fb84d53edc164b438084094e4c1dd23c
Boot ID: 201de1f84d7c401695fdc4c72a58dfb3
Virtualization: microsoft
Operating System: Ubuntu 20.04.6 LTS
Kernel: Linux 5.15.0-107-generic
Architecture: x86-64
I want to upgrade to a newer Kubernetes version (at least 1.29), stepping through each minor Kubernetes release one at a time. I cordon and drain the node, then run snap refresh microk8s --channel 1.xx/stable. That worked for the upgrade from 1.26 to 1.27, but starting with 1.28 the node behaves strangely: it reports Ready but shows the wrong version (I should see v1.28.12 instead of v1.27.16) and no pods are scheduled on it. I see NodeNotSchedulable in its events, but I don't understand what the problem is. Can you help me pinpoint the error and fix it?
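For completeness, the upgrade sequence I run on the node looks roughly like this (the drain flags and the 1.28/stable channel are illustrative, not my exact invocation):
# kubectl cordon machiavelli
# kubectl drain machiavelli --ignore-daemonsets
# snap refresh microk8s --channel 1.28/stable
# kubectl uncordon machiavelli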
Here are some command outputs:
# kubectl get nodes
NAME          STATUS   ROLES    AGE      VERSION
machiavelli   Ready    <none>   2y283d   v1.27.16
# kubectl describe node machiavelli
Name: machiavelli
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
environment=*********
ephemeral=false
kubernetes.io/arch=amd64
kubernetes.io/hostname=machiavelli
kubernetes.io/os=linux
microk8s.io/cluster=true
reserved-for=system
topology.cstor.openebs.io/nodeName=machiavelli
topology.jiva.openebs.io/nodeName=machiavelli
Annotations: csi.volume.kubernetes.io/nodeid: {"cstor.csi.openebs.io":"machiavelli","jiva.csi.openebs.io":"machiavelli"}
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 192.168.***.***/24
projectcalico.org/IPv4VXLANTunnelAddr: 10.1.41.128
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 15 Nov 2021 09:47:16 +0100
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: machiavelli
AcquireTime: <unset>
RenewTime: Sat, 24 Aug 2024 13:42:40 +0200
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Sat, 24 Aug 2024 13:32:14 +0200 Sat, 24 Aug 2024 13:32:14 +0200 CalicoIsUp Calico is running on this node
MemoryPressure False Sat, 24 Aug 2024 13:41:18 +0200 Mon, 27 Dec 2021 11:54:43 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sat, 24 Aug 2024 13:41:18 +0200 Mon, 27 Dec 2021 11:54:43 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sat, 24 Aug 2024 13:41:18 +0200 Mon, 27 Dec 2021 11:54:43 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sat, 24 Aug 2024 13:41:18 +0200 Sat, 24 Aug 2024 13:31:03 +0200 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 192.168.222.50
Hostname: machiavelli
Capacity:
cpu: 8
ephemeral-storage: 526878168Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32059776Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 525829592Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 31957376Ki
pods: 110
System Info:
Machine ID: fb84d53edc164b438084094e4c1dd23c
System UUID: 52337ed6-99e0-334c-8e1b-3a405b4bf026
Boot ID: 3094f23e-6484-445e-bed9-bfb8c02a4393
Kernel Version: 5.4.0-90-generic
OS Image: Ubuntu 20.04.6 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.28
Kubelet Version: v1.27.16
Kube-Proxy Version: v1.27.16
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-node-nsx6s 250m (3%) 0 (0%) 0 (0%) 0 (0%) 3h8m
metallb-system speaker-5k4hr 0 (0%) 0 (0%) 0 (0%) 0 (0%) 86d
monitoring node-exporter-2cfzr 0 (0%) 0 (0%) 0 (0%) 0 (0%) 185d
monitoring promtail-sqvqm 20m (0%) 0 (0%) 64Mi (0%) 128Mi (0%) 291d
openebs openebs-cstor-csi-node-f56pn 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2y283d
openebs openebs-jiva-csi-node-jb9zf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 185d
openebs openebs-ndm-jwbxf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2y283d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 270m (3%) 0 (0%)
memory 64Mi (0%) 128Mi (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 3h9m kube-proxy
Normal NodeNotSchedulable 3h12m kubelet Node machiavelli status is now: NodeNotSchedulable
Warning InvalidDiskCapacity 3h9m kubelet invalid capacity 0 on image filesystem
Normal Starting 3h9m kubelet Starting kubelet.
Normal NodeHasNoDiskPressure 3h9m kubelet Node machiavelli status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientMemory 3h9m kubelet Node machiavelli status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 3h9m kubelet Node machiavelli status is now: NodeHasSufficientPID
Normal NodeNotReady 3h9m kubelet Node machiavelli status is now: NodeNotReady
Normal NodeAllocatableEnforced 3h9m kubelet Updated Node Allocatable limit across pods
Normal NodeReady 3h8m kubelet Node machiavelli status is now: NodeReady
Normal NodeSchedulable 3h8m kubelet Node machiavelli status is now: NodeSchedulable
Normal RegisteredNode 3h8m node-controller Node machiavelli event: Registered Node machiavelli in Controller
Normal NodeNotSchedulable 179m (x2 over 3h9m) kubelet Node machiavelli status is now: NodeNotSchedulable
What Should Happen Instead?
I should be able to upgrade the microk8s instance without running into strange problems like this.
Reproduction Steps
I have this problem on one VM only; I have successfully upgraded microk8s on two other VMs. They are configured similarly, so there shouldn't be much difference between them.
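One way to double-check that would be to diff the service arguments microk8s actually uses between a VM that upgraded fine and this one, for example (dante is just a placeholder hostname for one of the healthy VMs):
# scp dante:/var/snap/microk8s/current/args/kubelet /tmp/kubelet.args.dante
# diff /tmp/kubelet.args.dante /var/snap/microk8s/current/args/kubelet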
Introspection Report
# microk8s inspect
Inspecting system
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-kubelite is running
Service snap.microk8s.daemon-k8s-dqlite is running
Service snap.microk8s.daemon-apiserver-kicker is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy openSSL information to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy current linux distribution to the final report tarball
Copy asnycio usage and limits to the final report tarball
Copy inotify max_user_instances and max_user_watches to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting dqlite
Inspect dqlite
Building the report tarball
Report tarball is at /var/snap/microk8s/7018/inspection-report-20240824_163332.tar.gz
Can you suggest a fix?
I don't understand the problem, so I do not have a fix to suggest.
Are you interested in contributing with a fix?
yes
Hi @davgia, the error I am seeing in the logs (journalctl -fu snap.microk8s.daemon-kubelite) is:
Aug 24 16:33:14 machiavelli microk8s.daemon-kubelite[262289]: Error: failed to set feature gates from initial flags-based config: unrecognized feature gate: DevicePlugins
Aug 24 16:33:14 machiavelli microk8s.daemon-kubelite[262289]: F0824 16:33:14.674155 262289 daemon.go:57] Kubelet exited failed to set feature gates from initial flags-based config: unrecognized feature gate: DevicePlugins
This is causing the services to enter a crash loop.
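The DevicePlugins feature gate no longer exists in the 1.28 kubelet, so a stale --feature-gates entry in the kubelet arguments prevents it from starting. A rough sketch of the cleanup, assuming DevicePlugins is the only gate set on that line (if other gates are listed, remove just the DevicePlugins entry by hand instead):
# grep feature-gates /var/snap/microk8s/current/args/kubelet
# sudo sed -i '/--feature-gates=DevicePlugins=true/d' /var/snap/microk8s/current/args/kubelet
# sudo snap restart microk8s.daemon-kubelite
After that, kubelite should come up cleanly and the node should report the new version.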
Hi @ktsakalozos, thank you so much. I completely missed it. I'll update the kubelet config and let you know once I have successfully updated microk8s. Thanks!