Multus upgrade from 3.x to 4.x (thin plugin) causes pod startup issues

rgaduput opened this issue

What happened:
When we upgraded Multus from v3.9 to the latest v4.0.2, all pods failed to start in the "Initialize" phase. We found that this only happens when the plugin is upgraded; a fresh installation of Multus v4.0.2 works fine. The plugin was upgraded to the thin version by applying the Multus manifest file. (Upgrading from v3.9 to v3.9.3 showed no issues.)
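For reference, the upgrade step was essentially a kubectl apply of the upstream thin-plugin DaemonSet manifest; the exact URL/tag and DaemonSet name below are assumptions and may differ from your setup:

# Sketch of the upgrade (assumes the upstream thin-plugin manifest; adjust tag/path as needed)
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/v4.0.2/deployments/multus-daemonset.yml
# The DaemonSet rollout replaces the multus pods node by node
kubectl -n kube-system rollout status daemonset/kube-multus-ds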

What you expected to happen:
After the Multus upgrade, pods should start normally without any issues.

How to reproduce it (as minimally and precisely as possible):
Install K8s, Calico, Istio with CNI enabled, and Multus v3.9, then test the pod creation below.
Upgrade Multus from 3.9 -> 4.0.2 (thin plugin) and try creating pods again.

kubectl create namespace test
kubectl label namespace test istio-injection=enabled
kubectl -n test create deployment nginx --image=nginx
# pod will fail to start on init
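A quick way to observe the failure (a sketch; 'kubectl create deployment' labels the pods app=nginx):

kubectl -n test get pods
# the injected pod should show the init failure in its events
kubectl -n test describe pod -l app=nginx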

Anything else we need to know?:
Please note we have an Istio mesh in this environment, version 1.17.x (using the Istio CNI feature instead of the istio-init sidecar container).
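For reference, Istio CNI is typically enabled through the IstioOperator API roughly like this; our actual install may use a values file instead, so treat this as a sketch:

# Sketch: install Istio with the CNI plugin instead of the istio-init sidecar
istioctl install --set components.cni.enabled=true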

Environment:

  • Multus version v4.0.2
    image path and image ID (from 'docker images') ghcr.io/k8snetworkplumbingwg/multus-cni:v4.0.2
  • Kubernetes version (use kubectl version): 1.27.6
  • Primary CNI for Kubernetes cluster: Calico v3.26.4
  • OS (e.g. from /etc/os-release): CentOS 7
  • File of '/etc/cni/net.d/' (see the collection sketch after this list)
  • File of '/etc/cni/multus/net.d'
  • NetworkAttachment info (use kubectl get net-attach-def -o yaml)
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    meta.helm.sh/release-name: test
    meta.helm.sh/release-namespace: test
  labels:
    app.kubernetes.io/managed-by: Helm
  name: istio-cni
  namespace: test
  • Target pod yaml info (with annotation, use kubectl get pod <podname> -o yaml)
  • Other log outputs (if you use multus logging)
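To fill in the CNI config entries above, something like this on an affected node should capture the relevant state (the multus conf filename varies across versions, hence the glob):

# Capture CNI configs before and after the upgrade (paths per the template above)
ls -l /etc/cni/net.d/ /etc/cni/multus/net.d/
cat /etc/cni/net.d/*multus*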

Exceptions from Pod description:
Attached.
pod-description.txt

Looks like some pretty lengthy Istio errors in pod-description.txt:

Command error output: xtables other problem: line 2 failed\"}\n{\"level\":\"error\",\"time\":\"2024-01-30T10:49:42.993601Z\",\"msg\":\"Failed to execute: iptables-restore --noflush /tmp/iptables-rules-1706611782988461011.txt2479629019, exit status 1\"}\n"
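A quick way to sanity-check that rules file, assuming it is still on disk (the path is taken from the log above; it is a short-lived temp file):

# Dry-run parse of the generated rules without committing them
iptables-restore --test --noflush < /tmp/iptables-rules-1706611782988461011.txt2479629019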

@dougbtv true, but I have checked the iptables file referenced in the exception and there are no issues with it.
Moreover, what we are trying to understand is that even though the Istio version and config, the k8s cluster version and config, etc. all remain the same, issues only appear when Multus is upgraded from 3.9.x to 4.0.x. In all other scenarios it works fine. So we are trying to understand whether we missed config changes or anything else in this Multus major upgrade.

multus upgrade 3.9 -> 3.9.3: works
fresh installation of multus 4.0: works
multus upgrade 3.9 -> 4.0.x: fails

Because of this, I am not so sure it's actually an issue with Istio.

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.