GoogleCloudPlatform/gcs-fuse-csi-driver

The sidecar container does not work well with the istio-proxy sidecar container

songjiaxun opened this issue · 3 comments

Symptom

If the gcsfuse sidecar container starts before the istio-proxy sidecar container, gcsfuse fails with the following error:

mountWithArgs: failed to open connection - getConnWithRetry: get token source: DefaultTokenSource: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.

Root cause

The same as #46 (comment).

Solution

This issue is tracked on GitHub. We are waiting for the Kubernetes sidecar container feature to become available on GKE, which will ultimately solve this issue.
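
For context, the Kubernetes sidecar container feature (beta in 1.29) models a sidecar as an init container with restartPolicy: Always: it starts before the regular containers and keeps running alongside them for the life of the pod. A minimal sketch with placeholder names and images:

apiVersion: v1
kind: Pod
metadata:
  name: native-sidecar-example  # illustrative name
spec:
  initContainers:
  # restartPolicy: Always marks this init container as a native sidecar:
  # it starts before the app containers, then keeps running instead of
  # exiting like an ordinary init container.
  - name: proxy-sidecar               # illustrative name
    image: example.com/proxy:latest   # placeholder image
    restartPolicy: Always
  containers:
  - name: app
    image: example.com/app:latest     # placeholder image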

Ran into a similar problem (on k8s 1.29 with the new sidecar container feature, running on GKE). The initContainers injected by Istio fail to start and never become ready, and the pod is left indefinitely in PodInitializing.

The initContainers:

  initContainers:
  - args:
    - istio-iptables
    - -p
    - "15001"
    - -z
    - "15006"
    - -u
    - "1337"
    - -m
    - REDIRECT
    - -i
    - '*'
    - -x
    - ""
    - -b
    - '*'
    - -d
    - 15090,15021,15020
    - --log_output_level=default:info
    - --run-validation
    - --skip-rule-apply
    ...
    image: gcr.io/gke-release/asm/proxyv2:1.18.7-asm.4
    imagePullPolicy: IfNotPresent
    name: istio-validation
    resources:
      limits:
        cpu: "2"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    restartPolicy: Always
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    ...
  - args:
    - --v=5
    env:
    - name: NATIVE_SIDECAR
      value: "TRUE"
    image: gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v1.2.0-gke.0@sha256:31880114306b1fb5d9e365ae7d4771815ea04eb56f0464a514a810df9470f88f
    imagePullPolicy: IfNotPresent
    name: gke-gcsfuse-sidecar
    resources:
      requests:
        cpu: 250m
        ephemeral-storage: 5Gi
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsGroup: 65534
      runAsNonRoot: true
      runAsUser: 65534
      seccompProfile:
        type: RuntimeDefault
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /gcsfuse-tmp
      name: gke-gcsfuse-tmp
    - mountPath: /gcsfuse-buffer
      name: gke-gcsfuse-buffer
    - mountPath: /gcsfuse-cache
      name: gke-gcsfuse-cache
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-x7xtn
      readOnly: true

Node log:

time="2024-04-24T14:16:18.466416045Z" level=error msg="RemoveContainer for \"b5b2c3c4b240a1dff592a84b12bcd393f399839456b075536850fa1f51b398e1\" failed" error="failed to set removing state for container \"b5b2c3c4b240a1dff592a84b12bcd393f399839456b075536850fa1f51b398e1\": container is already in removing state"
...
E0424 14:16:19.470058    2739 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"istio-validation\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=istio-validation pod=pod-56cbdb5784-vzc44_mesh(70d414ff-61da-4eb0-b721-15ae65057379)\"" pod="mesh/pod-56cbdb5784-vzc44" podUID="70d414ff-61da-4eb0-b721-15ae65057379"

I deployed the same code without the CSI driver sidecar, and everything starts correctly.

Worth noting: the same pod used to work on k8s 1.28 (no sidecar container feature).
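
For comparison, without the native sidecar feature the gcsfuse sidecar is injected as a regular container, so pod startup does not block on init container ordering at all. A rough sketch of the structural difference (fields abbreviated; the restartPolicy on the 1.29 side is an assumption based on how native sidecars are declared):

# k8s 1.28, no native sidecar feature: injected into containers
containers:
- name: gke-gcsfuse-sidecar
  image: gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v1.2.0-gke.0
  ...

# k8s 1.29, native sidecar feature: injected into initContainers
initContainers:
- name: gke-gcsfuse-sidecar
  image: gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v1.2.0-gke.0
  restartPolicy: Always  # assumption: native sidecars are declared this way
  ...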

I also came across the issue described here today, and used the following pod annotation to make it work:

proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'

Source: https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/
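
Applied to a workload, the annotation goes on the pod template: with holdApplicationUntilProxyStarts set, the injector places istio-proxy first in the container list and blocks the other containers from starting until the proxy is ready. A minimal sketch (the Deployment name is illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app  # illustrative name
spec:
  template:
    metadata:
      annotations:
        # Hold the other containers until istio-proxy is ready
        proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
    spec:
      ...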