FUSE CSI driver using native sidecar mutates restart policy on wrong init container
Baune8D opened this issue · 8 comments
We ran into this problem after upgrading our Autopilot cluster from K8s 1.28.7 to 1.29.4.
We also run managed Anthos Service Mesh v1.18.7.
When deploying pods that use both the FUSE CSI driver and the Istio proxy, `restartPolicy` gets mutated to `Always` on the `istio-validation` init container instead of on `gke-gcsfuse-sidecar`.
Manifest: FUSE CSI driver `enabled`, Istio sidecar injection `enabled`

Notice that `restartPolicy: Always` is applied to the `istio-validation` init container, not to `gke-gcsfuse-sidecar`:
```yaml
initContainers:
- args:
  - istio-iptables
  - '-p'
  - '15001'
  - '-z'
  - '15006'
  - '-u'
  - '1337'
  - '-m'
  - REDIRECT
  - '-i'
  - '*'
  - '-x'
  - ''
  - '-b'
  - '*'
  - '-d'
  - '15090,15021,15020'
  - '--log_output_level=default:info'
  - '--run-validation'
  - '--skip-rule-apply'
  env:
  - name: CA_PROVIDER
    value: GoogleCA
  - name: CA_ROOT_CA
    value: /etc/ssl/certs/ca-certificates.crt
  - name: CA_TRUSTANCHOR
  - name: EXIT_ON_ZERO_ACTIVE_CONNECTIONS
    value: 'true'
  - name: FLEET_PROJECT_NUMBER
    value: 'xxx'
  - name: GCP_METADATA
    value: xxx|xxx|xxx|xxx
  - name: OUTPUT_CERTS
    value: /etc/istio/proxy
  - name: PROXY_CONFIG_XDS_AGENT
    value: 'true'
  - name: XDS_AUTH_PROVIDER
    value: gcp
  - name: XDS_ROOT_CA
    value: /etc/ssl/certs/ca-certificates.crt
  image: 'gcr.io/gke-release/asm/proxyv2:1.18.7-asm.21'
  imagePullPolicy: IfNotPresent
  name: istio-validation
  resources:
    limits:
      cpu: 500m
      ephemeral-storage: 1Gi
      memory: 512Mi
    requests:
      cpu: 500m
      ephemeral-storage: 1Gi
      memory: 512Mi
  restartPolicy: Always
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    privileged: false
    readOnlyRootFilesystem: true
    runAsGroup: 1337
    runAsNonRoot: true
    runAsUser: 1337
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: kube-api-access-n6xh7
    readOnly: true
- args:
  - '--v=5'
  env:
  - name: NATIVE_SIDECAR
    value: 'TRUE'
  image: >-
    gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v1.2.0-gke.0@sha256:31880114306b1fb5d9e365ae7d4771815ea04eb56f0464a514a810df9470f88f
  imagePullPolicy: IfNotPresent
  name: gke-gcsfuse-sidecar
  resources:
    limits:
      cpu: 250m
      ephemeral-storage: 5Gi
      memory: 256Mi
    requests:
      cpu: 250m
      ephemeral-storage: 5Gi
      memory: 256Mi
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    readOnlyRootFilesystem: true
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534
    seccompProfile:
      type: RuntimeDefault
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /gcsfuse-tmp
    name: gke-gcsfuse-tmp
  - mountPath: /gcsfuse-buffer
    name: gke-gcsfuse-buffer
  - mountPath: /gcsfuse-cache
    name: gke-gcsfuse-cache
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: kube-api-access-n6xh7
    readOnly: true
```
Manifest: FUSE CSI driver `disabled`, Istio sidecar injection `enabled`

This time there is no `restartPolicy: Always` on the `istio-validation` init container:
```yaml
initContainers:
- args:
  - istio-iptables
  - '-p'
  - '15001'
  - '-z'
  - '15006'
  - '-u'
  - '1337'
  - '-m'
  - REDIRECT
  - '-i'
  - '*'
  - '-x'
  - ''
  - '-b'
  - '*'
  - '-d'
  - '15090,15021,15020'
  - '--log_output_level=default:info'
  - '--run-validation'
  - '--skip-rule-apply'
  env:
  - name: CA_PROVIDER
    value: GoogleCA
  - name: CA_ROOT_CA
    value: /etc/ssl/certs/ca-certificates.crt
  - name: CA_TRUSTANCHOR
  - name: EXIT_ON_ZERO_ACTIVE_CONNECTIONS
    value: 'true'
  - name: FLEET_PROJECT_NUMBER
    value: 'xxx'
  - name: GCP_METADATA
    value: xxx|xxx|xxx|xxx
  - name: OUTPUT_CERTS
    value: /etc/istio/proxy
  - name: PROXY_CONFIG_XDS_AGENT
    value: 'true'
  - name: XDS_AUTH_PROVIDER
    value: gcp
  - name: XDS_ROOT_CA
    value: /etc/ssl/certs/ca-certificates.crt
  image: 'gcr.io/gke-release/asm/proxyv2:1.18.7-asm.21'
  imagePullPolicy: IfNotPresent
  name: istio-validation
  resources:
    limits:
      cpu: 500m
      ephemeral-storage: 1152Mi
      memory: 512Mi
    requests:
      cpu: 500m
      ephemeral-storage: 1152Mi
      memory: 512Mi
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    privileged: false
    readOnlyRootFilesystem: true
    runAsGroup: 1337
    runAsNonRoot: true
    runAsUser: 1337
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: kube-api-access-fpcmd
    readOnly: true
```
Manifest: FUSE CSI driver `enabled`, Istio sidecar injection `disabled`

Now the FUSE CSI driver sidecar has `restartPolicy: Always`, as expected:
```yaml
initContainers:
- args:
  - '--v=5'
  env:
  - name: NATIVE_SIDECAR
    value: 'TRUE'
  image: >-
    gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v1.2.0-gke.0@sha256:31880114306b1fb5d9e365ae7d4771815ea04eb56f0464a514a810df9470f88f
  imagePullPolicy: IfNotPresent
  name: gke-gcsfuse-sidecar
  resources:
    limits:
      cpu: 250m
      ephemeral-storage: 5Gi
      memory: 256Mi
    requests:
      cpu: 250m
      ephemeral-storage: 5Gi
      memory: 256Mi
  restartPolicy: Always
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    readOnlyRootFilesystem: true
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534
    seccompProfile:
      type: RuntimeDefault
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /gcsfuse-tmp
    name: gke-gcsfuse-tmp
  - mountPath: /gcsfuse-buffer
    name: gke-gcsfuse-buffer
  - mountPath: /gcsfuse-cache
    name: gke-gcsfuse-cache
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: kube-api-access-rcxtp
    readOnly: true
```
This produces the same symptoms as those described here: #53 (comment)
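Since the GCSFuse webhook is said to set the `NATIVE_SIDECAR` env var and `restartPolicy` together, a pod hit by this bug has an init container with `NATIVE_SIDECAR=TRUE` but no `restartPolicy: Always`. A minimal Python sketch of that check, assuming the pod spec is already loaded as a dict (for example via `json.load` on the output of `kubectl get pod <name> -o json`); the function name is hypothetical and not part of the driver:

```python
from typing import Any


def missing_native_restart_policy(init_containers: list[dict[str, Any]]) -> list[str]:
    """Names of init containers that carry NATIVE_SIDECAR=TRUE but lack
    restartPolicy: Always (assumes the two should always appear together)."""
    missing = []
    for container in init_containers:
        # env entries such as CA_TRUSTANCHOR may have no value at all
        env = {e["name"]: e.get("value") for e in container.get("env", [])}
        if env.get("NATIVE_SIDECAR") == "TRUE" and container.get("restartPolicy") != "Always":
            missing.append(container["name"])
    return missing


# Trimmed copy of the first manifest (FUSE and Istio injection enabled).
broken = [
    {"name": "istio-validation", "restartPolicy": "Always",
     "env": [{"name": "CA_PROVIDER", "value": "GoogleCA"}, {"name": "CA_TRUSTANCHOR"}]},
    {"name": "gke-gcsfuse-sidecar",
     "env": [{"name": "NATIVE_SIDECAR", "value": "TRUE"}]},
]
# Trimmed copy of the third manifest (FUSE only).
healthy = [
    {"name": "gke-gcsfuse-sidecar", "restartPolicy": "Always",
     "env": [{"name": "NATIVE_SIDECAR", "value": "TRUE"}]},
]

print(missing_native_restart_policy(broken))   # ['gke-gcsfuse-sidecar']
print(missing_native_restart_policy(healthy))  # []
```

The check deliberately does not flag every init container with `restartPolicy: Always`, since a native `istio-proxy` sidecar would legitimately carry it too.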
Could you provide the location of the `istio-proxy` sidecar? I'm curious whether it's being injected as a regular container (which would cause an incompatibility). I'm not sure if ASM v1.18.7 injects `istio-proxy` as a native sidecar by default, but there should be a configuration on the ASM side that allows this type of injection.
If ASM v1.18.7 doesn't allow this, it's worth upgrading to an ASM version that supports `istio-proxy` as a native sidecar.
`istio-validation` is injected as the first init container, and `istio-proxy` is injected as a regular container. I think Istio only supports native sidecars from 1.19. Also, we run managed Anthos Service Mesh through the stable channel, which only supports 1.18 at the moment, and native sidecar seems to be a Pilot setting that we have no control over. I'm not even sure ASM supports it in newer versions, as it seems to be an opt-in setting in Istio.
To me this clearly looks like a bug in the FUSE driver, since it mutates the wrong container. Basically, it makes `istio-validation` run as a native sidecar, and the FUSE sidecar run as a regular init container.
Generally speaking, we want to make sure Istio is using 1.19 to be compatible with our driver. Even if there wasn't any modification to the init containers, I believe the gcsfuse sidecar would fail to start, since `istio-proxy` must be running before any other containers that use the network.
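The ordering concern can be illustrated with a simplified model of native sidecar startup: the kubelet starts init containers one at a time in list order (a sidecar with `restartPolicy: Always` only has to start, not exit, before the next one begins), and regular containers start after that. This is a sketch under those assumptions, not the kubelet's actual logic; `start_order`, `starts_before`, and the `app` container are hypothetical.

```python
from typing import Any


def start_order(pod_spec: dict[str, Any]) -> list[str]:
    """Simplified start order: init containers (including native sidecars)
    in list order, followed by regular containers."""
    init_names = [c["name"] for c in pod_spec.get("initContainers", [])]
    main_names = [c["name"] for c in pod_spec.get("containers", [])]
    return init_names + main_names


def starts_before(pod_spec: dict[str, Any], first: str, second: str) -> bool:
    order = start_order(pod_spec)
    return order.index(first) < order.index(second)


# ASM 1.18: istio-proxy is a regular container; intended layout with
# restartPolicy on the GCSFuse sidecar.
asm_118 = {
    "initContainers": [
        {"name": "istio-validation"},
        {"name": "gke-gcsfuse-sidecar", "restartPolicy": "Always"},
    ],
    "containers": [{"name": "app"}, {"name": "istio-proxy"}],
}
# Hypothetical Istio 1.19+ layout: istio-proxy is a native sidecar placed
# ahead of the GCSFuse sidecar.
native_istio = {
    "initContainers": [
        {"name": "istio-validation"},
        {"name": "istio-proxy", "restartPolicy": "Always"},
        {"name": "gke-gcsfuse-sidecar", "restartPolicy": "Always"},
    ],
    "containers": [{"name": "app"}],
}

print(starts_before(asm_118, "gke-gcsfuse-sidecar", "istio-proxy"))       # True
print(starts_before(native_istio, "gke-gcsfuse-sidecar", "istio-proxy"))  # False
```

In the first layout the GCSFuse sidecar is started while `istio-proxy` is not yet running, which is the situation the concern above is about.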
With that said, this is very interesting behavior. It would be good to know whether this is a problem with the GCSFuse webhook or the Istio webhook. The reasons I believe we need to check this are:
- The GCSFuse webhook injects its container after the `istio-proxy` sidecar when that sidecar is present in the same container list (e.g. the init container list). When it is not, we inject at the first position.
- The GCSFuse native sidecar appears in the second position in the spec provided. This likely means the last webhook to make any changes was the Istio webhook.
- The GCSFuse webhook always injects the `NATIVE_SIDECAR` env var at the same time as `restartPolicy`.
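The injection rule in the first bullet can be sketched as follows. This is a minimal Python illustration of the behavior as described, not the driver's actual code; `sidecar_insert_index` and `inject_after_istio_proxy` are hypothetical names, and the last step assumes the Istio webhook runs afterwards and prepends `istio-validation`, which matches the order seen in the first manifest.

```python
from typing import Any

Container = dict[str, Any]


def sidecar_insert_index(containers: list[Container]) -> int:
    """Position right after istio-proxy if it is in this list, else 0."""
    for i, container in enumerate(containers):
        if container.get("name") == "istio-proxy":
            return i + 1
    return 0


def inject_after_istio_proxy(containers: list[Container], sidecar: Container) -> list[Container]:
    result = list(containers)  # leave the caller's list untouched
    result.insert(sidecar_insert_index(result), sidecar)
    return result


gcsfuse = {"name": "gke-gcsfuse-sidecar", "restartPolicy": "Always"}

# istio-proxy is in the same list (native sidecar): insert right after it.
native = inject_after_istio_proxy(
    [{"name": "istio-validation"}, {"name": "istio-proxy"}], gcsfuse)
print([c["name"] for c in native])
# ['istio-validation', 'istio-proxy', 'gke-gcsfuse-sidecar']

# ASM 1.18: istio-proxy lives in `containers`, so the sidecar goes first...
init = inject_after_istio_proxy([], gcsfuse)
# ...and a later Istio webhook prepending istio-validation pushes it to index 1.
init = [{"name": "istio-validation"}] + init
print([c["name"] for c in init])
# ['istio-validation', 'gke-gcsfuse-sidecar']
```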
I think it would be good to manually declare the gcsfuse native sidecar in the YAML spec with the driver disabled and Istio enabled, and see whether Istio correctly injects its init container without modifying any other sidecars.
> Generally speaking, we want to make sure Istio is using 1.19 to be compatible with our driver.
Maybe you should consider providing a way to disable the native sidecar? I feel it would make sense for you to support whatever Google considers a stable offering. We have no way of upgrading Istio past 1.18 at the moment, since we use the stable release channel of managed ASM.
> Even if there wasn't any modification to the init containers, I believe the gcsfuse sidecar would fail to start, since `istio-proxy` must be running before any other containers that use the network.
You might definitely be right about this. I will try declaring the FUSE sidecar manually tomorrow and see how things work out.
> GCSFuse webhook injects the container after the `istio-proxy` sidecar when present in the same container list (e.g. init container list). When it is not, we inject at first position.
The `istio-proxy` container is not in the same container list as the FUSE sidecar, since it resides in the regular container list. The `istio-validation` init container is, though, and it is the one that gets `restartPolicy` applied.
I will report back tomorrow when I get a chance to try out the FUSE sidecar in combination with Istio when declared manually.
> The `istio-proxy` container is not present in the same container list as the FUSE sidecar, since it resides in the normal container list. The `istio-validation` init container is though and this is the one who gets `restartPolicy` applied.
There are multiple Istio sidecars, and many combinations of Istio sidecars exist. For simplicity, we only look at `istio-proxy`, because Istio guarantees that it is the last Istio sidecar injected (ordering-wise).
> I will report back tomorrow when I get a chance to try out the FUSE sidecar in combination with Istio when declared manually.
Awesome!
@hime I did some more investigating, and these are the results:
- If I disable FUSE sidecar injection and define the FUSE sidecar manually, the issue still happens.
- If I disable FUSE sidecar injection, define the FUSE sidecar manually, and also define the `istio-validation` init container manually, the issue does NOT happen. In fact, everything starts up fine: both FUSE and Istio seem to work correctly, even though Istio still runs as a regular sidecar.

I cannot test the behavior with Istio sidecar injection disabled and FUSE sidecar injection enabled, because if I define `istio-validation` manually, FUSE injects its sidecar before `istio-validation`, and the problem always manifests as `restartPolicy` moving to the container before the one where it is defined.
So it seems you were correct: the problem is related to the Istio webhooks, not to the FUSE CSI driver.
This investigation also revealed a workaround for now: when `istio-validation` is defined manually, Istio still injects `istio-proxy` as usual, but `restartPolicy` no longer gets misplaced.
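One mechanism that would produce exactly this symptom, offered only as a hypothetical illustration and not as the confirmed ASM bug: an injector that decodes containers with a model lacking the newer `restartPolicy` field, then diffs the old and new init container lists index by index. Because `istio-validation` is inserted at index 0, the diff rewrites the known fields of the existing element 0 (the GCSFuse sidecar) into `istio-validation` and re-adds the GCSFuse sidecar at index 1 without the field it never saw. Everything below (`KNOWN_FIELDS`, `decode`, `naive_list_patch`, the tuple op format) is an assumption for illustration; real mutating webhooks return RFC 6902 JSON Patches.

```python
import copy
from typing import Any

Container = dict[str, Any]

# Hypothetical: the fields an older injector's container model understands.
# restartPolicy is missing, so the injector silently drops it when decoding.
KNOWN_FIELDS = {"name", "args", "env"}


def decode(container: Container) -> Container:
    """What the injector 'sees': unknown fields are dropped."""
    return {k: v for k, v in container.items() if k in KNOWN_FIELDS}


def naive_list_patch(old: list[Container], new: list[Container]) -> list[tuple]:
    """Index-by-index diff producing simplified (op, index, field, value) ops."""
    ops = []
    for i, new_c in enumerate(new):
        if i >= len(old):
            ops.append(("add", i, None, new_c))
            continue
        for field, value in new_c.items():
            if old[i].get(field) != value:
                ops.append(("set", i, field, value))
        for field in old[i].keys() - new_c.keys():
            ops.append(("remove", i, field, None))
    return ops


def apply_ops(raw: list[Container], ops: list[tuple]) -> list[Container]:
    """Apply ops to the real (undecoded) list, so unknown fields survive in place."""
    out = copy.deepcopy(raw)
    for op, i, field, value in ops:
        if op == "add":
            out.insert(i, copy.deepcopy(value))
        elif op == "set":
            out[i][field] = copy.deepcopy(value)
        else:  # "remove"
            del out[i][field]
    return out


gcsfuse = {"name": "gke-gcsfuse-sidecar", "restartPolicy": "Always",
           "env": [{"name": "NATIVE_SIDECAR", "value": "TRUE"}]}
istio_validation = {"name": "istio-validation",
                    "args": ["istio-iptables", "--run-validation"]}

# Injection: istio-validation is prepended to what the injector decoded.
raw = [gcsfuse]
seen = [decode(c) for c in raw]
patched = apply_ops(raw, naive_list_patch(seen, [istio_validation] + seen))
for c in patched:
    print(c["name"], c.get("restartPolicy"))
# istio-validation Always
# gke-gcsfuse-sidecar None

# Workaround, assuming the injector leaves an already-declared
# istio-validation in place: nothing is inserted, so no index shifts.
raw = [istio_validation, gcsfuse]
seen = [decode(c) for c in raw]
unchanged = apply_ops(raw, naive_list_patch(seen, seen))
print(unchanged == raw)  # True
```

In this model the GCSFuse sidecar keeps its `NATIVE_SIDECAR` env var while losing `restartPolicy`, which matches the first manifest; declaring `istio-validation` up front avoids the insertion and therefore the shift.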
Closing this issue.
This issue is actually in the ASM webhook: the webhook wrongly modified the `gke-gcsfuse-sidecar` init container while injecting the `istio-validation` init container. The ASM team has acknowledged this issue, and we are working on a fix. Thank you for reporting it!