Explore how to enable Istio sidecar on charm pods
ca-scribner opened this issue · 11 comments
Context
To support improved network isolation of Charmed Kubeflow components, investigate how we can attach an istio sidecar to charm pods.
What needs to get done
Definition of Done
- a prototype implementation (spec or working example) that demonstrates how to deploy charms with istio sidecars
Thank you for reporting your feedback to us!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5166.
This message was autogenerated
@kimwnasptd suggested doing something similar to what observability does here with resource limits. iiuc, in that case each charm that wanted an istio sidecar would modify its own Juju-created StatefulSet to add the istio sidecar annotation, which would trigger a redeploy of the underlying charm(/workload) pod.
Some outstanding questions on this are:
- can you change a `StatefulSet`'s pod annotations? I think yes, based on this - changing annotations triggers a rolling-restart update that will kill the currently running charm pod (eg: the pod that asked for the annotation change). Is that a problem? It means the charm will be killed at some future time (maybe mid-execution, maybe after full execution). Has o11y had issues with this?
- will Juju overwrite this annotation later? What event hooks do we need to subscribe to to protect us from intervention by Juju? (see the sketch below for one idea)
- if a charm pod modifies itself to add istio, there is a period where the charm pod does not yet have an istio sidecar. Will this cause issues?

edit: by annotations, I think we mean labels
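On the "will Juju overwrite this later" question, one hedged sketch (assuming lightkube is available in the charm venv and the charm's service account is allowed to patch its own StatefulSet) is to simply re-apply the label idempotently from a couple of hooks, so that any Juju rewrite of the StatefulSet eventually gets corrected:

```python
#!/usr/bin/env python3
# Sketch only: re-apply the injection label on config-changed and update-status
# so a Juju-driven rewrite of the StatefulSet would eventually be undone.
from lightkube import Client
from lightkube.resources.apps_v1 import StatefulSet
from ops.charm import CharmBase
from ops.main import main

INJECTION_LABEL = {"sidecar.istio.io/inject": "true"}


class LabelGuardCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # update-status fires periodically, so the label gets restored even if
        # no other hook runs for a while.
        self.framework.observe(self.on.config_changed, self._ensure_label)
        self.framework.observe(self.on.update_status, self._ensure_label)

    def _ensure_label(self, _event):
        # A strategic-merge patch; it should be a no-op (and trigger no rollout)
        # when the label is already present.
        Client().patch(
            StatefulSet,
            name=self.app.name,
            namespace=self.model.name,
            obj={"spec": {"template": {"metadata": {"labels": INJECTION_LABEL}}}},
        )


if __name__ == "__main__":
    main(LabelGuardCharm)
```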
What about other possible solutions? Maybe:
1. (added just so it's easy to refer to later) using the above suggestion of a charm modifying its own StatefulSet to add the injection label
2. can we tell Juju to set annotations at deploy time, maybe via constraints or config?
3. can we instead use the namespace-level istio injection? Setting the label `istio-injection=enabled` on a namespace (eg: `namespace/kubeflow`) means all new pods in that namespace will have an istio sidecar injected (a lightkube sketch of setting this label is shown after this list)
   - this would mean ALL pods in the namespace get sidecars. Would this mess with the model controller? The model controller would exist before the label was set, but if it was killed and needed to restart it would now have a sidecar. Maybe we could block this somehow (could add a `do-not-inject` label on the model operator, if that was configurable - or at least an "allow all" policy on the model operator so it isn't blocked from doing its job)?
4. Use sidecar manual injection: `istioctl` has a command that will modify a pod to include the sidecar, similar to how the webhook would. Can we just do this as pebble containers? The actual istio sidecar is probably doable, but istio also injects an init-container afaik which probably cannot be done with juju+pebble?
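For reference on option (3): the namespace-level route boils down to a single label on the namespace, which can be set with kubectl or programmatically. A minimal sketch using lightkube (assuming whatever applies it has RBAC to patch Namespace objects):

```python
# Sketch: label a namespace for istio sidecar injection (option 3).
from lightkube import Client
from lightkube.resources.core_v1 import Namespace


def enable_istio_injection(namespace: str = "kubeflow"):
    """Equivalent to `kubectl label namespace <ns> istio-injection=enabled --overwrite`."""
    Client().patch(
        Namespace,
        name=namespace,
        obj={"metadata": {"labels": {"istio-injection": "enabled"}}},
    )
```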
Tried out option (1) (the "charm adds the istio label to its own statefulset" solution) using this charm:
charm.py
#!/usr/bin/env python3
import logging
import time
from lightkube import Client
from lightkube.resources.apps_v1 import StatefulSet
from ops.charm import CharmBase
from ops.main import main
logger = logging.getLogger(__name__)
class IstioSidecarInjectorLabelTest(CharmBase):
"""Charm for testing whether a charm can self-inject an istio sidecar by changing its labels."""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
logger.info("IstioSidecarInjectorLabelTest.__init__")
self.framework.observe(self.on.config_changed, self._echo_before_event)
self.framework.observe(self.on.config_changed, self._add_istio_injection_label_to_statefulset)
self.framework.observe(self.on.config_changed, self._echo_after_event)
def _echo_before_event(self, event):
"""Echoes before the event is handled."""
logger.info("IstioSidecarInjectorLabelTest._echo_before_event")
logger.info(f"Event: {event}")
def _echo_after_event(self, event):
"""Echoes when an event is handled."""
logger.info("IstioSidecarInjectorLabelTest._echo_after_event")
logger.info(f"Event: {event}")
def _add_istio_injection_label_to_statefulset(self, event):
"""Adds the istio-injection=true label to the statefulset for this charm."""
logger.info("IstioSidecarInjectorLabelTest._add_istio_injection_label_to_statefulset")
sleeptime = 10
logger.info(f"Sleeping {sleeptime} seconds")
time.sleep(sleeptime)
logger.info("Sleep complete")
lightkube_client = Client()
patch = {
"spec": {
"template": {
"metadata": {"labels": {"sidecar.istio.io/inject": "true"}}
}
}
}
logger.info("patching statefulset")
lightkube_client.patch(
StatefulSet, name=self.model.app.name, namespace=self.model.name, obj=patch
)
logger.info("patched statefulset")
sleeptime = 10
logger.info(f"Sleeping {sleeptime} seconds")
time.sleep(sleeptime)
logger.info("Sleep complete")
if __name__ == "__main__":
main(IstioSidecarInjectorLabelTest)
On `config-changed`, the charm has 3 handlers:
- `_echo_before_event`: echoes to say it's handling config-changed
- `_add_istio_injection_label_to_statefulset`: patches its own StatefulSet's pod template labels and waits a little after this is done
- `_echo_after_event`: echoes to say it's handling config-changed
Deploying it (`juju deploy ./*.charm`) to a cluster that has istio installed sees the following:
- charm deploys without a sidecar
- `_echo_before_event` fires and completes
- `_add_istio_injection_label_to_statefulset` fires, patches this charm's own StatefulSet to add the istio label, and waits
- after an unpredictable amount of time the pod running the charm code (in the StatefulSet we just patched) is terminated by Kubernetes to roll out a new version of this pod
- the new version of the pod is rolled out with the istio label and istio injects a sidecar container. This pod appears to start up correctly.

So this does appear to work.
Is there a race condition with the above solution? Since doing `juju deploy bundle.yaml` results in charms deploying all at once in no guaranteed order, doing something like `juju deploy kubeflow` likely results in some charms deploying before istio's mutating webhooks are live. So will the above injection test charm add a label to its StatefulSet, restart the pod so the pod has the istio label, and still not receive an istio sidecar because nothing is watching for the label yet?

A hacky solution could be that the istio-label code above also checks to see if the new pod has an istio sidecar (sketched below), but that feels ugly. And even if we spot the sidecar is missing, what do we do? At best we can defer and hope some other event triggers us again soon.

Yes, after testing this the race condition does exist - if the label injection occurs before istio has created its webhooks, the charm pods will be restarted with the istio label but will not receive the istio sidecar.
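For completeness, a rough sketch of the "check whether the sidecar actually arrived" idea (not a recommendation): inspect this unit's pod and defer if no `istio-proxy` container is present yet. The pod-name pattern (`<app>-<unit number>`) and the container name `istio-proxy` are assumptions based on sidecar-charm and istio defaults:

```python
# Sketch: detect whether istio actually injected a sidecar into this unit's pod.
from lightkube import Client
from lightkube.resources.core_v1 import Pod


def has_istio_sidecar(unit_name: str, model_name: str) -> bool:
    # e.g. unit "my-charm/0" -> pod "my-charm-0" (assumed naming convention)
    pod_name = unit_name.replace("/", "-")
    pod = Client().get(Pod, name=pod_name, namespace=model_name)
    return any(c.name == "istio-proxy" for c in pod.spec.containers)


# Inside a charm handler one might then do something like:
#     if not has_istio_sidecar(self.unit.name, self.model.name):
#         event.defer()  # and hope another event fires after istio is ready
```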
Testing out option (4) (manual sidecar injection):

Taking this `StatefulSet`, modified a little from the istio docs:

sleep.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: sleep
---
apiVersion: v1
kind: Service
metadata:
name: sleep
labels:
app: sleep
spec:
ports:
- port: 80
name: http
selector:
app: sleep
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: sleep
spec:
replicas: 1
selector:
matchLabels:
app: sleep
serviceName: sleep
template:
metadata:
labels:
app: sleep
spec:
terminationGracePeriodSeconds: 0
serviceAccountName: sleep
containers:
- command:
- /bin/sleep
- infinity
image: curlimages/curl
imagePullPolicy: IfNotPresent
name: sleep
volumeMounts:
- mountPath: /etc/sleep/tls
name: secret-volume
volumes:
- name: secret-volume
secret:
secretName: sleep-secret
optional: true
and running that through `istioctl kube-inject -f sleep.yaml`, we get:
sleep_with_istio_injected.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: sleep
---
apiVersion: v1
kind: Service
metadata:
name: sleep
labels:
app: sleep
spec:
ports:
- port: 80
name: http
selector:
app: sleep
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: null
name: sleep
spec:
replicas: 1
selector:
matchLabels:
app: sleep
serviceName: sleep
template:
metadata:
annotations:
kubectl.kubernetes.io/default-container: sleep
kubectl.kubernetes.io/default-logs-container: sleep
prometheus.io/path: /stats/prometheus
prometheus.io/port: "15020"
prometheus.io/scrape: "true"
sidecar.istio.io/status: '{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null,"revision":"default"}'
creationTimestamp: null
labels:
app: sleep
security.istio.io/tlsMode: istio
service.istio.io/canonical-name: sleep
service.istio.io/canonical-revision: latest
spec:
containers:
- command:
- /bin/sleep
- infinity
image: curlimages/curl
imagePullPolicy: IfNotPresent
name: sleep
resources: {}
volumeMounts:
- mountPath: /etc/sleep/tls
name: secret-volume
- args:
- proxy
- sidecar
- --domain
- $(POD_NAMESPACE).svc.cluster.local
- --proxyLogLevel=warning
- --proxyComponentLogLevel=misc:error
- --log_output_level=default:info
- --concurrency
- "2"
env:
- name: JWT_POLICY
value: third-party-jwt
- name: PILOT_CERT_PROVIDER
value: istiod
- name: CA_ADDR
value: istiod.kubeflow.svc:15012
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: INSTANCE_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: SERVICE_ACCOUNT
valueFrom:
fieldRef:
fieldPath: spec.serviceAccountName
- name: HOST_IP
valueFrom:
fieldRef:
fieldPath: status.hostIP
- name: PROXY_CONFIG
value: |
{"discoveryAddress":"istiod.kubeflow.svc:15012","tracing":{"zipkin":{"address":"zipkin.kubeflow:9411"}}}
- name: ISTIO_META_POD_PORTS
value: |-
[
]
- name: ISTIO_META_APP_CONTAINERS
value: sleep
- name: ISTIO_META_CLUSTER_ID
value: Kubernetes
- name: ISTIO_META_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: ISTIO_META_INTERCEPTION_MODE
value: REDIRECT
- name: ISTIO_META_MESH_ID
value: cluster.local
- name: TRUST_DOMAIN
value: cluster.local
image: docker.io/istio/proxyv2:1.17.3
name: istio-proxy
ports:
- containerPort: 15090
name: http-envoy-prom
protocol: TCP
readinessProbe:
failureThreshold: 30
httpGet:
path: /healthz/ready
port: 15021
initialDelaySeconds: 1
periodSeconds: 2
timeoutSeconds: 3
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
privileged: false
readOnlyRootFilesystem: true
runAsGroup: 1337
runAsNonRoot: true
runAsUser: 1337
volumeMounts:
- mountPath: /var/run/secrets/workload-spiffe-uds
name: workload-socket
- mountPath: /var/run/secrets/credential-uds
name: credential-socket
- mountPath: /var/run/secrets/workload-spiffe-credentials
name: workload-certs
- mountPath: /var/run/secrets/istio
name: istiod-ca-cert
- mountPath: /var/lib/istio/data
name: istio-data
- mountPath: /etc/istio/proxy
name: istio-envoy
- mountPath: /var/run/secrets/tokens
name: istio-token
- mountPath: /etc/istio/pod
name: istio-podinfo
initContainers:
- args:
- istio-iptables
- -p
- "15001"
- -z
- "15006"
- -u
- "1337"
- -m
- REDIRECT
- -i
- '*'
- -x
- ""
- -b
- '*'
- -d
- 15090,15021,15020
- --log_output_level=default:info
image: docker.io/istio/proxyv2:1.17.3
name: istio-init
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
add:
- NET_ADMIN
- NET_RAW
drop:
- ALL
privileged: false
readOnlyRootFilesystem: false
runAsGroup: 0
runAsNonRoot: false
runAsUser: 0
serviceAccountName: sleep
terminationGracePeriodSeconds: 0
volumes:
- name: workload-socket
- name: credential-socket
- name: workload-certs
- emptyDir:
medium: Memory
name: istio-envoy
- emptyDir: {}
name: istio-data
- downwardAPI:
items:
- fieldRef:
fieldPath: metadata.labels
path: labels
- fieldRef:
fieldPath: metadata.annotations
path: annotations
name: istio-podinfo
- name: istio-token
projected:
sources:
- serviceAccountToken:
audience: istio-ca
expirationSeconds: 43200
path: istio-token
- configMap:
name: istio-ca-root-cert
name: istiod-ca-cert
- name: secret-volume
secret:
optional: true
secretName: sleep-secret
updateStrategy: {}
status:
replicas: 0
where the differences are essentially:
- additional annotations and labels, some of which I think matter to istio (but not confirmed)
- an additional container (the istio sidecar) with:
  - a bunch of environment variables that depend on the configuration of istio in your current cluster (they add things like `"discoveryAddress":"istiod.kubeflow.svc:15012"`, which is dependent on where istio is deployed)
  - several volumes, some of which mount configmaps or secrets defined by istiod
- (if not using the Istio CNI plugin) an initContainer to set up the iptables rules so networking goes through the istio sidecar
So to manually inject the sidecar as proposed, we'd need to replicate all of the above differences inside the charm. Addressing each separately:
- adding the annotations/labels: doable (in the same way as solution (1) injected the istio-injection labels).
- adding the sidecar container: easy, although configuring it might be more complicated (how do we get the inputs? do they change often? can we get them from the istio charm?), and the volumes may be an issue as I don't think there's a good way to do this with sidecars (see the sketch after this list)
- adding the initContainer: not implemented in charms now, and feels impossible (how can a Pod deploy its own initContainer?). If we're using the Istio CNI though, this is not an issue
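For the "adding the sidecar container" step, one way to experiment without pebble would be to patch the StatefulSet directly, i.e. the same mechanism as option (1). A heavily abbreviated, hedged sketch is below; a real patch would need the full set of env vars, volumes, and securityContext from the `istioctl kube-inject` output above, and the image and discovery address shown here are just the values taken from that output:

```python
# Sketch: append an istio-proxy container to a charm's StatefulSet via a
# strategic-merge patch (containers are merged by name, so this adds a new
# container rather than replacing the existing list). Heavily abbreviated -
# see the kube-inject output above for everything a real sidecar needs.
from lightkube import Client
from lightkube.resources.apps_v1 import StatefulSet

ISTIO_PROXY = {
    "name": "istio-proxy",
    "image": "docker.io/istio/proxyv2:1.17.3",  # from the kube-inject output
    "args": ["proxy", "sidecar", "--domain", "$(POD_NAMESPACE).svc.cluster.local"],
    "env": [
        # Only a subset shown; the full list is istio-version and cluster dependent.
        {"name": "CA_ADDR", "value": "istiod.kubeflow.svc:15012"},
        {"name": "ISTIO_META_CLUSTER_ID", "value": "Kubernetes"},
    ],
}


def add_istio_proxy_container(app_name: str, namespace: str):
    Client().patch(
        StatefulSet,
        name=app_name,
        namespace=namespace,
        obj={"spec": {"template": {"spec": {"containers": [ISTIO_PROXY]}}}},
    )
```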
Overall, this might be doable, but it does sound complicated. And this has a similar race condition to solution (1). If we haven't deployed istio yet, how do we get the configurations for the istio sidecar? Maybe they're deterministic and we can specify them statically, but then do we bake all that configuration into the code? If we do know the configurations, we'd still have to test what happens when a sidecar starts but the istiod it depends on is not available - does it gracefully handle this until istio is alive?
Looking at option (3) (use namespace-level injection by labelling the namespace with `istio-injection=enabled`)...

Implementation of this option is much easier for most charms. By labelling the kubeflow namespace for istio injection, we no longer need to change any of the charms themselves.
We do, however, face a similar race condition to option (1). If we do:
juju add-model kubeflow
kubectl label namespace kubeflow istio-injection=enabled --overwrite
juju deploy kubeflow
the kubeflow namespace will have the correct istio labels, but until the istio-pilot charm has deployed istio's webhooks the pods created will not get istio sidecars. So if you have the charms deploy in this order, you might see:
- charmA <-- does not get a sidecar
- charmB <-- does not get a sidecar
- istio-pilot <-- implements the sidecar mutating webhook
- charmC <-- gets a sidecar
So we still have the issue where we need istio deployed before everything else.
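One way to make that ordering observable (rather than just hoping istio got there first) would be for whatever applies the label to first check that istio's sidecar-injector webhook exists. A hedged sketch; the webhook configuration name `istio-sidecar-injector` is an assumption based on istio's defaults and should be verified against the actual istio-pilot deployment:

```python
# Sketch: check whether istio's mutating webhook for sidecar injection exists
# yet, so we know labelling the namespace (or StatefulSet) will actually do
# something for newly created pods.
from lightkube import Client
from lightkube.core.exceptions import ApiError
from lightkube.resources.admissionregistration_v1 import MutatingWebhookConfiguration


def istio_injector_ready(name: str = "istio-sidecar-injector") -> bool:
    try:
        Client().get(MutatingWebhookConfiguration, name=name)
        return True
    except ApiError as err:
        if err.status.code == 404:
            return False
        raise
```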
Another issue with this solution is that we could accidentally give juju's model operator an istio sidecar. iiuc this by itself shouldn't hurt the model operator, but if we also had a `deny-all` default policy in the kubeflow namespace then we might accidentally block its traffic. We'd need to address this too.
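On the model operator concern: istio does have a per-pod opt-out label (`sidecar.istio.io/inject: "false"`), so one hedged option is to patch the model operator's pod template with it. The Deployment name `modeloperator` is an assumption about how Juju names it in the model namespace and should be double-checked:

```python
# Sketch: opt the Juju model operator out of namespace-wide istio injection by
# labelling its pod template with istio's per-pod opt-out label.
from lightkube import Client
from lightkube.resources.apps_v1 import Deployment


def exclude_model_operator(namespace: str = "kubeflow"):
    Client().patch(
        Deployment,
        name="modeloperator",  # assumed name of Juju's model operator Deployment
        namespace=namespace,
        obj={
            "spec": {
                "template": {
                    "metadata": {"labels": {"sidecar.istio.io/inject": "false"}}
                }
            }
        },
    )
```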
Given the race conditions affecting (1) and (3), I wonder if it would make sense to unbundle istio from the rest of Kubeflow. This is a large change, but would have benefits.
Maybe we have:
- istiod deployed somewhere in the cluster
  - could be managed by an istio charm in a separate model, or a user's own istio install that is not managed by charms
- istio-integrator charm:
  - configured to use the existing istio (config options to set where istio is, etc)
  - provides things like our `ingress` relations that we currently get from istio-pilot
Then the kubeflow bundle would include `istio-integrator` rather than `istio` itself, and deployment would be like:
juju add-model istio-system
juju deploy istio # (the daemon)
# wait
juju add-model kubeflow
juju deploy kubeflow # (includes istio-integrator)
Notes from an internal meeting here
Summary is that we have some design decisions left to make - will report back here when we decide them
This is also affected by whether we are using the istio CNI plugin or not. For example, looking at the output of `istioctl kube-inject ...`, it includes the `istio-init` container that sets up the networking. So manual injection must be different depending on whether you're using the istio CNI plugin.