canonical/istio-operators

Explore how to enable Istio sidecar on charm pods

ca-scribner opened this issue · 11 comments

Context

To support improved network isolation of Charmed Kubeflow components, investigate how we can attach an istio sidecar to charm pods.

What needs to get done

Definition of Done

  1. a prototype implementation (spec or working example) that demonstrates how to deploy charms with istio sidecars

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5166.

This message was autogenerated

@kimwnasptd suggested doing something similar to observability does here with resource limits. iiuc, in that case each charm that wanted an istio sidecar would modify its own Juju-created StatefulSet to add the istio sidecar annotation, which would trigger a redeploy of the underlying charm(/workload) pod.

Some outstanding questions on this are:

  • can you change a StatefulSet's pod annotations? I think yes, base on this
  • changing annotations triggers a rolling restart update that will kill the currently running charm pod (eg: the pod that asked for the annotation change). Is that a problem? It means the charm will be killed at some future time (maybe mid-execution, maybe after full execution). Has o11y had issues with this?
  • will Juju overwrite this annotation later? What event hooks do we need to subscribe to to protect us from intervention by Juju?
  • if a charm pod modifies itself to add istio, there is a period where the charm pod does not yet have an istio sidecar. Will this cause issues?

edit: by annotations, I think we mean labels

What about other possible solutions? Maybe:

  1. (added just so its easy to refer to later) using the above suggestion of a charm modifying its own statefulset to add the injection label
  2. can we tell Juju to set annotations at deploy time, maybe via constraints or config?
  3. can we instead use the namespace-level istio injection? Setting the label istio-injection=enabled on a namespace (eg: namespace/kubeflow) means all new pods in that namespace will have an istio sidecar injected
    • this would mean ALL pods in the namespace get sidecars. Would this mess with the model controller? The model controller would exist before the label was set, but if it was killed and needed to restart it would now have a sidecar. Maybe we could block this somehow (could add a do-not-inject label on the model operator, if that was configurable - or at least an "allow all" policy on the model operator so it isn't blocked from doing its job)?
  4. Use sidecar manual injection: istioctl has a command that will modify a pod to include the sidecar, similar to how the webhook would. Can we just do this as pebble containers? The actual istio sidecar is probably doable, but istio also injects an init-container afaik which probably cannot be done with juju+pebble?

Tried out option (1) (the "charm adds the istio label to its own statefulset" solution) using this charm:

charm.py
#!/usr/bin/env python3
import logging
import time
from typing import List, Optional

import yaml
from lightkube import Client
from lightkube.resources.apps_v1 import StatefulSet
from ops.charm import CharmBase
from ops.main import main
from ops.model import ActiveStatus

logger = logging.getLogger(__name__)


class IstioSidecarInjectorLabelTest(CharmBase):
    """Charm for testing whether a charm can self-inject an istio sidecar by changing its labels."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        logger.info("IstioSidecarInjectorLabelTest.__init__")

        self.framework.observe(self.on.config_changed, self._echo_before_event)
        self.framework.observe(self.on.config_changed, self._add_istio_injection_label_to_statefulset)
        self.framework.observe(self.on.config_changed, self._echo_after_event)

    def _echo_before_event(self, event):
        """Echoes before the event is handled."""
        logger.info("IstioSidecarInjectorLabelTest._echo_before_event")
        logger.info(f"Event: {event}")

    def _echo_after_event(self, event):
        """Echoes when an event is handled."""
        logger.info("IstioSidecarInjectorLabelTest._echo_after_event")
        logger.info(f"Event: {event}")


    def _add_istio_injection_label_to_statefulset(self, event):
        """Adds the istio-injection=true label to the statefulset for this charm."""
        logger.info("IstioSidecarInjectorLabelTest._add_istio_injection_label_to_statefulset")

        sleeptime = 10
        logger.info(f"Sleeping {sleeptime} seconds")
        time.sleep(sleeptime)
        logger.info("Sleep complete")

        lightkube_client = Client()
        patch = {
            "spec": {
                "template": {
                    "metadata": {"labels": {"sidecar.istio.io/inject": "true"}}
                }
            }
        }

        logger.info("patching statefulset")
        lightkube_client.patch(
            StatefulSet, name=self.model.app.name, namespace=self.model.name, obj=patch
        )
        logger.info("patched statefulset")

        sleeptime = 10
        logger.info(f"Sleeping {sleeptime} seconds")
        time.sleep(sleeptime)
        logger.info("Sleep complete")

if __name__ == "__main__":
    main(IstioSidecarInjectorLabelTest)

On config-changed, the charm has 3 handlers:

  • _echo_before_event: echos to say its handling config-changed
  • _add_istio_injection_label_to_statefulset: patches its own statefulset's podspec labels and waits a little after this is done
  • _echo_after_event: echos to say its handling config-changed

Deploying it (juju deploy ./*.charm) to a cluster that has istio installed sees the following:

  1. charm deploys without a sidecar
  2. _echo_before_event fires and completes
  3. _add_istio_injection_label_to_statefulset fires, patches this charm's own statefulset to add the istio label, and waits
  4. after an unpredictable amount of time the pod running the charm code (in the statefulset we just patched) is terminated by kubernetes to roll out a new version of this pod
  5. the new version of the pod is rolled out with the istio label and istio injects a sidecar pod. this pod appears to start up correctly.

So this does appear to work

Is there a race condition with the above solution? Since doing juju deploy bundle.yaml results in charms deploying all at once in no guaranteed order, doing something like juju deploy kubeflow likely results in some charms deploying before the istio mutatingwebhooks are live. So will the above injection test charm add a label to its statefulset, restart the pod so the pod has the istio label, and still not receive an istio sidecar because nothing is watching for the label yet?

A hacky solution could be that the istio-label code above also checks to see if the new pod has an istio sidecar, but that feels ugly. And even if we spot the sidecar is missing, what do we do? At best we can defer and hope some other event triggers us again soon

Yes, after testing this the race condition does exist - if the label-injection occurs before istio has created its webhooks, the charm pods will be restarted but with the istio label but will not receive the istio sidecar

Testing out option (4) (manual sidecar injection):

Taking this StatefulSet modified a little from the istio docs:

sleep.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: v1
kind: Service
metadata:
  name: sleep
  labels:
    app: sleep
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: sleep
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  serviceName: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      terminationGracePeriodSeconds: 0
      serviceAccountName: sleep
      containers:
      - command:
        - /bin/sleep
        - infinity
        image: curlimages/curl
        imagePullPolicy: IfNotPresent
        name: sleep
        volumeMounts:
        - mountPath: /etc/sleep/tls
          name: secret-volume
      volumes:
      - name: secret-volume
        secret:
          secretName: sleep-secret
          optional: true

and running that through istioctl kube-inject -f sleep.yaml, we get:

sleep_with_istio_injected.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: v1
kind: Service
metadata:
  name: sleep
  labels:
    app: sleep
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: sleep
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  creationTimestamp: null
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  serviceName: sleep
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/default-container: sleep
        kubectl.kubernetes.io/default-logs-container: sleep
        prometheus.io/path: /stats/prometheus
        prometheus.io/port: "15020"
        prometheus.io/scrape: "true"
        sidecar.istio.io/status: '{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null,"revision":"default"}'
      creationTimestamp: null
      labels:
        app: sleep
        security.istio.io/tlsMode: istio
        service.istio.io/canonical-name: sleep
        service.istio.io/canonical-revision: latest
    spec:
      containers:
      - command:
        - /bin/sleep
        - infinity
        image: curlimages/curl
        imagePullPolicy: IfNotPresent
        name: sleep
        resources: {}
        volumeMounts:
        - mountPath: /etc/sleep/tls
          name: secret-volume
      - args:
        - proxy
        - sidecar
        - --domain
        - $(POD_NAMESPACE).svc.cluster.local
        - --proxyLogLevel=warning
        - --proxyComponentLogLevel=misc:error
        - --log_output_level=default:info
        - --concurrency
        - "2"
        env:
        - name: JWT_POLICY
          value: third-party-jwt
        - name: PILOT_CERT_PROVIDER
          value: istiod
        - name: CA_ADDR
          value: istiod.kubeflow.svc:15012
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: INSTANCE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: SERVICE_ACCOUNT
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: PROXY_CONFIG
          value: |
            {"discoveryAddress":"istiod.kubeflow.svc:15012","tracing":{"zipkin":{"address":"zipkin.kubeflow:9411"}}}
        - name: ISTIO_META_POD_PORTS
          value: |-
            [
            ]
        - name: ISTIO_META_APP_CONTAINERS
          value: sleep
        - name: ISTIO_META_CLUSTER_ID
          value: Kubernetes
        - name: ISTIO_META_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: ISTIO_META_INTERCEPTION_MODE
          value: REDIRECT
        - name: ISTIO_META_MESH_ID
          value: cluster.local
        - name: TRUST_DOMAIN
          value: cluster.local
        image: docker.io/istio/proxyv2:1.17.3
        name: istio-proxy
        ports:
        - containerPort: 15090
          name: http-envoy-prom
          protocol: TCP
        readinessProbe:
          failureThreshold: 30
          httpGet:
            path: /healthz/ready
            port: 15021
          initialDelaySeconds: 1
          periodSeconds: 2
          timeoutSeconds: 3
        resources:
          limits:
            cpu: "2"
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 128Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: true
          runAsGroup: 1337
          runAsNonRoot: true
          runAsUser: 1337
        volumeMounts:
        - mountPath: /var/run/secrets/workload-spiffe-uds
          name: workload-socket
        - mountPath: /var/run/secrets/credential-uds
          name: credential-socket
        - mountPath: /var/run/secrets/workload-spiffe-credentials
          name: workload-certs
        - mountPath: /var/run/secrets/istio
          name: istiod-ca-cert
        - mountPath: /var/lib/istio/data
          name: istio-data
        - mountPath: /etc/istio/proxy
          name: istio-envoy
        - mountPath: /var/run/secrets/tokens
          name: istio-token
        - mountPath: /etc/istio/pod
          name: istio-podinfo
      initContainers:
      - args:
        - istio-iptables
        - -p
        - "15001"
        - -z
        - "15006"
        - -u
        - "1337"
        - -m
        - REDIRECT
        - -i
        - '*'
        - -x
        - ""
        - -b
        - '*'
        - -d
        - 15090,15021,15020
        - --log_output_level=default:info
        image: docker.io/istio/proxyv2:1.17.3
        name: istio-init
        resources:
          limits:
            cpu: "2"
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 128Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_ADMIN
            - NET_RAW
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: false
          runAsGroup: 0
          runAsNonRoot: false
          runAsUser: 0
      serviceAccountName: sleep
      terminationGracePeriodSeconds: 0
      volumes:
      - name: workload-socket
      - name: credential-socket
      - name: workload-certs
      - emptyDir:
          medium: Memory
        name: istio-envoy
      - emptyDir: {}
        name: istio-data
      - downwardAPI:
          items:
          - fieldRef:
              fieldPath: metadata.labels
            path: labels
          - fieldRef:
              fieldPath: metadata.annotations
            path: annotations
        name: istio-podinfo
      - name: istio-token
        projected:
          sources:
          - serviceAccountToken:
              audience: istio-ca
              expirationSeconds: 43200
              path: istio-token
      - configMap:
          name: istio-ca-root-cert
        name: istiod-ca-cert
      - name: secret-volume
        secret:
          optional: true
          secretName: sleep-secret
  updateStrategy: {}
status:
  replicas: 0

where the differences are essentially:

  • additional annotations and labels, some of which I think matter to istio (but not confirmed)
  • an additional container (the istio sidecar) with:
    • a bunch of environment variables that depend on the configuration of istio in your current cluster (they add things like "discoveryAddress":"istiod.kubeflow.svc:15012", which is dependent on where istio is deployed)
    • several volumes, some of which mount configmaps or secrets defined by istiod
  • (if not using the Istio CNI plugin) an initContainer to set up the iptable rules so networking goes through the istio sidecar

So to manually inject the sidecar as proposed, we'd need to replicate all of the above differences inside the charm. Addressing each separately:

  • adding the annotations/labels: doable (through the same way as solution (1) injected the istio-injection labels).
  • adding the sidecar container: easy, although configuring it might be more complicated (how do we get the inputs? do they change often? can we get them from the istio charm?), and the volumes may be an issue as I don't think there's a good way to do this with sidecars
  • adding the initContainer: not implemented in charms now, and feels impossible (how can a Pod deploy its own initContainer?). If we're using the Istio CNI though, this is not an issue

Overall, this might be doable, but it does sound complicated. And this has a similar race condition as solution (1). If we haven't deployed istio yet, how do we get the configurations for the istio sidecar? Maybe they're deterministic and we can specify them statically, but then do we bake all that configuration into the code? If we do know the configurations, we'd still have to test what happens when a sidecar starts but the istiod it depends on is not available - does it gracefully handle this until istio is alive?

Looking at option (3) (use namespace-level injections by labelling the namespace with istio-injection=enabled)...

Implementation of this option is much easier for most charms. By annotating the kubeflow namespace for istio injection, we no longer need to change any of the charms themselves.

We do, however, face a similar race condition to option (1). If we do:

juju add-model kubeflow
kubectl label namespace kubeflow istio-injection=enabled --overwrite
juju deploy kubeflow

the kubeflow namespace will have the correct istio labels, but until the istio-pilot charm has deployed istio's webhooks the pods created will not get istio sidecars. So if you have the charms deploy in this order, you might see:

  • charmA <-- does not get a sidecar
  • charmB <-- does not get a sidecar
  • istio-pilot <-- implements the sidecar mutating webhook
  • charmC <-- gets a sidecar

So we still have the issue where we need istio deployed before everything else.

Another issue with this solution is that we could accidentally give juju's model operator an istio sidecar. iiuc this by itself shouldn't hurt the model operator alone, but if we also had a deny-all default policy in the Kubeflow namespace then we might accidentally block the traffic. We'd need to address this too

Given the race conditions affecting (1) and (3), I wonder if it would make sense to unbundle istio from the rest of Kubeflow. This is a large change, but would have benefits.

Maybe we have:

  • istiod deployed somewhere in the cluster
    • could be managed by an istio charm in a separate model, or a user's own istio install that is not managed by charms
  • istio-integrator charm:
    • configured to use the existing istio (config options to set where istio is, etc)
    • provides things like our ingress relations that we currently get from istio pilot

Then the kubeflow bundle would include istio-integrator rather than istio itself, and deployment would be like:

juju add-model istio-system
juju deploy istio # (the daemon)
# wait

juju add-model kubeflow
juju deploy kubeflow # (includes istio-integrator)

Notes from an internal meeting here

Summary is that we have some design decisions left to make - will report back here when we decide them

This is also affected by whether we are using the istio CNI plugin or not. For example, looking at the output for istioctl kube-inject ..., it includes the istio-init container that sets up the networking. So manual injection must be different depending on whether you're using the istio CNI plugin