authzed/spicedb-operator

Provide a method for attaching sidecars (or patching both the Deployment and Job simultaneously)

Opened this issue · 4 comments

jawnsy commented

Summary

Provide some means of attaching a sidecar to both the Deployment and Job.

Background

Google Cloud SQL supports encryption and IAM authentication through the Cloud SQL Auth Proxy, typically run as a sidecar container.

The recommended deployment method is to run the proxy as a sidecar container, because the proxy does not authenticate its clients: anyone who can connect to the proxy inherits whatever credentials the proxy can access, so a sidecar is the safest way to ensure that only the authorized workload can connect through it.
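
For illustration (not from the issue), a pod template fragment with the proxy attached as a sidecar might look like the following sketch; the image tag, SpiceDB args, and instance connection name are all placeholders:

      # Hypothetical pod template fragment: SpiceDB plus the proxy sidecar.
      spec:
        containers:
          - name: spicedb
            image: authzed/spicedb
            args: ["serve"]
          - name: cloud-sql-proxy
            image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1
            args:
              - "--auto-iam-authn"                  # IAM database authentication
              - "my-project:my-region:my-instance"  # placeholder connection name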

Workarounds

  • I think this is doable with patches to both the Deployment and Job, but it is a bit tedious because the patch has to be written twice (once to patch the cluster Deployment, and again to patch the migration Job); a kustomize-style sketch follows this list
  • If using a connection pooler (e.g. PgBouncer), SpiceDB can connect to the pooler (with authentication) and the pooler can forward connections to the proxy (running as a sidecar)
  • We could run the proxy as an independent Deployment and use a NetworkPolicy to restrict access, but this is risky because not all CNI plugins enforce NetworkPolicy
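
For the first workaround, a rough kustomize-style sketch (the resource names are assumptions about what the operator generates) can reuse a single JSON6902 patch file for both targets, though it still has to be listed twice:

      # kustomization.yaml
      patches:
        - path: add-proxy-sidecar.yaml
          target:
            kind: Deployment
            name: dev-spicedb    # assumed name of the operator-managed Deployment
        - path: add-proxy-sidecar.yaml
          target:
            kind: Job
            name: dev-migrate    # assumed name of the migration Job

      # add-proxy-sidecar.yaml (works for both kinds, since each has
      # /spec/template/spec/containers)
      - op: add
        path: /spec/template/spec/containers/-
        value:
          name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1
          args: ["my-project:my-region:my-instance"]  # placeholder connection name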
jawnsy commented

Even if you patch things, the migration job does not quite work correctly, because:

  1. The migration container expects the database to be reachable immediately, but the Cloud SQL Auth Proxy will not yet be ready while it is starting up. A solution is for the migration command to retry every few seconds until it succeeds (though Kubernetes will detect the migration container as "crashed" and restart it anyway); a retry sketch follows this list
  2. The proxy container keeps running after the migration container exits, so the Job never completes
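
For the first problem, a hedged sketch of the retry idea, assuming the migration image provides /bin/sh and that datastore connection settings come from the environment:

      # Hypothetical command override for the migration container.
      command: ["/bin/sh", "-c"]
      args:
        - |
          until spicedb migrate head; do
            echo "database not reachable yet, retrying in 5s..."
            sleep 5
          done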

So, for now, perhaps the best option is to use a username/password for database authentication.

This isn't an option on most kube clusters in the wild just yet, but I think the sidecar containers API (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#api-for-sidecar-containers) would at least make patching the Job work for this. Any chance you're on a cluster where you can enable alpha features?
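
With that feature enabled, a sketch of the migration Job would mark the proxy as a restartable init container, which runs for the life of the pod but no longer blocks Job completion (names and the instance connection string are placeholders):

      apiVersion: batch/v1
      kind: Job
      metadata:
        name: spicedb-migrate  # placeholder name
      spec:
        template:
          spec:
            restartPolicy: Never
            initContainers:
              - name: cloud-sql-proxy
                image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1
                restartPolicy: Always  # marks this init container as a sidecar
                args: ["my-project:my-region:my-instance"]  # placeholder
            containers:
              - name: migrate
                image: authzed/spicedb
                args: ["migrate", "head"]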

The proxy container will still be running after the migration container exits, so the job will not complete

This is interesting. The cloud-sql-proxy-operator (https://github.com/GoogleCloudPlatform/cloud-sql-proxy-operator) supports injecting into Jobs, but I don't see how anyone can actually use that feature.

A hacky option could be a timeout on the SQL proxy container: give the proxy, say, one minute to run migrations and then exit 0, so that the migration container controls overall Job success (though if your data gets very large you might need to tune that number).
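
A very hedged sketch of that hack; the official proxy image is distroless, so this assumes a hypothetical wrapper image that ships /bin/sh and coreutils timeout:

      - name: cloud-sql-proxy
        image: example.com/cloud-sql-proxy-with-shell:latest  # hypothetical image
        command: ["/bin/sh", "-c"]
        # Run the proxy for at most 60s, then exit 0 so this container never
        # fails the Job; the migration container decides overall success.
        args:
          - timeout 60 /cloud-sql-proxy my-project:my-region:my-instance; exit 0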

I did find this writeup: https://medium.com/teamsnap-engineering/properly-running-kubernetes-jobs-with-sidecars-ddc04685d0dc, which suggests sharing the process namespace between the containers and killing the proxy process when the primary container completes. That's an option, but it seems like a lot of work to replace something that's already built into newer versions of kube.
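
For reference, a sketch of the pattern from that writeup, assuming both containers run as the same UID (otherwise signalling across containers needs extra capabilities) and that the migration image ships a shell and pkill:

      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            shareProcessNamespace: true  # containers see each other's processes
            restartPolicy: Never
            containers:
              - name: migrate
                image: authzed/spicedb
                command: ["/bin/sh", "-c"]
                # Kill the proxy once the migration finishes, preserving the
                # migration's exit code as this container's own.
                args:
                  - spicedb migrate head; s=$?; pkill cloud-sql-proxy; exit $s
              - name: cloud-sql-proxy
                image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1
                args: ["my-project:my-region:my-instance"]  # placeholder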

If it helps at all, we use cloud-sql-proxy sidecars on various migrations; they use the quitquitquit convention (which is becoming more commonly used) as a way of telling the proxy to shut down once the migration finishes.

In our case as an example, the sidecar container for cloud-sql-proxy looks like:

        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1
          args:
          <snip>
          - "--quitquitquit"

And then on the service (which is set up with Helm):

      automigration:
        enabled: true
        customCommand: [/bin/sh, -c]
        customArgs:
        - migrate sql -e --yes; wget --post-data '{}' -O /dev/null -q http://127.0.0.1:9091/quitquitquit

(Only wget is available in this container; not ideal, but we're working with what we have available.)

@jawnsy @adamstrawson

In kube 1.29+ the sidecar containers feature is enabled by default. Have either of you successfully tried the Cloud SQL proxy with this?