Helm Support for Customizing Tf-Runner
thejosephstevens opened this issue · 3 comments
Hey there! I've been working on setting up my Flux stack to work without creds, which has been challenging because we have a multi-cloud architecture. I was able to get it working through vanilla OIDC to each cloud by injecting environment variables into my pods, rather than the native approach each cloud supports, where the service account is annotated and a mutating webhook injects the values into the pod. I ran into issues using that approach with the tofu-controller, because the TF commands are all run from tf-runner pods, and those pods aren't customizable from Helm (at least not very much; notably, env vars, volumes, and volume mounts are not configurable).
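For context, the env-var-based OIDC setup looks roughly like the snippet below for AWS (values are placeholders, and GCP/Azure have analogous variables); it's just a plain projected service account token plus the standard SDK env vars, nothing specific to any chart:

```yaml
# Illustrative pod-level config for AWS web-identity federation without the
# cloud-specific injection webhook. Role ARN, container name, and paths are
# placeholders.
spec:
  containers:
    - name: manager
      env:
        - name: AWS_ROLE_ARN
          value: arn:aws:iam::111111111111:role/flux-tofu-controller
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/oidc/token
      volumeMounts:
        - name: oidc-token
          mountPath: /var/run/secrets/oidc
          readOnly: true
  volumes:
    - name: oidc-token
      projected:
        sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              path: token
              expirationSeconds: 3600
```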
I was able to use mutating webhooks from OPA Gatekeeper to inject env vars into the tf-runner pods, but it would be much easier and cleaner if I could just add env vars, volumes, and volume mounts to the values.yaml for the Helm chart.
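For anyone wanting the same workaround, the Gatekeeper mutation looks roughly like this (one Assign per env var; the runner pod label and container name here are assumptions, so double-check what your tf-runner pods actually carry):

```yaml
# Sketch of an OPA Gatekeeper Assign mutation that injects one env var into
# tf-runner pods. The labelSelector and container name are assumed, not
# verified against the controller; the role ARN is a placeholder.
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: inject-aws-role-arn-into-tf-runner
spec:
  applyTo:
    - groups: [""]
      kinds: ["Pod"]
      versions: ["v1"]
  match:
    scope: Namespaced
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    labelSelector:
      matchLabels:
        app.kubernetes.io/created-by: tf-controller   # assumed runner label
  location: 'spec.containers[name: tf-runner].env[name: AWS_ROLE_ARN].value'
  parameters:
    assign:
      value: arn:aws:iam::111111111111:role/flux-tf-runner
```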
I did take a glance through the code, and it looks like the simplest approach would be adding additional CLI flags to tf-controller and passing those all the way through (although that's not a great interface for lists and dictionaries). What I would suggest instead is an approach like Spark's: use a pod template. You can take a full podSpec in values.yaml, put the object into a ConfigMap mounted into tf-controller, and then have the tf-controller code merge its own overrides onto that object. This would allow users to do any configuration of the tf-runner without having to support each field explicitly through Helm.
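To make that concrete, the Helm interface could look something like this (runner.podTemplate is a made-up key, just the shape I'm proposing, not an existing chart value):

```yaml
# Hypothetical values.yaml: `runner.podTemplate` does not exist in the chart
# today; it is the shape this proposal suggests. The chart would render it
# into a ConfigMap and the controller would merge its own fields on top.
runner:
  podTemplate:
    metadata:
      labels:
        team: platform
    spec:
      containers:
        - name: tf-runner
          env:
            - name: AWS_ROLE_ARN
              value: arn:aws:iam::111111111111:role/flux-tf-runner
            - name: AWS_WEB_IDENTITY_TOKEN_FILE
              value: /var/run/secrets/oidc/token
          volumeMounts:
            - name: oidc-token
              mountPath: /var/run/secrets/oidc
              readOnly: true
      volumes:
        - name: oidc-token
          projected:
            sources:
              - serviceAccountToken:
                  audience: sts.amazonaws.com
                  path: token
```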
If there's interest, I could try and find some time to implement this (would likely be at least a few months out). In the meantime, I'm unblocked, so there's no cry for help here (just usability feedback from an excited new user!).
Another approach that would work today is to build the tf-runner image yourself and bake the environment variables into the container image, assuming they don't contain any sensitive information.
Building your own image is something you might consider in the future anyway, for caching the Terraform providers for resilience, as explained in #321.
That's fine if you've got one controller deployment with one identity (which may actually be the most common deployment model), but it falls apart at two. There are also other things you may want to customize on the tf-runners (specific affinity for on-demand nodes, labels/annotations to align with Prometheus scraping rules or org norms, pod priority, security context...), none of which are currently possible. A pod template would open all of these up in one go.
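All of those are plain PodSpec fields, so under the same hypothetical podTemplate key they'd need no dedicated chart values, e.g.:

```yaml
# Continuing the hypothetical runner.podTemplate sketch; the priority class,
# annotation, and node label key are placeholders.
runner:
  podTemplate:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
    spec:
      priorityClassName: platform-critical
      securityContext:
        runAsNonRoot: true
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node.example.com/capacity-type
                    operator: In
                    values: ["on-demand"]
```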
Just ran into this again and confirmed I had missed the existing override handling. It looks like you can pass everything through the Terraform object, which would make implementing a secondary mechanism for passing a spec through Helm a fair amount more involved, or would introduce some non-obvious behaviors (short of breaking changes). Since it's not clear that my proposal would fit well into the existing model (where config is driven by the TF target rather than a shared runner spec), I'll go ahead and close this. If there's future interest in implementing something like this, feel free to reach out; I'd be happy to talk through how I arrived at the approach I described.
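For anyone who lands here later, the override I was missing lives on the Terraform object itself; if I'm reading the API right, it's the runnerPodTemplate field, something like the sketch below (field names should be verified against the API version your controller ships; values are placeholders):

```yaml
# Sketch of the per-object override on the Terraform CR. Check the CRD for
# your controller version for the exact runnerPodTemplate fields supported.
apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: my-stack
  namespace: flux-system
spec:
  interval: 10m
  path: ./terraform
  sourceRef:
    kind: GitRepository
    name: my-repo
  runnerPodTemplate:
    metadata:
      labels:
        team: platform
    spec:
      env:
        - name: AWS_ROLE_ARN
          value: arn:aws:iam::111111111111:role/flux-tf-runner
      volumeMounts:
        - name: oidc-token
          mountPath: /var/run/secrets/oidc
          readOnly: true
      volumes:
        - name: oidc-token
          projected:
            sources:
              - serviceAccountToken:
                  audience: sts.amazonaws.com
                  path: token
```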