Tf-runner falls into CrashLoopBackOff state because of unknown flag --grpc-port for tofu-controller command
eneiss opened this issue · 2 comments
Hello!
I'm having an issue with the tf-runner Pod created by tf-controller (version v0.16.0-rc.4).
I deployed the tf-controller using Flux as a HelmRelease, as specified in the docs.
It may be worth mentioning that all my Helm/Docker registries are internal mirrors, but they all serve the public v0.16.0-rc.4 version of the tf-controller Helm chart and Docker images.
When I create a "tf-test" Terraform resource in the git repository targeted by a GitRepository resource inside my cluster hosting Flux, the tf-controller running on it creates a "tf-test-tf-runner" Pod, but this Pod falls into an Error/CrashLoopBackOff state because of an "unknown flag: --grpc-port" error (full error log below).
It seems that the tf-controller creates the runner Pod with a CLI flag that the tofu-controller command does not accept; the flag is explicitly specified as an arg of the tf-runner container (see details below).
Unfortunately I did not find any value to override in the tf-controller Helm chart to prevent this behavior.
Please let me know if I missed something (I'm still new to Flux) or if you need additional details, and thank you for your time :)
Additional information:
tf-test-tf-runner logs with the error:
unknown flag: --grpc-port
Usage of tofu-controller:
--allow-break-the-glass Allow break the glass mode.
--allow-cross-namespace-refs Enable following cross-namespace references. Overrides --no-cross-namespace-
--ca-cert-validity-duration duration The duration that the ca certificate certificates should be valid for. Defau
--cert-rotation-check-frequency duration The interval that the mTLS certificate rotator should check the certificate
--cert-validity-duration duration (Deprecated) The duration that the mTLS certificate that the runner pod shou
--cluster-domain string The cluster domain used by the cluster. (default "cluster.local")
--concurrent int The number of concurrent terraform reconciles. (default 4)
--enable-leader-election Enable leader election for controller manager. Enabling this will ensure the
--events-addr string The address of the events receiver.
--health-addr string The address the health endpoint binds to. (default ":9440")
--http-retry int The maximum number of retries when failing to fetch artifacts over HTTP. (de
--kube-api-burst int The maximum burst queries-per-second of requests sent to the Kubernetes API.
--kube-api-qps float32 The maximum queries-per-second of requests sent to the Kubernetes API. (defa
--leader-election-lease-duration duration Interval at which non-leader candidates will wait to force acquire leadershi
--leader-election-release-on-cancel Defines if the leader should step down voluntarily on controller manager shu
--leader-election-renew-deadline duration Duration that the leading controller manager will retry refreshing leadershi
--leader-election-retry-period duration Duration the LeaderElector clients should wait between tries of actions (dur
--log-encoding string Log encoding format. Can be 'json' or 'console'. (default "json")
--log-level string Log verbosity level. Can be one of 'trace', 'debug', 'info', 'error'. (defau
--metrics-addr string The address the metric endpoint binds to. (default ":8080")
--no-cross-namespace-refs When set to true, references between custom resources are allowed only if th
--requeue-dependency duration The interval at which failing dependencies are reevaluated. (default 30s)
--runner-creation-timeout duration Timeout for creating a runner pod. (default 2m0s)
--runner-grpc-max-message-size int The maximum message size for gRPC connections in MiB. (default 4)
--runner-grpc-port int The port which will be exposed on the runner pod for gRPC connections. (defa
--use-pod-subdomain-resolution Allow to use pod hostname/subdomain DNS resolution instead of IP based
--watch-all-namespaces Watch for custom resources in all namespaces, if set to false it will only w
unknown flag: --grpc-port
tf-test-tf-runner Pod description:
Name:             tf-test-tf-runner
Namespace:        flux-system
Priority:         0
Service Account:  tf-runner
[...]
Labels:           app.kubernetes.io/created-by=tf-controller
                  app.kubernetes.io/instance=tf-runner-c81aeb3f
                  app.kubernetes.io/name=tf-runner
                  infra.contrib.fluxcd.io/terraform=flux-system
                  tf.weave.works/tls-secret-name=terraform-runner.tls-1717010935
[...]
Containers:
  tf-runner:
    Container ID:    containerd://9b449ac3c8b376b15504a2fbc0176b9ee26a717fb30da858c2c6c770ac775731
    Image:           [INTERNAL-REGISTRY]/flux-iac/tofu-controller:v0.16.0-rc.4
    Image ID:        [INTERNAL-REGISTRY]/flux-iac/tofu-controller@sha256:850888287bdf3429a8d20e791c74356d4b8210041227c26a70d40b51c0abdf79
    Port:            30000/TCP
    Host Port:       0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      --grpc-port
      30000
      --tls-secret-name
      terraform-runner.tls-1717010935
      --grpc-max-message-size
      4
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 28 May 2024 20:05:46 +0000
      Finished:     Tue, 28 May 2024 20:05:46 +0000
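For reference, the image and args shown in the description can also be read straight from the Pod spec. This is only a minimal sketch: `show_runner_spec` is a hypothetical helper name, and it assumes the runner is the first container in the Pod, as it is here.

```shell
# Sketch: print the image and the args of a Pod's first container.
# show_runner_spec is a hypothetical helper, not part of tf-controller.
# $1 = Pod name, $2 = namespace.
show_runner_spec() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.spec.containers[0].image}{"\n"}{.spec.containers[0].args}{"\n"}'
}

# Usage, with the names from this report:
#   show_runner_spec tf-test-tf-runner flux-system
```

The first line of output is the image reference and the second is the list of args passed to the container.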
Versions of CNI, Flux and tf-controller
$ helm list -A
NAME                       NAMESPACE    REVISION  UPDATED                                  STATUS    CHART                       APP VERSION
cilium                     kube-system  103       2024-05-28 20:02:09.914929235 +0000 UTC  deployed  cilium-1.15.5               1.15.5
flux                       flux-system  131       2024-05-28 20:02:17.205621316 +0000 UTC  deployed  flux2-2.13.0                2.3.0
flux-system-tf-controller  flux-system  2         2024-05-28 19:28:06.002703723 +0000 UTC  deployed  tf-controller-v0.16.0-rc.4  v0.16.0-rc.4
User-supplied values of the tf-controller Helm chart deployed in my cluster:
allowBreakTheGlass: true
awsPackage:
  install: false
caCertValidityDuration: 24h
certRotationCheckFrequency: 30m
concurrency: 8
image:
  repository: [INTERNAL-REGISTRY]/flux-iac/tofu-controller
  tag: v0.16.0-rc.4
replicaCount: 1
resources:
  limits:
    cpu: 1000m
    memory: 2Gi
  requests:
    cpu: 400m
    memory: 64Mi
runner:
  image:
    repository: [INTERNAL-REGISTRY]/flux-iac/tofu-controller
    tag: v0.16.0-rc.4
Hello @eneiss,
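Thank you for the detailed issue, it helps a lot! It looks like your runner.image value points at the tofu-controller image when it should point at the tf-runner image. The args passed to the runner Pod (--grpc-port, --tls-secret-name, --grpc-max-message-size) are intended for the runner binary, so the tofu-controller binary rejects them as unknown flags.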
Change your values to:
...
runner:
  image:
    repository: [INTERNAL-REGISTRY]/flux-iac/tf-runner
    tag: v0.16.0-rc.4
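If it helps, here is a rough way to sanity-check a runner image reference after the change. It is only a sketch: `check_runner_image` is a hypothetical helper, not something shipped with the controller, and it looks at nothing but the image name, so a custom runner image with a different name would be reported as unknown.

```shell
# Sketch: classify an image reference by its repository name.
# check_runner_image is a hypothetical helper; it only inspects the name.
check_runner_image() {
  base="${1##*/}"      # drop the registry and path prefix
  base="${base%%@*}"   # drop a digest suffix, if any
  base="${base%%:*}"   # drop a tag suffix, if any
  case "$base" in
    tf-runner)
      echo "ok: $base"; return 0 ;;
    tofu-controller|tf-controller)
      echo "mismatch: $base is the controller image"; return 1 ;;
    *)
      echo "unknown: $base"; return 2 ;;
  esac
}

# Usage:
#   check_runner_image '[INTERNAL-REGISTRY]/flux-iac/tf-runner:v0.16.0-rc.4'
```

The image reference to check can be copied from the Image field in the runner Pod description.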
Please let me know if you are still experiencing issues :)
Oops, nice catch! Thanks a lot for the help :)