sigstore/policy-controller

context deadline exceeded,error when patching "/dev/shm/2383404947

Closed · 1 comment

Description

We are happily running the policy-controller across more than 600 namespaces. We are currently facing an issue with an ArgoCD app that has around 30 pods. After syncing, we get the errors shown under "ArgoCD error" below.

Version
policy controller: v0.8.2
helm chart: 0.6.8
kubernetes version: v1.29.6-gke.1038001
KMS used: Vault

The policy-controller is running with 2 replicas, and we don't see any resource-related issues with it.

ArgoCD error

one or more objects failed to apply, reason: error when patching "/dev/shm/2179513921": Internal error occurred: failed calling webhook "policy.sigstore.dev": failed to call webhook: Post "https://webhook.cosign-system.svc:443/validations?timeout=10s": context deadline exceeded,error when patching "/dev/shm/4244754874": Internal error occurred: failed calling webhook "policy.sigstore.dev": failed to call webhook: Post "https://webhook.cosign-system.svc:443/validations?timeout=10s": context deadline exceeded,error when patching "/dev/shm/2383404947": Internal error occurred: failed calling webhook "policy.sigstore.dev": failed to call webhook: Post "https://webhook.cosign-system.svc:443/validations?timeout=10s": context deadline exceeded,error when patching "/dev/shm/1247502162": Internal error occurred: failed calling webhook "policy.sigstore.dev": failed to call webhook: Post "https://webhook.cosign-system.svc:443/validations?timeout=10s": context deadline exceeded,error when patching "/dev/shm/1156143926": Internal error occurred: failed calling webhook "policy.sigstore.dev": failed to call webhook: Post "https://webhook.cosign-system.svc:443/validations?timeout=10s": context deadline exceeded,error when patching "/dev/shm/302217473": Internal error occurred: failed calling webhook "policy.sigstore.dev": failed to call webhook: Post "https://webhook.cosign-system.svc:443/validations?timeout=10s": context deadline exceeded,error when patching "/dev/shm/2355632095": Internal error occurred: failed calling webhook "policy.sigstore.dev": failed to call webhook: Post "https://webhook.cosign-system.svc:443/validations?timeout=10s": context deadline exceeded,error when patching "/dev/shm/634602705": Internal error occurred: failed calling webhook "policy.sigstore.dev": failed to call webhook: Post "https://webhook.cosign-system.svc:443/validations?timeout=10s": context deadline exceeded,error when patching "/dev/shm/3366411207": Internal error occurred: failed calling 
webhook "policy.sigstore.dev": failed to call webhook: Post "https://webhook.cosign-system.svc:443/validations?timeout=10s": context deadline exceeded (retried 5 times).
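
To make a message like the one above easier to read, here is a minimal Python sketch that counts how many objects failed and which `timeout` value the API server passed to the webhook. The `summarize` helper and the shortened `message` string are illustrative assumptions (the sample reuses the first two entries from the error above); it is not part of ArgoCD or the policy-controller.

```python
import re
from collections import Counter

# Shortened stand-in for the ArgoCD message (first two entries only);
# paste the full message here when using this for real.
message = (
    'error when patching "/dev/shm/2179513921": Internal error occurred: '
    'failed calling webhook "policy.sigstore.dev": failed to call webhook: '
    'Post "https://webhook.cosign-system.svc:443/validations?timeout=10s": '
    'context deadline exceeded,'
    'error when patching "/dev/shm/4244754874": Internal error occurred: '
    'failed calling webhook "policy.sigstore.dev": failed to call webhook: '
    'Post "https://webhook.cosign-system.svc:443/validations?timeout=10s": '
    'context deadline exceeded'
)


def summarize(msg):
    """Hypothetical helper: summarize webhook timeouts in an apply error."""
    # Each failed object is reported as: error when patching "<temp file>"
    patched = re.findall(r'error when patching "([^"]+)"', msg)
    # The API server appends the effective webhook timeout as ?timeout=<n>s
    timeouts = Counter(re.findall(r"[?&]timeout=(\d+)s", msg))
    return {
        "failed_objects": len(patched),
        "timeouts_seconds": dict(timeouts),
        "deadline_exceeded": msg.count("context deadline exceeded"),
    }


summary = summarize(message)
print(summary)
```

If every failed object shows the same `timeout` value and a matching "context deadline exceeded", that suggests the webhook call itself is hitting its deadline rather than being rejected by a policy.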

Policy controller logs

{"level":"error","ts":"2024-07-29T08:13:02.441Z","logger":"policy-controller","caller":"webhook/validation.go:47","msg":"error validating signatures: Get \"<redacted_container_registry>\": context canceled","commit":"3c52aec-dirty","knative.dev/kind":"apps/v1, Kind=Deployment","knative.dev/namespace":"redacted-namespace","knative.dev/name":"redacted","knative.dev/operation":"UPDATE","knative.dev/resource":"apps/v1, Resource=deployments","knative.dev/subresource":"","knative.dev/userinfo":"system:serviceaccount:argocd:argocd-application-controller","stacktrace":"github.com/sigstore/policy-controller/pkg/webhook.valid\n\tgithub.com/sigstore/policy-controller/pkg/webhook/validation.go:47\ngithub.com/sigstore/policy-controller/pkg/webhook.ValidatePolicySignaturesForAuthority\n\tgithub.com/sigstore/policy-controller/pkg/webhook/validator.go:785\ngithub.com/sigstore/policy-controller/pkg/webhook.ValidatePolicy.func1\n\tgithub.com/sigstore/policy-controller/pkg/webhook/validator.go:531"}

{"level":"warn","ts":"2024-07-29T08:13:02.441Z","logger":"policy-controller","caller":"webhook/validator.go:1156","msg":"Failed to validate at least one policy for <redacted_image_name>@sha256:ad2167ad0083d5b272c30877669dcfa816a432123df3f0c411c248e1a1f746b4 wanted 1 policies, only validated 0","commit":"3c52aec-dirty","knative.dev/kind":"apps/v1, Kind=Deployment","knative.dev/namespace":"redacted-namespace","knative.dev/name":"redacted","knative.dev/operation":"UPDATE","knative.dev/resource":"apps/v1, Resource=deployments","knative.dev/subresource":"","knative.dev/userinfo":"system:serviceaccount:argocd:argocd-application-controller"}

{"level":"error","ts":"2024-07-29T08:13:02.441Z","logger":"policy-controller","caller":"validation/validation_admit.go:183","msg":"Failed the resource specific validation","commit":"3c52aec-dirty","knative.dev/kind":"apps/v1, Kind=Deployment","knative.dev/namespace":"redacted-namespace","knative.dev/name":"redacted","knative.dev/operation":"UPDATE","knative.dev/resource":"apps/v1, Resource=deployments","knative.dev/subresource":"","knative.dev/userinfo":"system:serviceaccount:argocd:argocd-application-controller","stacktrace":"knative.dev/pkg/webhook/resourcesemantics/validation.validate\n\tknative.dev/pkg@v0.0.0-20230612083802-15605c78a270/webhook/resourcesemantics/validation/validation_admit.go:183\nknative.dev/pkg/webhook/resourcesemantics/validation.(*reconciler).Admit\n\tknative.dev/pkg@v0.0.0-20230612083802-15605c78a270/webhook/resourcesemantics/validation/validation_admit.go:79\nknative.dev/pkg/webhook.admissionHandler.func1\n\tknative.dev/pkg@v0.0.0-20230612083802-15605c78a270/webhook/admission.go:123\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2122\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2500\nknative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20230612083802-15605c78a270/webhook/webhook.go:302\nknative.dev/pkg/network/handlers.(*Drainer).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20230612083802-15605c78a270/network/handlers/drain.go:113\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2936\nnet/http.(*conn).serve\n\tnet/http/server.go:1995"}

{"level":"info","ts":"2024-07-29T08:13:02.441Z","logger":"policy-controller","caller":"webhook/admission.go:151","msg":"remote admission controller audit annotations=map[string]string(nil)","commit":"3c52aec-dirty","knative.dev/kind":"apps/v1, Kind=Deployment","knative.dev/namespace":"redacted-namespace","knative.dev/name":"redacted","knative.dev/operation":"UPDATE","knative.dev/resource":"apps/v1, Resource=deployments","knative.dev/subresource":"","knative.dev/userinfo":"system:serviceaccount:argocd:argocd-application-controller","admissionreview/uid":"6cd0df49-5095-42b4-8393-4abbe03ebace","admissionreview/allowed":false,"admissionreview/result":"&Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:Failure,Message:validation failed: context was canceled before validation completed: ,Reason:BadRequest,Details:nil,Code:400,}"}

@hectorj2f would you please take a look? Sorry for pinging you directly.
I've seen your comment on #952, and since the OP didn't provide any, I thought it would be okay to ping.

After going through some Slack threads, this seems to be related to the webhook timeout.
I've increased it from 10s to 30s. The Slack threads: https://sigstore.slack.com/archives/C03096V09F1/p1657724579691239 and https://sigstore.slack.com/archives/C03096V09F1/p1695225677749929?thread_ts=1695217924.133089&cid=C03096V09F1
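
For anyone wanting to try the same change, here is a rough Python sketch that builds a JSON Patch for the `timeoutSeconds` field of a webhook entry. It assumes the timeout lives on a `ValidatingWebhookConfiguration` and that the relevant webhook is the first entry in its `webhooks` list; the `timeout_patch` helper and the index are illustrative, so check the actual object in your cluster. Kubernetes only accepts values between 1 and 30 seconds for this field, so 30s is the maximum.

```python
import json

# admissionregistration.k8s.io/v1 only accepts timeoutSeconds in this range.
MIN_TIMEOUT, MAX_TIMEOUT = 1, 30


def timeout_patch(seconds, webhook_index=0):
    """Hypothetical helper: JSON Patch that sets timeoutSeconds on one webhook.

    webhook_index is an assumption; confirm the position of the webhook
    in the configuration's `webhooks` list before applying.
    """
    if not MIN_TIMEOUT <= seconds <= MAX_TIMEOUT:
        raise ValueError(
            f"timeoutSeconds must be between {MIN_TIMEOUT} and {MAX_TIMEOUT}"
        )
    return [
        {
            "op": "replace",
            "path": f"/webhooks/{webhook_index}/timeoutSeconds",
            "value": seconds,
        }
    ]


patch = timeout_patch(30)
print(json.dumps(patch))
```

The printed patch could then be passed to `kubectl patch validatingwebhookconfiguration <name> --type=json -p '<patch>'`, where `<name>` is whatever the configuration is called in your cluster. If the object is managed by Helm or ArgoCD, a manual patch may be reverted on the next sync, so setting the value through the chart is probably the more durable route.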