sse-secure-systems/connaisseur

Connaisseur verifies signature with every helm reconcile even with automaticUnchangedApproval enabled

dbbhat opened this issue · 6 comments

dbbhat commented

Describe the bug
We are observing that with every reconcile of HelmReleases in our cluster, connaisseur is verifying image signatures. Our reconcile period is set to 5m, and from the logs below you can see that the same image signature is being verified every 5m even though there is no update to the image reference in the deployment.

{"timestamp": "2023-08-22 02:21:30.484056", "message": "successful verification of image \"<registry>:<tag>\"", "admission_review": {"user": "system:serviceaccount:flux-system:helm-controller", "operation": "UPDATE", "kind": "Deployment", "name": "<foo>", "namespace": "<foo-ns>"}, "image": "<registry>:<tag>"}
{"timestamp": "2023-08-22 02:26:32.649735", "message": "successful verification of image \"<registry>:<tag>\"", "admission_review": {"user": "system:serviceaccount:flux-system:helm-controller", "operation": "UPDATE", "kind": "Deployment", "name": "<foo>", "namespace": "<foo-ns>"}, "image": "<registry>:<tag>"}
{"timestamp": "2023-08-22 02:31:34.994612", "message": "successful verification of image \"<registry>:<tag>\"", "admission_review": {"user": "system:serviceaccount:flux-system:helm-controller", "operation": "UPDATE", "kind": "Deployment", "name": "<foo>", "namespace": "<foo-ns>"}, "image": "<registry>:<tag>"}
{"timestamp": "2023-08-22 02:36:37.282291", "message": "successful verification of image \"<registry>:<tag>\"", "admission_review": {"user": "system:serviceaccount:flux-system:helm-controller", "operation": "UPDATE", "kind": "Deployment", "name": "<foo>", "namespace": "<foo-ns>"}, "image": "<registry>:<tag>"}
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  finalizers:
  - finalizers.fluxcd.io
  generation: 11
  name: <foo>
  namespace: flux-system
  resourceVersion: "1178861524"
spec:
  chart:
    spec:
      chart: <foo>
      interval: 5m
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: <foo>
        namespace: <foo-ns>
      version: 0.0.4
  install:
    crds: CreateReplace
  interval: 5m
  releaseName: <foo>
  upgrade:
    crds: CreateReplace

This is causing unnecessary burden on the cluster and the image registry where signatures are stored because the signatures are being pulled at every helm reconcile for verification.

Expected behavior
Connaisseur should not re-verify the image signature on every helm reconcile period if image references are not updated.

Optional: Versions (please complete the following information as relevant):

  • OS:
  • Kubernetes Cluster: v1.26.2
  • Connaisseur: v3.0.0

Optional: Additional context

Using connaisseur v3.0.0 with automaticChildApproval: true and automaticUnchangedApproval: true.

Hi @dbbhat !

Would flux reconcile a resource regardless of any changes ...?

I'm asking because of the following problem: your remote repo will always be out of sync with what is actually deployed, because Connaisseur will mutate the image reference from something like nvcr.io/nv-ngn/nid/nid-operator:main-6c934bb to nvcr.io/nv-ngn/nid/nid-operator:main-6c934bb@sha256:<some-digest>. That is how verifying signatures works and will probably not change. From Connaisseur's perspective we have to do a verification here, since it's a legit request and, even though the tag did not change from before, the underlying digest could have changed.
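To illustrate the drift described above, here is a minimal sketch in Python. It is not Connaisseur's actual code: the function name `pin_digest`, the registry name and the all-zero digest are hypothetical placeholders, and the only behavior assumed is the one stated above, namely that a tag-only reference gets a digest appended on admission.

```python
def pin_digest(reference: str, digest: str) -> str:
    """Append a digest to an image reference unless it already carries one."""
    if "@" in reference:
        return reference
    return f"{reference}@{digest}"


# Hypothetical values, for illustration only.
manifest_image = "registry.example/app:1.0"   # what the remote repo declares
digest = "sha256:" + "0" * 64                 # placeholder digest

# What ends up in the cluster after the reference has been mutated.
deployed_image = pin_digest(manifest_image, digest)

# The manifest and the deployed resource now differ on every reconcile.
drift = manifest_image != deployed_image
```

In this sketch a reference that already contains a digest is returned unchanged, which is why declaring tag and digest together in the manifest leaves nothing to mutate.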

Hence my question. If flux would realize that a drift from nid-operator:main-6c934bb to nid-operator:main-6c934bb@sha256:<digest> requires no new pushing of the helm release, that would solve this issue. But what I don't know is whether flux would still create an update, no matter if there are changes or not.

Do you have any insight here?

dbbhat commented

Hi @phbelitz, thanks for your response.

It seems like flux is reconciling the resource every time, regardless of the change in image reference. To test this, I updated the image reference in the deployment manifest for the component to include both tag and digest (the mutated form basically), for example <foo>:<tag>@sha256:<some-digest>.

After this update, the connaisseur logs on every reconcile (every 5m) show that the image is being automatically approved because the image reference hasn't changed:

{"timestamp": "2023-08-24 19:31:39.264303", "message": "automatic approval for unchanged image \"<registry>:<tag>@sha256:<digest>\".", "admission_review": {"user": "system:serviceaccount:<foo>", "operation": "UPDATE", "kind": "Pod", "name": "<pod-name-foo>", "namespace": "<foo-ns>"}, "image": "<registry>:<tag>@sha256:<digest>"}
{"timestamp": "2023-08-24 19:35:39.719323", "message": "automatic approval for unchanged image \"<registry>:<tag>@sha256:<digest>\".", "admission_review": {"user": "system:serviceaccount:flux-system:helm-controller", "operation": "UPDATE", "kind": "Deployment", "name": "<foo>", "namespace": "<foo-ns>"}, "image": "<registry>:<tag>@sha256:<digest>"}
{"timestamp": "2023-08-24 19:40:39.936827", "message": "automatic approval for unchanged image \"<registry>:<tag>@sha256:<digest>\".", "admission_review": {"user": "system:serviceaccount:flux-system:helm-controller", "operation": "UPDATE", "kind": "Deployment", "name": "<foo>", "namespace": "<foo-ns>"}, "image": "<registry>:<tag>@sha256:<digest>"}

However, looks like there is something else in the deployment that's responsible for the drift between remote repo and what's actually deployed, causing it to be reconciled every time ...

So should we be using the image digest instead of just the tag then in our deployment manifests?

Yea, that should do the trick for now. The image signatures are still validated, even when you use the digest additionally. Other than that, we should probably brainstorm on how to better support gitops approaches, but I guess that's a future problem 😀

dbbhat commented

The image signatures are still validated, even when you use the digest additionally.

What did you mean by this, can you clarify? I don't see this happening: when I change the image reference to <repository>:<tag>@sha256:<some-digest>, the verification does not happen on every reconcile.

{"timestamp": "2023-08-24 19:31:39.264303", "message": "automatic approval for unchanged image \"<registry>:<tag>@sha256:<digest>\".", "admission_review": {"user": "system:serviceaccount:flux-system:helm-controller", "operation": "UPDATE", "kind": "Pod", "name": "<foo>", "namespace": "<foo-ns>"}, "image": "<registry>:<tag>@sha256:<digest>"}
{"timestamp": "2023-08-24 19:35:39.719323", "message": "automatic approval for unchanged image \"<registry>:<tag>@sha256:<digest>\".", "admission_review": {"user": "system:serviceaccount:flux-system:helm-controller", "operation": "UPDATE", "kind": "Deployment", "name": "<foo>", "namespace": "<foo-ns>"}, "image": "<registry>:<tag>@sha256:<digest>"}
{"timestamp": "2023-08-24 19:40:39.936827", "message": "automatic approval for unchanged image \"<registry>:<tag>@sha256:<digest>\".", "admission_review": {"user": "system:serviceaccount:flux-system:helm-controller", "operation": "UPDATE", "kind": "Deployment", "name": "<foo>", "namespace": "<foo-ns>"}, "image": "<registry>:<tag>@sha256:<digest>"}

Is this because flux no longer considers this image reference to have drift wrt what's in remote repo?

What I mean is that the very first time a new <repository>:<tag>@sha256:<some-digest> gets deployed, the signature is validated, as it should be.

All subsequent reconciliations will no longer verify the signature, because Connaisseur will skip verification whenever you try to update a resource without changing the image reference. This is exactly what happens with the flux reconciliation. It tries to reconcile (and thus update the resource), but because you now referenced the image by tag and digest, the image reference between the remote repo and the deployed resource is identical (thus no change here). This is what we internally call automatic update approval.

At some point you may update your images, and your references inside the remote repo will change and be different to your deployed resource. Here the signature verification will no longer be skipped, but go through a single time. After that, the automatic update approval will kick in again.
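The decision described above can be modelled roughly as follows. This is a sketch of the described behavior only, not Connaisseur's implementation: `needs_verification` and the image names are hypothetical, and the rule assumed is simply "on UPDATE, images already present in the deployed resource are approved without verification".

```python
def needs_verification(old_images, new_images, operation):
    """Return the images of an incoming request that still have to be verified."""
    if operation != "UPDATE":
        # Nothing is deployed yet to compare against, so verify everything.
        return sorted(set(new_images))
    # On UPDATE, images that are already part of the deployed resource are skipped.
    return sorted(set(new_images) - set(old_images))


# Hypothetical deployed state: the reference was pinned to a digest on admission.
deployed = ["registry.example/app:1.0@sha256:aaa"]

# Manifest with tag only: differs from the deployed reference, verified on every reconcile.
tag_only = needs_verification(deployed, ["registry.example/app:1.0"], "UPDATE")

# Manifest with tag and digest: identical to the deployed reference, verification skipped.
pinned = needs_verification(deployed, ["registry.example/app:1.0@sha256:aaa"], "UPDATE")

# Manifest updated to a new image: verified once, then skipped again afterwards.
updated = needs_verification(deployed, ["registry.example/app:1.1@sha256:bbb"], "UPDATE")
```

Under these assumptions only the pinned manifest yields an empty list, which matches the "automatic approval for unchanged image" log lines earlier in the thread.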

This should now be fixed with the Resource Validation Mode. If set to podsOnly, non-Pod resources are no longer mutated and thus the reconciliation will not trigger there.
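As a rough sketch of why this helps, assuming the mode behaves as described above: the helper names, the list of workload kinds, the registry name and the name "all" for the non-podsOnly mode are assumptions made for illustration, not taken from Connaisseur's code.

```python
# Workload kinds assumed to be validated and mutated outside podsOnly mode.
WORKLOAD_KINDS = {
    "Pod", "Deployment", "ReplicationController", "ReplicaSet",
    "DaemonSet", "StatefulSet", "Job", "CronJob",
}


def is_mutated(kind: str, mode: str) -> bool:
    """Decide whether a resource kind has its image references mutated."""
    if mode == "podsOnly":
        return kind == "Pod"
    return kind in WORKLOAD_KINDS


def stored_image(kind: str, image: str, digest: str, mode: str) -> str:
    """Image reference as it would be stored in the cluster after admission."""
    if is_mutated(kind, mode) and "@" not in image:
        return f"{image}@{digest}"
    return image


# Hypothetical values, for illustration only.
manifest_image = "registry.example/app:1.0"
digest = "sha256:" + "0" * 64

deployment_all = stored_image("Deployment", manifest_image, digest, "all")
deployment_pods_only = stored_image("Deployment", manifest_image, digest, "podsOnly")
pod_pods_only = stored_image("Pod", manifest_image, digest, "podsOnly")
```

In this model the Deployment keeps the exact reference from the manifest under podsOnly, so it no longer differs from the remote repo, while the Pod is still pinned to a digest.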