zapier/kubechecks

Question | What is actually being compared by the tool?

Status: closed · 4 comments

Hi, I hope it's okay if I ask you a few questions about the tool :) I am part of a team that is currently evaluating whether we should use this tool. However, I am a bit unsure how reliable the diff is, because it is unclear what exactly is being compared.

Based on this comment, it seems that the tool compares the Applications that exist in the PR with the Applications currently present in the live cluster. In other words, it renders the Applications “locally” based on the files in the PR, then extracts the rendered manifests from ArgoCD (similar to running argocd app manifests on your cluster) and compares the two versions. Is that correct?

To me, this sounds like you are comparing the “desired state” (static configuration in Git) with the “actual state” or “live state” in the cluster. This means you are not actually comparing the PR branch with the main branch, but rather comparing the PR branch with the cluster state.

Is this understanding correct? :)

But what if my applications do not have auto-sync enabled?
What happens if I open a PR where I make a change to an Application that is “not in sync” on my cluster?

Are my PR changes compared to the rendered “not-yet-synced” state (the yellow ghost resources you usually see in the ArgoCD UI before you sync), or are my PR changes compared to the manifests currently managed by ArgoCD (the resources that were applied last time the app was synced)?

I really hope these questions make sense! Finding the right terminology here is tricky. I will happily try to rephrase the questions if they are unclear.

Your understanding is correct: we compare the contents of the PR, rendered the same way ArgoCD would render them, against the actual Kubernetes resources currently tracked by ArgoCD. This means the "not in sync" ghost resources do not affect the diff; resources that are not in the cluster show up as "additions" as far as the diff is concerned.
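A minimal Python sketch of that comparison model may help. It is not kubechecks' actual implementation: manifests are assumed to be plain dicts, resources are matched on a simplified key of apiVersion, kind, namespace and name, and `classify_changes` is a hypothetical helper. Real tooling also has to ignore server-populated fields (status, defaulted values, managed fields) before comparing, which this sketch skips. The point it illustrates is the base of the comparison: anything rendered from the PR that is missing from the live set, such as an out-of-sync ghost resource, lands in "added".

```python
from typing import Any

Manifest = dict[str, Any]
Key = tuple[str, str, str, str]


def resource_key(m: Manifest) -> Key:
    """Identify a resource by apiVersion, kind, namespace and name (simplified)."""
    meta = m.get("metadata", {})
    return (m["apiVersion"], m["kind"], meta.get("namespace", ""), meta["name"])


def classify_changes(rendered: list[Manifest], live: list[Manifest]) -> dict[str, list[Key]]:
    """Compare manifests rendered from the PR with live resources tracked by ArgoCD.

    The base is the live cluster state, not the main branch, so resources that
    exist only as out-of-sync "ghosts" in ArgoCD are simply absent from `live`.
    """
    live_by_key = {resource_key(m): m for m in live}
    result: dict[str, list[Key]] = {"added": [], "modified": [], "unchanged": []}
    for desired in rendered:
        key = resource_key(desired)
        current = live_by_key.get(key)
        if current is None:
            # Not in the cluster: never synced, or rejected by the API server.
            result["added"].append(key)
        elif current != desired:
            result["modified"].append(key)
        else:
            result["unchanged"].append(key)
    return result


# Usage (hypothetical Deployment): an empty live list means everything is an addition.
web = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "default"},
    "spec": {"replicas": 2},
}
print(classify_changes([web], []))
# {'added': [('apps/v1', 'Deployment', 'default', 'web')], 'modified': [], 'unchanged': []}
```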

As a further example of this, consider an application that has auto sync turned on but contains an invalid resource that the API server rejects (for example, a Deployment with replicas: "one" instead of replicas: 1). Because that resource is not in the cluster, a PR that changes the value from a string to a number would show it as a brand new resource being created, not as a modification of an existing one.

This occasionally causes some strange artifacts, but in practice they only occur (for us, at least) when someone is editing resources directly in ArgoCD with auto sync disabled (either permanently or via a sync window) and is in the process of committing those changes back to the repository. The artifacts are actually somewhat useful in that situation, since they let you know that you've made all the appropriate changes (the diff comes back with "No changes detected").
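The "No changes detected" outcome can be illustrated with a per-resource text diff. This is a minimal sketch, assuming manifests are serialized as key-sorted JSON and compared with Python's difflib; `render_diff` and its message string are hypothetical and do not reproduce kubechecks' real output format. If someone has edited a live resource by hand and the PR commits the same values back to Git, the two serializations match and the diff is empty.

```python
import difflib
import json
from typing import Any, Optional


def to_lines(manifest: Optional[dict[str, Any]]) -> list[str]:
    """Serialize a manifest deterministically; a missing resource becomes no lines."""
    if manifest is None:
        return []
    return json.dumps(manifest, indent=2, sort_keys=True).splitlines()


def render_diff(live: Optional[dict[str, Any]], desired: dict[str, Any]) -> str:
    """Return a unified diff of live vs desired, or a fixed message when they match."""
    diff = list(
        difflib.unified_diff(
            to_lines(live), to_lines(desired), fromfile="live", tofile="pr", lineterm=""
        )
    )
    return "\n".join(diff) if diff else "No changes detected"


# A Deployment edited by hand in the cluster while auto sync is disabled ...
hand_edited_live = {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 3}}
# ... and a PR that commits the same change back to the repository.
pr_version = {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 3}}
print(render_diff(hand_edited_live, pr_version))  # prints "No changes detected"
```

Passing None as the live manifest yields a diff in which every line is an addition, which matches the "brand new resource" behaviour described above.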

Thank you so much for the detailed answer!


The replicas: "one" example is a great one! It's exactly the kind of case I was thinking of.

This makes me think of two scenarios:

Let's assume I create a PR where I add a new ArgoCD Application consisting of a single Deployment with auto-sync disabled. The Kubechecks report will highlight the new Deployment.

Now, if I merge the PR and open a follow-up PR that changes a field in that Deployment, I believe the Kubechecks report will again show the Deployment as added.

So, as long as the Application is never synced, any subsequent changes to that app will continue showing up as entirely new resources in the report diff. Is that correct?
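The first scenario hinges on which base the PR is diffed against. This hypothetical Python sketch (the `change_type` helper and all values are made up for illustration) contrasts a Git-to-Git diff against main with the live-state comparison described in the answer above, for an Application with auto-sync disabled that has never been synced.

```python
from typing import Any, Optional

Manifest = dict[str, Any]


def change_type(base: Optional[Manifest], head: Manifest) -> str:
    """Classify a single resource change relative to whichever base is chosen."""
    if base is None:
        # The resource does not exist in the base at all.
        return "added"
    return "unchanged" if base == head else "modified"


# Scenario 1 (hypothetical values): the Application was merged but never synced.
main_branch = {"kind": "Deployment", "spec": {"replicas": 1}}   # merged in the first PR
follow_up_pr = {"kind": "Deployment", "spec": {"replicas": 2}}  # the follow-up PR
live_cluster = None                                             # nothing has been applied yet

# A Git-to-Git diff (PR vs main) would call this a modification ...
print(change_type(main_branch, follow_up_pr))   # modified
# ... but a diff against the live cluster reports the Deployment as added again.
print(change_type(live_cluster, follow_up_pr))  # added
```

Once the Application is synced, the live base holds the Deployment and the same follow-up PR would be classified as "modified".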

By default, ArgoCD reconciles with Git every 3 minutes. Let's assume I merge a PR that sets replicas: 10 in a simple application with auto-sync enabled. If I then branch off main and open a new PR within 3 minutes, there is a chance that the cluster hasn't picked up the changes from my previous PR yet. As a result, the report for the new PR would again show replicas: 10 as a change, even though that change belongs to the previous PR, not the current one.

I understand you'd have to be a bit unlucky for this to happen - but I just want to make sure I understand the edge cases so no one gets confused by the reported diff 👍🏻
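The timing window in the second scenario can be estimated with a rough model. This Python sketch assumes ArgoCD polls Git on a fixed 180-second schedule (the 3-minute default mentioned above) and that auto-sync applies a merged change at the first poll at or after the merge. It ignores Git webhooks (which can trigger a refresh right away), sync duration, and any jitter. The helper names and the pre-merge value of replicas: 1 are hypothetical.

```python
import math


def next_reconcile(merge_time: float, interval: float = 180.0, phase: float = 0.0) -> float:
    """First poll at or after merge_time, assuming polls at phase + k * interval seconds."""
    k = math.ceil((merge_time - phase) / interval)
    return phase + k * interval


def live_replicas(t: float, merge_time: float, old: int = 1, new: int = 10) -> int:
    """Replica count in the cluster at time t; old=1 is a hypothetical pre-merge value."""
    return new if t >= next_reconcile(merge_time) else old


def stale_diff_expected(pr_opened: float, merge_time: float, new: int = 10) -> bool:
    """True if a PR branched from main after the merge would still show replicas: new."""
    # The new PR already contains the merged value, so it only appears in the
    # diff when the cluster has not caught up with main yet.
    return live_replicas(pr_opened, merge_time, new=new) != new


# Merge at t=200 s; under this model the next poll is at t=360 s.
print(stale_diff_expected(pr_opened=250, merge_time=200))  # True: cluster still has replicas: 1
print(stale_diff_expected(pr_opened=400, merge_time=200))  # False: change already applied
```

Under this model the worst case is a merge that lands just after a poll, leaving up to one full interval (about 3 minutes) during which a new PR is compared against stale live state.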

Are these two scenarios correct? :)

Yup, both scenarios are entirely correct 👍

Thank you so much @djeebus!