fluxcd/flagger

Primary HPA does not get removed automatically after `autoscalerRef` is removed from Canary

fbuchmeier-abi opened this issue · 1 comments

Describe the bug

We are currently using canaries with autoscalerRef and HPA enabled as described in the docs.

apiVersion: flagger.app/v1beta1
kind: Canary
...
spec:
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: app

This automatically creates a new HPA object, app-primary, which manages the primary deployment. However, because we could not set annotations on the app-primary HPA and were not actually using the HPA at all (min = max), we decided to remove the HPA (app) and the autoscalerRef.

In general this works fine; however, we noticed that the HPA that Flagger created automatically (app-primary) is not deleted automatically.

Is this intended or a bug? If it is intended, do you think it is significant enough to mention the cleanup process in the docs (delete all remaining .+-primary HPAs that no longer have a corresponding canary HPA)?

To Reproduce

  1. Create a Canary with autoscalerRef enabled. This will create a .+-primary HPA object.
  2. Remove the canary HPA and the autoscalerRef from the kind: Canary object.
  3. The .+-primary HPA is still present and keeps managing the .+-primary deployment, even though your intention is now to manage the replicas of your deployment directly.
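
For illustration, the Canary after step 2 might look roughly like the sketch below. The metadata name and the targetRef values are assumptions based on the `app` naming used earlier in this issue; the remaining fields are elided exactly as in the original snippet.

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: app                # assumed name, matching the HPA name used above
spec:
  targetRef:               # assumed target; not shown in the original snippet
    apiVersion: apps/v1
    kind: Deployment
    name: app
  # autoscalerRef has been removed here, yet the generated
  # app-primary HPA is still present in the cluster.
  # Remaining fields (service, analysis, ...) are unchanged and omitted.
```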

Expected behavior

Resources that are created automatically when a flag (autoscalerRef) is enabled should also be deleted when the flag is removed. Otherwise the HPA created by Flagger keeps managing the number of replicas for the primary deployment, and the user has no way to change the number of replicas without deleting the objects manually.

Additional context

  • Flagger version: ghcr.io/fluxcd/flagger:1.33.0
  • Kubernetes version: eks 1.28
  • Service Mesh provider: linkerd
  • Ingress provider: aws loadbalancer controller (alb)

@fbuchmeier-abi, we were in a similar situation as you, where we had:

  • HPA for the canary target
  • autoscalerRef in the canary object
  • HPA created and owned by Flagger for the primary deployment
  • HPA min = max = replicas
  • canary target (deployment) has replicas removed so Flux and Flagger don't fight (when Flux reconciles and when Flagger does a canary analysis).

What we did was:

  • Remove HPA for the canary target
  • Remove autoscalerRef
  • Manage the HPA for the primary deployment ourselves. We did this by creating a manifest with the same HPA name (.+-primary) and deploying it via Flux
    • As a result, Flux patches the existing HPA for the primary deployment and "transfers" ownership from Flagger to Flux
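
A minimal sketch of such a manifest, assuming the primary HPA is named app-primary as in the issue above. The `autoscaling/v2` API version, the namespace, the replica counts, and the CPU metric are all assumptions for illustration, not values taken from our setup.

```yaml
# Hypothetical HPA manifest deployed via Flux. It reuses the name that
# Flagger generated so that applying it patches the existing object.
apiVersion: autoscaling/v2      # assumption: the cluster serves autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-primary             # same name as the Flagger-created HPA
  namespace: default            # assumption: adjust to the app's namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-primary           # the primary deployment managed by Flagger
  minReplicas: 2                # placeholder value
  maxReplicas: 2                # placeholder value (min = max)
  metrics:                      # placeholder metric, not from the source
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```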

Not directly relevant, but we also changed the HPA to min = replicas and max = Ceil(replicas * 1.5) to provide some flexibility. In our opinion, the HPA doesn't need to have min = max.
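
As a worked example of that rule, assuming replicas = 3 (a made-up value), the relevant part of the HPA spec would be:

```yaml
# Fragment of the HPA spec only; replicas = 3 is an assumed example value.
spec:
  minReplicas: 3   # min = replicas
  maxReplicas: 5   # max = Ceil(3 * 1.5) = Ceil(4.5) = 5
```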