KEDA ScaledObject primary creation failed due to workload already managed by the HPA
jdgeisler opened this issue · 2 comments
Describe the bug
Following the Flagger docs for KEDA integration, I updated the Canary to point to the KEDA ScaledObject instead of the HPA. Additionally, I added the annotation to the KEDA ScaledObject so it takes over ownership of the old HPA. Flagger then goes to create the primary ScaledObject from the ScaledObject; however, because the old primary HPA is not cleaned up by Flagger when the original HPA is removed, it still exists and causes the error below due to the collision.
{"level":"info","ts":"2024-05-16T14:18:12.738Z","caller":"controller/events.go:45","msg":"creating Keda ScaledObject fortio-server-deployment-2-primary.fortio failed: admission webhook \"vscaledobject.kb.io\" denied the request: the workload 'fortio-server-deployment-2-primary' of type 'apps/v1.Deployment' is already managed by the hpa 'fortio-server-deployment-2-primary'","canary":"fortio-server-deployment-2.fortio"}
The behavior I would expect from Flagger when migrating from an HPA to a ScaledObject in the Canary is to check whether the primary HPA still exists. If it does, Flagger should add the same annotation that is on the KEDA ScaledObject to the new primary ScaledObject, so that instead of colliding with the already existing HPA, it takes over ownership of it.
I tried various workarounds as well, such as manually creating the primary ScaledObject and taking ownership of the already existing primary HPA, but on the next canary analysis run Flagger reconciles and updates the primary ScaledObject, removing the ownership annotation, so the duplicate is created anyway.
I dug into the code and found where the ScaledObject primary reconciliation happens, and I think it would be relatively straightforward to implement the above logic to add the KEDA transfer-ownership annotation to the primary ScaledObject (see the proposed change under Additional context below).
Without this implemented, it seems I am unable to migrate from an HPA to a ScaledObject in a Canary unless we follow a multi-step process to manually delete the old primary HPA so the new primary ScaledObject can be created.
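For reference, the manual workaround amounts to deleting the stale primary HPA by hand before Flagger reconciles, for example (note that this briefly leaves the primary deployment without autoscaling):
k delete hpa fortio-server-deployment-2-primary -n fortio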
To Reproduce
- Have an already existing Canary referencing an HPA
- Replace the HPA with the ScaledObject, then update the autoscalerRef in the Canary to point to the ScaledObject, as shown in the following diff:
fortio, fortio-server-deployment-2, Canary (flagger.app) has changed:
# Source: fortio/templates/fortio.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: fortio-server-deployment-2
  namespace: fortio
spec:
  analysis:
    interval: 30s
    maxWeight: 50
    metrics:
    - interval: 30s
      name: request-success-rate
      thresholdRange:
        min: 99
    stepWeight: 10
    threshold: 10
    webhooks:
    - metadata:
        async: "on"
        c: "6"
        labels: fortio-server-deployment-2
        load: Start
        nocatchup: "on"
        qps: "12"
        save: "on"
        stdclient: "on"
        t: 600s
        uniform: "on"
        url: http://fortio-server-deployment-2-canary.fortio.svc.cluster.local:8080/
      name: generateLoad
      timeout: 60s
      type: pre-rollout
      url: http://fortio-client.fortio.svc.cluster.local:8080/fortio/rest/run?jsonPath=.metadata
  progressDeadlineSeconds: 600
  autoscalerRef:
-   apiVersion: autoscaling/v2
-   kind: HorizontalPodAutoscaler
+   apiVersion: keda.sh/v1alpha1
+   kind: ScaledObject
    name: fortio-server-deployment-2
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fortio-server-deployment-2
  service:
    port: 8080
    targetPort: 8080
fortio, fortio-server-deployment-2, HorizontalPodAutoscaler (autoscaling) has been removed:
- # Source: fortio/templates/fortio.yaml
- apiVersion: autoscaling/v2
- kind: HorizontalPodAutoscaler
- metadata:
-   name: fortio-server-deployment-2
-   namespace: fortio
- spec:
-   maxReplicas: 4
-   metrics:
-   - resource:
-       name: cpu
-       target:
-         averageUtilization: 99
-       type: Utilization
-     type: Resource
-   minReplicas: 2
-   scaleTargetRef:
-     apiVersion: apps/v1
-     kind: Deployment
-     name: fortio-server-deployment-2
+
fortio, fortio-server-deployment-2, ScaledObject (keda.sh) has been added:
-
+ # Source: fortio/templates/fortio.yaml
+ apiVersion: keda.sh/v1alpha1
+ kind: ScaledObject
+ metadata:
+   name: fortio-server-deployment-2
+   namespace: fortio
+   annotations:
+     scaledobject.keda.sh/transfer-hpa-ownership: "true"
+ spec:
+   advanced:
+     horizontalPodAutoscalerConfig:
+       name: fortio-server-deployment-2
+   scaleTargetRef:
+     name: fortio-server-deployment-2
+   minReplicaCount: 2
+   maxReplicaCount: 4
+   triggers:
+   - type: prometheus
+     metadata:
+       serverAddress: http://kube-prometheus-stack-prometheus:9090/prometheus
+       threshold: '100'
+       query: sum(rate(istio_requests_total{destination_service_name="fortio-server-2"}[1m]))
+   - type: cpu
+     metricType: Utilization
+     metadata:
+       value: "99"
- See that the ScaledObject is created and takes ownership of the original HPA. However, Flagger then fails to create the new primary ScaledObject due to the collision with the already existing primary HPA. In the output below you can see the primary HPA is not updated to use the custom metric from the KEDA ScaledObject:
k get pods
NAME READY STATUS RESTARTS AGE
fortio-client-deployment-75c6f57dc4-jws2f 2/2 Running 0 14d
fortio-http-proxy-parallel-b556bc8dc-6ffc4 2/2 Running 0 14d
fortio-http-proxy-serial-7d54446cd-44wml 2/2 Running 0 14d
fortio-server-deployment-1-7d6fd57d76-26bzd 2/2 Running 0 14d
fortio-server-deployment-1-7d6fd57d76-rrscw 2/2 Running 0 14d
fortio-server-deployment-2-88c7fb48b-dg2l4 2/2 Running 0 8m50s
fortio-server-deployment-2-88c7fb48b-zm5tn 2/2 Running 0 8m50s
fortio-server-deployment-2-primary-6d9fb88898-8rrrn 2/2 Running 0 24m
fortio-server-deployment-2-primary-6d9fb88898-z4bdb 2/2 Running 0 24m
fortio-tcp-proxy-6f566d4cfc-qzsgq 2/2 Running 0 14d
k get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
fortio-server-deployment-2 Deployment/fortio-server-deployment-2 0/100 (avg), 4%/99% 2 4 2 8m55s
fortio-server-deployment-2-primary Deployment/fortio-server-deployment-2-primary 3%/99% 2 4 2 24m
k get scaledobject
NAME SCALETARGETKIND SCALETARGETNAME MIN MAX TRIGGERS AUTHENTICATION READY ACTIVE FALLBACK PAUSED AGE
fortio-server-deployment-2 apps/v1.Deployment fortio-server-deployment-2 2 4 prometheus True False False Unknown 9m1s
- See the below error from Flagger failing to create the primary ScaledObject:
{"level":"info","ts":"2024-05-16T14:18:12.738Z","caller":"controller/events.go:45","msg":"creating Keda ScaledObject fortio-server-deployment-2-primary.fortio failed: admission webhook \"vscaledobject.kb.io\" denied the request: the workload 'fortio-server-deployment-2-primary' of type 'apps/v1.Deployment' is already managed by the hpa 'fortio-server-deployment-2-primary'","canary":"fortio-server-deployment-2.fortio"}
- Ideally, when the primary ScaledObject is created by Flagger, it should carry the below annotation so that it takes ownership of the primary HPA if one already exists. Otherwise it is impossible to migrate from an HPA to a ScaledObject without manually deleting the old primary HPA first:
annotations:
  scaledobject.keda.sh/transfer-hpa-ownership: "true"
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      name: fortio-server-deployment-2-primary
Expected behavior
This is described above in the description and steps to reproduce.
The behavior I would expect from Flagger when migrating from an HPA to a ScaledObject in the Canary is to check whether the primary HPA still exists. If it does, Flagger should add the annotation from the KEDA ScaledObject to the new primary ScaledObject, so that instead of colliding with the already existing HPA, it takes over ownership of it.
Additional context
- Flagger version: 1.34.0
- Kubernetes version: 1.27
- Service Mesh provider: Istio 1.19.7
I think a possible solution could roughly look like the following:
When the primary ScaledObject is created by Flagger, the annotations from the target ScaledObject that is cloned should also be carried over. This way, if the scaledobject.keda.sh/transfer-hpa-ownership annotation is set on the target ScaledObject, it will also be set on the primary ScaledObject. This would prevent the collision between the old primary HPA and the new one being created.
primarySo = &keda.ScaledObject{
	// Pass in the annotations from the targetSo so that they are carried
	// over to the primarySo. This is required so that the transfer-ownership
	// annotation can be added.
	ObjectMeta: makeObjectMetaSo(primarySoName, targetSoClone.Labels, targetSoClone.Annotations, cd),
	Spec:       soSpec,
}
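For illustration, a minimal sketch of what such a helper could look like. The name makeObjectMetaSo and its exact signature are part of this proposal, not existing Flagger API, and I have omitted the owner-reference handling that the real helper would presumably set from the Canary (cd):

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	flaggerv1 "github.com/fluxcd/flagger/pkg/apis/flagger/v1beta1"
)

// makeObjectMetaSo builds the ObjectMeta for the primary ScaledObject,
// copying both labels and annotations from the target so that the
// scaledobject.keda.sh/transfer-hpa-ownership annotation survives the clone.
// (Hypothetical sketch; owner references omitted for brevity.)
func makeObjectMetaSo(name string, labels, annotations map[string]string, cd *flaggerv1.Canary) metav1.ObjectMeta {
	// Copy the maps so later mutations of the primary never leak back
	// into the target ScaledObject.
	l := make(map[string]string, len(labels))
	for k, v := range labels {
		l[k] = v
	}
	a := make(map[string]string, len(annotations))
	for k, v := range annotations {
		a[k] = v
	}
	return metav1.ObjectMeta{
		Name:        name,
		Namespace:   cd.Namespace,
		Labels:      l,
		Annotations: a,
	}
}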
Additionally, the Advanced.HorizontalPodAutoscalerConfig.Name in the keda.ScaledObjectSpec should also be configurable and set to the primary name in this case. Otherwise, the generated HPA will get the default keda-hpa prefix. This is detailed in the KEDA documentation.
// Set the name of the horizontal pod autoscaler to whatever the primary name
// should be, otherwise the HPA name won't match the ScaledObject name.
Advanced: &keda.AdvancedConfig{
	RestoreToOriginalReplicaCount: targetSoClone.Spec.Advanced.RestoreToOriginalReplicaCount,
	HorizontalPodAutoscalerConfig: &keda.HorizontalPodAutoscalerConfig{
		Name:     primaryName,
		Behavior: targetSoClone.Spec.Advanced.HorizontalPodAutoscalerConfig.Behavior,
	},
},
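One caveat with the snippet above, based on my reading of the KEDA v1alpha1 Go types (where spec.advanced and horizontalPodAutoscalerConfig are optional pointers): the target ScaledObject may omit them entirely, so the clone would need nil guards. A rough sketch, assuming the upstream KEDA types (Flagger may vendor its own copy), with primaryAdvancedConfig being a hypothetical helper name:

import (
	keda "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
)

// primaryAdvancedConfig always forces the generated HPA name to the primary
// name, and only copies the optional fields the target actually sets.
func primaryAdvancedConfig(targetSoClone *keda.ScaledObject, primaryName string) *keda.AdvancedConfig {
	advanced := &keda.AdvancedConfig{
		HorizontalPodAutoscalerConfig: &keda.HorizontalPodAutoscalerConfig{
			Name: primaryName,
		},
	}
	if targetSoClone.Spec.Advanced != nil {
		advanced.RestoreToOriginalReplicaCount = targetSoClone.Spec.Advanced.RestoreToOriginalReplicaCount
		if cfg := targetSoClone.Spec.Advanced.HorizontalPodAutoscalerConfig; cfg != nil {
			advanced.HorizontalPodAutoscalerConfig.Behavior = cfg.Behavior
		}
	}
	return advanced
}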
With these changes, when the following ScaledObject is created and the Canary is updated to reference a ScaledObject instead of an HPA, instead of failing to create the new primary HPA because it already exists, Flagger would take ownership of it.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: fortio-server-deployment-2
  namespace: fortio
  annotations:
    scaledobject.keda.sh/transfer-hpa-ownership: "true"
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      name: fortio-server-deployment-2
  scaleTargetRef:
    name: fortio-server-deployment-2
  minReplicaCount: 2
  maxReplicaCount: 4
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://kube-prometheus-stack-prometheus:9090/prometheus
      threshold: '100'
      query: sum(rate(istio_requests_total{destination_service_name="fortio-server-2"}[1m]))
  - type: cpu
    metricType: Utilization
    metadata:
      value: "99"
The above annotation from the target ScaledObject would be carried over to the primary ScaledObject, and it would reference the correct fortio-server-deployment-2-primary HPA.
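For illustration, with the proposed change applied, the primary ScaledObject generated by Flagger would look roughly like this (my sketch of the expected result, not actual controller output):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: fortio-server-deployment-2-primary
  namespace: fortio
  annotations:
    scaledobject.keda.sh/transfer-hpa-ownership: "true"
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      # matches the primary HPA left behind by the old autoscalerRef
      name: fortio-server-deployment-2-primary
  scaleTargetRef:
    name: fortio-server-deployment-2-primary
  minReplicaCount: 2
  maxReplicaCount: 4
  # triggers omitted for brevity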
Also seeing this same behavior when switching the autoscalerRef from HPA to Scaled Objects.