Splunk Operator: add scale subresource
yaroslav-nakonechnikov opened this issue · 6 comments
Please select the type of request
Feature Request
Tell us more
Describe the request
To start using KEDA (https://keda.sh/docs/2.11/concepts/scaling-deployments/#scaling-of-custom-resources), which would help a lot with our testing/development stack, the operator CRDs need to support the scale subresource: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#scale-subresource
Expected behavior
The scale subresource is added and KEDA can be used natively.
Hey @yaroslav-nakonechnikov, we are able to scale custom resources using replicas, as mentioned here. Can you try that?
@akondur that is a different thing. If you open https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#scale-subresource, you will see an explanation of the more advanced use of replicas. A plain replicas field can be used, but only with workarounds and custom scripts.
Hey @yaroslav-nakonechnikov, the operator CRDs already have the scale subresource embedded. A couple of code references:
With operator version 2.5.1 and v4 CRDs deployed on an EKS cluster:
bash% k get crds/standalones.enterprise.splunk.com -o yaml | grep -i scale: -A 3
scale:
  labelSelectorPath: .status.selector
  specReplicasPath: .spec.replicas
  statusReplicasPath: .status.replicas
--
scale:
  labelSelectorPath: .status.selector
  specReplicasPath: .spec.replicas
  statusReplicasPath: .status.replicas
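Beyond checking the CRD definition, one way to confirm that the API server actually serves the subresource is to read it directly. A minimal sketch, assuming the Standalone is named example in the splunk-operator namespace (as used elsewhere in this thread); scale_path is a hypothetical helper that only assembles the standard /apis/<group>/<version>/namespaces/<namespace>/<plural>/<name>/scale path:

```shell
# Build the REST path of a namespaced custom resource's scale subresource.
# scale_path is a hypothetical helper, not part of kubectl or the operator.
scale_path() {
  # args: group version namespace plural name
  printf '/apis/%s/%s/namespaces/%s/%s/%s/scale\n' "$1" "$2" "$3" "$4" "$5"
}

# Print the path for the Standalone used in this thread (names are assumptions).
scale_path enterprise.splunk.com v4 splunk-operator standalones example

# Against a live cluster (needs kubectl and cluster access), the Scale object
# could then be read with:
#   kubectl get --raw "$(scale_path enterprise.splunk.com v4 splunk-operator standalones example)"
```

If the subresource is served, kubectl scale against the custom resource should go through the same endpoint, which is also what an HPA relies on.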
I tried autoscaling with KEDA using the following steps:
- Install KEDA using the instructions from here.
- Use the ScaledObject spec below to target a Standalone resource
bash% cat ~/keda_scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: keda-sa
  namespace: splunk-operator
spec:
  scaleTargetRef:
    apiVersion: enterprise.splunk.com/v4
    kind: Standalone
    name: example
  pollingInterval: 5 # Optional. Default: 30 seconds
  cooldownPeriod: 10 # Optional. Default: 300 seconds
  idleReplicaCount: 0 # Optional. Default: ignored, must be less than minReplicaCount
  minReplicaCount: 1 # Optional. Default: 0
  maxReplicaCount: 100 # Optional. Default: 100
  advanced: # Optional. Section to specify advanced options
  triggers:
    - type: cpu
      #metricType: Utilization # Allowed types are 'Utilization' or 'AverageValue'
      metadata:
        type: Utilization # Deprecated in favor of trigger.metricType; allowed types are 'Utilization' or 'AverageValue'
        value: "5"
Once deployed:
bash% kubectl describe scaledobject
  Creation Timestamp:  2024-02-27T01:17:32Z
  Finalizers:
    finalizer.keda.sh
  Generation:        1
  Resource Version:  13949
  UID:               63fae67f-e2df-4379-8e1a-7661ec5b0179
Spec:
  Cooldown Period:     10
  Idle Replica Count:  0
  Max Replica Count:   100
  Min Replica Count:   1
  Polling Interval:    5
  Scale Target Ref:
    API Version:  enterprise.splunk.com/v4
    Kind:         Standalone
    Name:         example
  Triggers:
    Metadata:
      Type:   Utilization
      Value:  5
    Type:     cpu
Status:
  Conditions:
    Message:  ScaledObject is defined correctly and is ready for scaling
    Reason:   ScaledObjectReady
    Status:   True
    Type:     Ready
    Message:  Scaling is performed because triggers are active
    Reason:   ScalerActive
    Status:   True
    Type:     Active
    Status:   Unknown
    Type:     Fallback
    Status:   Unknown
    Type:     Paused
  Hpa Name:                keda-hpa-keda-sa
  Last Active Time:        2024-02-27T01:43:57Z
  Original Replica Count:  1
  Resource Metric Names:
    cpu
  Scale Target GVKR:
    Group:            enterprise.splunk.com
    Kind:             Standalone
    Resource:         standalones
    Version:          v4
  Scale Target Kind:  enterprise.splunk.com/v4.Standalone
Events:
  Type    Reason              Age   From           Message
  ----    ------              ----  ----           -------
  Normal  KEDAScalersStarted  26m   keda-operator  Scaler cpu is built.
  Normal  KEDAScalersStarted  26m   keda-operator  Started scalers watch
  Normal  ScaledObjectReady   26m   keda-operator  ScaledObject is ready for scaling
The corresponding HPA, which is able to find the scale subresource and hook onto it:
bash% k describe hpa
Name:               keda-hpa-keda-sa
Namespace:          splunk-operator
Labels:             app.kubernetes.io/managed-by=keda-operator
                    app.kubernetes.io/name=keda-hpa-keda-sa
                    app.kubernetes.io/part-of=keda-sa
                    app.kubernetes.io/version=2.13.0
                    scaledobject.keda.sh/name=keda-sa
Annotations:        <none>
CreationTimestamp:  Mon, 26 Feb 2024 19:17:32 -0600
Reference:          Standalone/example
Metrics:            ( current / target )
  resource cpu on pods (as a percentage of request):  <unknown> / 5%
Min replicas:       1
Max replicas:       100
Standalone pods:    1 current / 0 desired
Conditions:
  Type         Status  Reason             Message
  ----         ------  ------             -------
  AbleToScale  True    SucceededGetScale  the HPA controller was able to get the target's current scale
From the results above, it looks like the scale subresource is working. Are you seeing an error when deploying with KEDA? Does deploying an HPA like the one below work for you?
apiVersion: autoscaling/v1 # targetCPUUtilizationPercentage is an autoscaling/v1 field
kind: HorizontalPodAutoscaler
metadata:
  name: sa-hpa
  namespace: splunk-operator
spec:
  scaleTargetRef:
    apiVersion: enterprise.splunk.com/v4
    kind: Standalone
    name: example
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 1
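For context on how a very low target like this behaves: the Kubernetes HPA documentation gives the core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). A rough sketch of that arithmetic, ignoring the tolerance band, stabilization windows, and pod readiness handling that the real controller applies; hpa_desired and the sample utilization numbers are illustrative assumptions, not values from this cluster:

```shell
# Approximate the HPA replica calculation:
#   desired = ceil(current_replicas * current_metric / target_metric)
# clamped to [min, max]. hpa_desired is a hypothetical helper.
hpa_desired() {
  # args: current_replicas current_metric target_metric min_replicas max_replicas
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))   # integer ceiling division
  if [ "$desired" -lt "$4" ]; then desired=$4; fi
  if [ "$desired" -gt "$5" ]; then desired=$5; fi
  echo "$desired"
}

# Hypothetical: 1 replica at 20% CPU with a 1% target, bounds 1..10.
hpa_desired 1 20 1 1 10    # prints 10 (20 before clamping to maxReplicas)

# Hypothetical: 4 replicas at 50% CPU with a 100% target, bounds 1..10.
hpa_desired 4 50 100 1 10  # prints 2
```

With a 1% target, almost any measurable CPU use pushes the result to maxReplicas, which makes it a convenient setting for checking that scaling works at all.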
super, thanks. this is very helpful!
@akondur I finally rechecked KEDA with the scale subresource, but it doesn't work as expected...
for example:
[yn@ip-10-216-35-48 ~]$ k describe hpa -n splunk-operator
Name:                     keda-hpa-keda-sa
Namespace:                splunk-operator
Labels:                   app.kubernetes.io/managed-by=keda-operator
                          app.kubernetes.io/name=keda-hpa-keda-sa
                          app.kubernetes.io/part-of=keda-sa
                          app.kubernetes.io/version=2.13.1
                          scaledobject.keda.sh/name=keda-sa
Annotations:              autoscaling.alpha.kubernetes.io/conditions:
                            [{"type":"AbleToScale","status":"True","lastTransitionTime":"2024-03-15T16:33:41Z","reason":"ReadyForNewScale","message":"recommended size...
                          autoscaling.alpha.kubernetes.io/current-metrics:
                            [{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":100,"currentAverageValue":"2002m"}}]
CreationTimestamp:        Fri, 15 Mar 2024 16:33:26 +0000
Reference:                IndexerCluster/site6-32002
Target CPU utilization:   500%
Current CPU utilization:  100%
Min replicas:             1
Max replicas:             3
IndexerCluster pods:      2 current / 2 desired
Events:
  Type    Reason             Age                   From                       Message
  ----    ------             ----                  ----                       -------
  Normal  SuccessfulRescale  37m (x43 over 2d11h)  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
  Normal  SuccessfulRescale  37m (x44 over 2d11h)  horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) above target
[yn@ip-10-216-35-48 ~]$ kubectl get pods -n splunk-operator
NAME                                                  READY   STATUS    RESTARTS         AGE
splunk-32002-cluster-manager-0                        1/1     Running   74 (3d22h ago)   4d4h
splunk-32002-license-manager-0                        1/1     Running   1 (2d5h ago)     2d5h
splunk-32002-monitoring-console-0                     1/1     Running   0                3d22h
splunk-c-32002-standalone-0                           1/1     Running   0                3d9h
splunk-e-32002-deployer-0                             1/1     Running   1 (3d4h ago)     3d5h
splunk-e-32002-search-head-0                          1/1     Running   0                2d4h
splunk-e-32002-search-head-1                          1/1     Running   0                3d5h
splunk-e-32002-search-head-2                          1/1     Running   0                3d5h
splunk-operator-controller-manager-667fff5754-vjxd5   2/2     Running   1 (3d22h ago)    4d5h
splunk-site6-32002-indexer-0                          1/1     Running   5 (3d3h ago)     3d4h
splunk-site6-32002-indexer-1                          1/1     Running   0                38h
splunk-site6-32002-indexer-2                          1/1     Running   0                2d21h
splunk-site6-32002-indexer-3                          1/1     Running   0                2d21h
splunk-site6-32002-indexer-4                          1/1     Running   0                41h
splunk-site6-32002-indexer-5                          0/1     Running   249 (44s ago)    40h
So I expected to see only 2 indexers.
I deleted 4 of them:
[yn@ip-10-216-35-48 ~]$ kubectl delete pods -n splunk-operator splunk-site6-32002-indexer-2 splunk-site6-32002-indexer-3 splunk-site6-32002-indexer-4 splunk-site6-32002-indexer-5
pod "splunk-site6-32002-indexer-2" deleted
pod "splunk-site6-32002-indexer-3" deleted
pod "splunk-site6-32002-indexer-4" deleted
pod "splunk-site6-32002-indexer-5" deleted
[yn@ip-10-216-35-48 ~]$ kubectl get pods -n splunk-operator
NAME                                                  READY   STATUS    RESTARTS         AGE
splunk-32002-cluster-manager-0                        1/1     Running   74 (3d22h ago)   4d4h
splunk-32002-license-manager-0                        1/1     Running   1 (2d5h ago)     2d5h
splunk-32002-monitoring-console-0                     1/1     Running   0                3d22h
splunk-c-32002-standalone-0                           1/1     Running   0                3d9h
splunk-e-32002-deployer-0                             1/1     Running   1 (3d4h ago)     3d5h
splunk-e-32002-search-head-0                          1/1     Running   0                2d4h
splunk-e-32002-search-head-1                          1/1     Running   0                3d5h
splunk-e-32002-search-head-2                          1/1     Running   0                3d5h
splunk-operator-controller-manager-667fff5754-vjxd5   2/2     Running   1 (3d22h ago)    4d5h
splunk-site6-32002-indexer-0                          1/1     Running   5 (3d3h ago)     3d4h
splunk-site6-32002-indexer-1                          1/1     Running   0                38h
splunk-site6-32002-indexer-2                          0/1     Running   0                3s
splunk-site6-32002-indexer-3                          0/1     Running   0                3s
splunk-site6-32002-indexer-4                          0/1     Running   0                3s
splunk-site6-32002-indexer-5                          0/1     Running   0                3s
And those 4 were recreated again.
Why, if the HPA says that only 2 are needed?
OK, it looks like this is related to #1293: as far as I can see, splunk-operator can't scale to more (or fewer) replicas even when the custom resource is edited manually.
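When narrowing this down, it may help to compare the replica count at each layer. The HPA only writes spec.replicas on the custom resource through the scale subresource; the operator then has to propagate that value to the StatefulSet, and it is the StatefulSet controller that recreates deleted pods. A sketch, assuming the StatefulSet is named splunk-site6-32002-indexer (inferred from the pod names above, not confirmed); replica_drift is a hypothetical helper:

```shell
# Compare the replica count requested on the custom resource with the one on
# the StatefulSet that owns the pods. replica_drift is a hypothetical helper.
replica_drift() {
  # args: cr_replicas statefulset_replicas
  if [ "$1" -eq "$2" ]; then
    echo "in sync: $1 replicas"
  else
    echo "drift: CR wants $1, StatefulSet has $2"
  fi
}

# Illustrative values from this thread: the HPA reports 2 desired,
# while 6 indexer pods are running.
replica_drift 2 6

# Against a live cluster the two numbers could be fetched with
# (resource and StatefulSet names are assumptions):
#   kubectl get indexerclusters.enterprise.splunk.com site6-32002 -n splunk-operator -o jsonpath='{.spec.replicas}'
#   kubectl get statefulset splunk-site6-32002-indexer -n splunk-operator -o jsonpath='{.spec.replicas}'
```

If the custom resource already shows the HPA's value but the StatefulSet does not, the gap is in the operator's reconciliation rather than in KEDA or the scale subresource.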