Optimization is not working - Azure AKS - v1.25.6
Opened this issue · 19 comments
Hi Team,
First of all, this looks like a promising new tool that can play an important role.
I just quickly tested it in Azure AKS v1.25.6. Below are my findings/comments:
- First, a small correction to the helm install command: a release name needs to be specified when installing.
helm install kube-reqsizer/kube-reqsizer --> helm install kube-reqsizer kube-reqsizer/kube-reqsizer
- I've deployed a basic application in the default namespace with high CPU/memory requests to test whether kube-reqsizer would optimize it. I waited for 22 minutes, but the requests were still the same.
- Logs for reference:
I0530 15:58:39.252063 1 request.go:601] Waited for 1.996392782s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/argocd
I0530 15:58:49.252749 1 request.go:601] Waited for 1.995931495s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/argocd
I0530 15:58:59.450551 1 request.go:601] Waited for 1.994652278s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/argocd
I0530 15:59:09.450621 1 request.go:601] Waited for 1.994074539s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/kube-system
I0530 15:59:19.450824 1 request.go:601] Waited for 1.99598317s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/kubescape
I0530 15:59:29.650328 1 request.go:601] Waited for 1.993913908s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/tigera-operator
I0530 15:59:39.650831 1 request.go:601] Waited for 1.996110718s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/kubescape
I0530 15:59:49.850897 1 request.go:601] Waited for 1.995571438s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/kube-system
I0530 16:00:00.049996 1 request.go:601] Waited for 1.994819712s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/calico-system
I0530 16:00:10.050864 1 request.go:601] Waited for 1.991681441s due to client-side throttling, not priority and fairness, request: GET:https://10.0.0.1:443/api/v1/namespaces/default
- How long will it take to optimize? Will it restart the pod automatically?
- I haven't customized any values; I just used the commands below to install:
helm repo add kube-reqsizer https://jatalocks.github.io/kube-reqsizer/
helm repo update
helm install kube-reqsizer kube-reqsizer/kube-reqsizer
Hey @zohebk8s, thanks for trying out the tool.
I've seen this happen to other people, and it seems like the kube API is too slow for the default configuration of the chart. To work around it, you need to set concurrentWorkers to 1.
This issue had the same problem as yours. Please see the correspondence here:
Thanks! Let me know how it goes.
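For reference, an upgrade along these lines should apply that setting (a sketch, assuming the chart exposes it as a Helm value named concurrentWorkers):
helm upgrade kube-reqsizer kube-reqsizer/kube-reqsizer --set concurrentWorkers=1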
@jatalocks Thanks for your response.
I've updated concurrentWorkers to "1", and the value of min-seconds in kube-reqsizer is also "1", as shown below. But it's still not updating the values. Am I missing something here?
I've added the below annotations to that deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
  annotations:
    reqsizer.jatalocks.github.io/optimize: "true"  # Include this Deployment when optimizing the cluster
    reqsizer.jatalocks.github.io/mode: "average"   # Default mode: optimizes based on the average of all sample points ("max" and "min" are the alternatives; only one mode annotation should be set)
Hey @zohebk8s, can you send a screenshot of the logs now? (A few minutes after the controller has started working.) It might take it a few minutes to resize.
Also, try adding the "optimize" annotation to the namespace this deployment is in.
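For example, something like this should do it (a sketch, using the annotation key from above on the default namespace):
kubectl annotate namespace default reqsizer.jatalocks.github.io/optimize=true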
I've added the annotation to the default namespace, where this deployment is running, but the values are still the same.
kube-reqsizer-controller-manager-795bbd7677-dl4xx-logs.txt
The request values still didn't change.
The utilization of the pods is quite low, and I was expecting kube-reqsizer to make a change/optimization. In the requests, I've specified the values below:
resources:
  requests:
    cpu: "100m"
    memory: 400Mi
I've attached the full log file for your reference; please see the attached txt file.
@zohebk8s it appears it's working. If you gave it time through the night, did it eventually work? It might take some time with concurrentWorkers=1, but eventually it has enough data in its cache to make the decision.
That's odd; it should have worked immediately. I think something is preventing it from resizing. What are your values/configuration? You should make sure minSeconds=1 and sampleSize=1 as well.
The configuration should match what's at the top of the README (except concurrentWorkers=1).
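As a reference, a minimal values sketch for that setup might look like this (assuming the chart exposes these keys under the names used in this thread):
concurrentWorkers: 1
minSeconds: 1
sampleSize: 1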
It's already "1" for concurrent-workers, minSeconds, and sampleSize.
It's Azure AKS v1.25.6, and the default namespace is Istio-injected. I hope it's not something specific to Istio.
configuration:
spec:
  containers:
  - args:
    - --health-probe-bind-address=:8081
    - --metrics-bind-address=:8080
    - --leader-elect
    - --annotation-filter=true
    - --sample-size=1
    - --min-seconds=1
    - --zap-log-level=info
    - --enable-increase=true
    - --enable-reduce=true
    - --max-cpu=0
    - --max-memory=0
    - --min-cpu=0
    - --min-memory=0
    - --min-cpu-increase-percentage=0
    - --min-memory-increase-percentage=0
    - --min-cpu-decrease-percentage=0
    - --min-memory-decrease-percentage=0
    - --cpu-factor=1
    - --memory-factor=1
    - --concurrent-workers=1
    - --enable-persistence=true
    - --redis-host=kube-reqsizer-redis-master
What are the resource requirements for the deployments in the default namespace? The only thing I can think of is that it doesn't have anything to resize, so it just continues sampling the pods. Also, if there are no requests/limits to begin with, there's nothing to resize from. I'd check that the pods are configured with resources.
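For example, something like this should list the current requests on the pods (a plain kubectl sketch):
kubectl get pods -n default -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.requests}{"\n"}{end}'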
I've defined requests/limits for this deployment and the utilization is very low; that's why I thought of raising this question/issue.
If it doesn't have requests/limits, then as you said it won't work. But in my case, I've defined requests/limits and the CPU/memory utilization is very low as well.
I see that reqsizer has been alive for 11 minutes. I'd give it some more time for now, and I'll check if there's a problem specific to AKS.
@jatalocks Thank you for your patience and responses. I feel this product can make a difference if it works properly, since it targets resource optimization, which translates directly into cost optimization.
@jatalocks Is this a bug, or is some kind of enhancement required at the product level?
I hope the information I've shared is of help.
@zohebk8s I think that by now, if the controller has been running continuously, the app should have already been resized.
@ElementTech I see that @zohebk8s seems to be using Argo CD in this cluster. Could it be that Argo CD is directly undoing the changes made to the Deployment's resources?
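If that turns out to be the case, one common workaround (a sketch, assuming a hypothetical Argo CD Application named "app" that manages this Deployment) is to tell Argo CD to ignore diffs on the container resources field so self-heal doesn't revert the resize:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app  # hypothetical Application name; source/destination/project omitted for brevity
spec:
  ignoreDifferences:
  - group: apps
    kind: Deployment
    # ignore changes to the first container's resources block
    jsonPointers:
    - /spec/template/spec/containers/0/resources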