Inquiry Regarding Celery Worker Scaling Issue
Opened this issue · 1 comment
Hello,
I'm reaching out for guidance on a worker-scaling issue. The worker hasn't scaled as expected, and I'm not sure whether a misconfiguration is causing this. My setup runs on EKS with Kubernetes 1.21 and KEDA 2.8.2. Below is the YAML configuration I'm using:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-celery
  labels:
    component: keda-celery
spec:
  replicas: 1
  selector:
    matchLabels:
      component: keda-celery
  template:
    metadata:
      labels:
        component: keda-celery
    spec:
      containers:
        - name: keda-celery
          image: "jerbob92/keda-celery-scaler:latest"
          env:
            - name: KCS_LOG_LEVEL
              value: "info"
            - name: KCS_WORKER_STALE_TIME
              value: "10"
            - name: KCS_WORKER_CLEANUP_INTERVAL
              value: "5"
            - name: KCS_WORKER_QUEUE_MAP
              value: "internal-queue-worker:internal_queue"
            - name: KCS_REDIS_TYPE
              value: "standalone"
            - name: KCS_REDIS_SERVER
              value: "santika-api-redis-svc.default.svc.cluster.local:6379"
          ports:
            - containerPort: 6000
              name: keda-celery
          resources:
            limits:
              cpu: "500m"
              memory: "512Mi"
            requests:
              cpu: "100m"
              memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: keda-celery
spec:
  ports:
    - port: 6000
      protocol: TCP
      name: keda-celery
  selector:
    component: keda-celery
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: internal-queue-worker
spec:
  scaleTargetRef:
    name: internal-queue-worker
  pollingInterval: 5
  cooldownPeriod: 300
  minReplicaCount: 1
  maxReplicaCount: 15
  triggers:
    - type: external
      metadata:
        scalerAddress: "keda-celery.default.svc.cluster.local:6000"
        queue: "internal_queue"
        scaleLoadValue: "50" # Scale at a load of 50%.
      metricType: Value
```
And here is my Celery worker YAML:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: internal-queue-worker
    restart-every: 30m
  name: internal-queue-worker
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: internal-queue-worker
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: internal-queue-worker
        name: santika-celery
        version: v0.1.982
    spec:
      containers:
        - command:
            - celery
            - -A
            - santika
            - worker
            - -l
            - info
            - -Q
            - internal_queue
            - -c
            - "5"
          envFrom:
            - configMapRef:
                name: env-configmap
            - secretRef:
                name: santika-api
          image: XXXXX
          imagePullPolicy: IfNotPresent
          name: santika-celery
          resources:
            limits:
              cpu: 500m
              memory: 1148Mi
            requests:
              cpu: 250m
              memory: 350Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /santika/static
              name: static-file
            - mountPath: /santika/media
              name: media-file
      dnsPolicy: ClusterFirst
      imagePullSecrets:
        - name: gcr-json-key
      nodeSelector:
        lifecycle: spot
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
        - name: media-file
          persistentVolumeClaim:
            claimName: santikamedia-volumeclaim
        - name: static-file
          persistentVolumeClaim:
            claimName: santikastatic-volumeclaim
```
I've confirmed that both the pod and Celery are running smoothly. As evidence, I've attached screenshots from Celery Flower and the Kubernetes pod:
![Screenshot 2023-12-28 122557](https://private-user-images.githubusercontent.com/123814359/293143687-1b52358a-c32f-4384-b647-ab8ce9e9ba97.png)
![image](https://private-user-images.githubusercontent.com/123814359/293143070-f1d9bb9e-1b66-41f0-8186-ae9aa7cc8e30.png)
Could you please assist me in identifying any potential issues or misconfigurations that might be preventing the worker from scaling properly?
Thank you for your support.
Best regards,
Karso
Sorry for the late response! I would suggest setting `KCS_LOG_LEVEL` to `debug` or `trace` to see what the scaler is doing.
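For reference, a minimal sketch of that change against the `keda-celery` Deployment above (only the log level value changes; everything else stays as posted):

```yaml
env:
  - name: KCS_LOG_LEVEL
    value: "debug" # or "trace" for even more verbose output
```

After redeploying, checking the scaler pod's logs (for example with `kubectl logs deploy/keda-celery`) should show what the scaler does each time KEDA polls it, which should make it easier to spot whether it is seeing the `internal_queue` workers at all.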