klippa-app/keda-celery-scaler

Inquiry Regarding Celery Worker Scaling Issue

Hello,

I'm reaching out for guidance on a scaling issue with my Celery worker. The worker isn't scaling as expected, and I'm not sure whether a misconfiguration is causing this. My setup runs on EKS with Kubernetes 1.21 and KEDA 2.8.2. Below is the YAML configuration I'm using:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-celery
  labels:
    component: keda-celery
spec:
  replicas: 1
  selector:
    matchLabels:
      component: keda-celery
  template:
    metadata:
      labels:
        component: keda-celery
    spec:
      containers:
        - name: keda-celery
          image: "jerbob92/keda-celery-scaler:latest"
          env:
            - name: KCS_LOG_LEVEL
              value: "info"
            - name: KCS_WORKER_STALE_TIME
              value: "10"
            - name: KCS_WORKER_CLEANUP_INTERVAL
              value: "5"
            - name: KCS_WORKER_QUEUE_MAP
              value: "internal-queue-worker:internal_queue"
            - name: KCS_REDIS_TYPE
              value: "standalone"
            - name: KCS_REDIS_SERVER
              value: "santika-api-redis-svc.default.svc.cluster.local:6379"
          ports:
            - containerPort: 6000
              name: keda-celery
          resources:
            limits:
              cpu: "500m"
              memory: "512Mi"
            requests:
              cpu: "100m"
              memory: "256Mi"

---

apiVersion: v1
kind: Service
metadata:
  name: keda-celery
spec:
  ports:
    - port: 6000
      protocol: TCP
      name: keda-celery
  selector:
    component: keda-celery

---

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: internal-queue-worker
spec:
  scaleTargetRef:
    name: internal-queue-worker
  pollingInterval: 5
  cooldownPeriod: 300
  minReplicaCount: 1
  maxReplicaCount: 15

  triggers:
    - type: external
      metadata:
        scalerAddress: "keda-celery.default.svc.cluster.local:6000"
        queue: "internal_queue"
        scaleLoadValue: "50" # Scale at a load of 50%.
      metricType: Value

---

And here is my Celery worker YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: internal-queue-worker
    restart-every: 30m
  name: internal-queue-worker
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: internal-queue-worker
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: internal-queue-worker
        name: santika-celery
        version: v0.1.982
    spec:
      containers:
      - command:
        - celery
        - -A
        - santika
        - worker
        - -l
        - info
        - -Q
        - internal_queue
        - -c
        - "5"
        envFrom:
        - configMapRef:
            name: env-configmap
        - secretRef:
            name: santika-api
        image: XXXXX
        imagePullPolicy: IfNotPresent
        name: santika-celery
        resources:
          limits:
            cpu: 500m
            memory: 1148Mi
          requests:
            cpu: 250m
            memory: 350Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /santika/static
          name: static-file
        - mountPath: /santika/media
          name: media-file
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: gcr-json-key
      nodeSelector:
        lifecycle: spot
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: media-file
        persistentVolumeClaim:
          claimName: santikamedia-volumeclaim
      - name: static-file
        persistentVolumeClaim:
          claimName: santikastatic-volumeclaim

I've verified that both the pod and Celery are running fine. As evidence, I've attached screenshots from Celery Flower and the Kubernetes pod:

[Screenshot (2023-12-28): Celery Flower showing the worker online, and the Kubernetes pod in Running state]
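For reference, the same checks can be done from the CLI; using the labels, namespace, and Celery app name from the worker Deployment above, something like:

kubectl get pods -n default -l app=internal-queue-worker
kubectl exec -n default deploy/internal-queue-worker -- celery -A santika inspect ping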

Could you please assist me in identifying any potential issues or misconfigurations that might be preventing the worker from scaling properly?

Thank you for your support.

Best regards,
Karso

Sorry for the late response! I would suggest setting KCS_LOG_LEVEL to debug or trace to see what the scaler is doing.
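For example, on the keda-celery Deployment above that would just be changing the env value (a sketch, assuming the same Deployment and default namespace as in the question):

- name: KCS_LOG_LEVEL
  value: "debug"

After redeploying, tailing the scaler's logs while tasks are queued should show what it is doing:

kubectl logs -n default deploy/keda-celery -f

It can also help to confirm that KEDA picked up the ScaledObject and created an HPA for it:

kubectl get scaledobject,hpa -n default
kubectl describe scaledobject internal-queue-worker -n default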