practo/k8s-worker-pod-autoscaler

Cannot get qMsgs if the WPA is deleted and re-created

KangBK0120 opened this issue · 3 comments

Hi, I faced an issue with the controller in the master branch.

When I create a dummy deployment and a dummy WPA, everything works without any problem.
The deployment does not consume anything from SQS: it neither receives nor processes the messages in the queue. The dummy WPA simply auto-scales it.

However, the controller fails to get qMsgs if I delete the WPA and re-create it with the same YAML.

image

Here are the YAML files I used

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy-deployment
spec:
  selector:
    matchLabels:
      app: dummy-deployment
  replicas: 1
  template:
    metadata:
      labels:
        app: dummy-deployment
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
---
apiVersion: k8s.practo.dev/v1
kind: WorkerPodAutoScaler
metadata:
  name: dummy-wpa
spec:
  minReplicas: 1
  maxReplicas: 100
  deploymentName: dummy-deployment
  queueURI: https://sqs.{region}.amazonaws.com/{id}/dummy-queue
  targetMessagesPerWorker: 1
  maxDisruption: "100%"

I did not change anything in the workerpodautoscaler deployment.

I found that the issue comes from poller.go.

When a new WPA resource is created, its polling thread is created successfully and works fine.
However, when a user deletes the WPA, the thread is removed, but the sync function in the poller still tracks the WPA's key and holds its stale status.

As a result, when a new WPA is created with the same key, no new thread is created for it, and qMsgs cannot be fetched.

I therefore changed the sync function as follows, which fixes the issue for me.

func (p *Poller) Sync(stopCh <-chan struct{}) {
	for {
		select {
		case listResultCh := <-p.listThreadCh:
			// Reply with a copy so the caller never shares the live map.
			listResultCh <- DeepCopyThread(p.threads)
		case threadStatus := <-p.updateThreadCh:
			for key, status := range threadStatus {
				if !status {
					// The thread has stopped: forget the key so that a WPA
					// re-created with the same key gets a new thread.
					delete(p.threads, key)
				} else {
					p.threads[key] = status
				}
			}
		case <-stopCh:
			klog.V(1).Info("Stopping sync thread of poller gracefully.")
			return
		}
	}
}
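
To illustrate why dropping the key matters, here is a minimal, self-contained sketch of the create, delete, re-create sequence. It is not the controller's code: `threadTable`, `needsThread`, `recreate`, and the key `default/dummy-wpa` are hypothetical names, and the rule that a thread is only started for keys missing from the map is an assumption inferred from the behaviour described above.

```go
package main

import "fmt"

// threadTable is a hypothetical stand-in for the poller's thread map:
// the key identifies a WPA, the value says whether its thread is running.
type threadTable struct {
	threads     map[string]bool
	deleteStale bool // true models the patched Sync, false the original one
}

// update mirrors the handling of one status update inside Sync.
func (t *threadTable) update(key string, status bool) {
	if !status && t.deleteStale {
		// Patched behaviour: forget keys whose thread has stopped.
		delete(t.threads, key)
		return
	}
	// Original behaviour: keep the key, even with a false status.
	t.threads[key] = status
}

// needsThread is an ASSUMPTION about the caller: a polling thread is
// started only for keys that have no entry in the table.
func (t *threadTable) needsThread(key string) bool {
	_, tracked := t.threads[key]
	return !tracked
}

// recreate simulates create -> delete -> re-create for a single WPA key and
// reports whether a thread would be started for the re-created WPA.
func recreate(deleteStale bool) bool {
	t := &threadTable{threads: map[string]bool{}, deleteStale: deleteStale}
	const key = "default/dummy-wpa" // hypothetical key
	t.update(key, true)             // WPA created, thread running
	t.update(key, false)            // WPA deleted, thread stopped
	return t.needsThread(key)
}

func main() {
	fmt.Println("original Sync, thread restarted:", recreate(false))
	fmt.Println("patched Sync, thread restarted:", recreate(true))
}
```

Under these assumptions, the original handling leaves a stale `false` entry behind, so the re-created WPA never gets a thread, while the patched handling removes the entry and the thread is started again.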

Before:

image
image

After:

image

Sorry for the bad output format. I'm new to Go.

Thanks @KangBK0120 for reporting, debugging, and fixing the issue.
I have published patch release v1.4.1 for it, since this could be critical for anyone who re-creates a WPA with the same name.

practodev/workerpodautoscaler:v1.4.1