Can one of `targetMessagesPerWorker` and `secondsToProcessOneJob` be deduced from the other
sujanadiga opened this issue · 3 comments
As per the current implementation, `targetMessagesPerWorker` is a mandatory option and `secondsToProcessOneJob` is an optional one with a default of 0.
As per my understanding, `secondsToProcessOneJob` is used to calculate the minimum number of workers, and `targetMessagesPerWorker` is used to calculate the usage ratio and hence to determine the desired number of workers.
Since both the values can be tuned separately, if these values are not in sync, it can result in undesired scaling behaviour.
Taking the example from one of the test cases (k8s-worker-pod-autoscaler/pkg/controller/controller_test.go, lines 43 to 74 in 59d0d48):
As per this snapshot, a worker would take 10s to process a single job, and approximately 2136.6 messages were sent in the last minute. This would put the minimum number of workers needed at 21366, but the number of desired workers will be calculated as 1, since one worker can handle 2500 messages in a minute (`targetMessagesPerWorker` is 2500, but there is only one message in the queue).
Outcome:
Min: 21366
Max: 20
Desired calculated: 1
Desired: 20 (capped to max workers)
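The way the outcome above resolves can be sketched in a few lines of Go. This is only an illustration under stated assumptions: `clampWorkers` is a hypothetical helper, not the controller's actual code, and it assumes the desired count is first raised to the calculated minimum and then capped at the configured maximum.

```go
package main

import "fmt"

// clampWorkers is a hypothetical helper (not taken from the WPA code base).
// It raises the queue-based desired count to the calculated minimum and then
// caps the result at maxWorkers, so the cap wins when the minimum exceeds it.
func clampWorkers(desired, calculatedMin, maxWorkers int32) int32 {
	if desired < calculatedMin {
		desired = calculatedMin
	}
	if desired > maxWorkers {
		desired = maxWorkers
	}
	return desired
}

func main() {
	// Values taken from the outcome listed in the issue.
	fmt.Println(clampWorkers(1, 21366, 20)) // prints 20
}
```

With these inputs the queue-based value of 1 never takes effect: the calculated minimum dominates it, and the maximum then dominates the minimum.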
Yes, allowing both `targetMessagesPerWorker` and `secondsToProcessOneJob` to be configured separately might help in cases where we want to clear the backlog as fast as possible (@justjkk's comment); however, it is true only for 50% of cases, where
`targetMessagesPerWorker < (60 / secondsToProcessOneJob)`
My question is, can one of `targetMessagesPerWorker` and `secondsToProcessOneJob` be deduced from the other using the formula
`secondsToProcessOneJob = 60 / targetMessagesPerWorker`
to avoid scaling issues due to misconfiguration?
- `targetMessagesPerWorker` is useful for long-running workers that take seconds or even minutes to process a job, where jobs per minute is almost 0. In this case, `secondsToProcessOneJob` is ineffective because RPM (averaged over 10 minutes) is almost 0.
- `secondsToProcessOneJob` is useful for fast workers that consume jobs so quickly that the number of queued jobs is always 0 and jobs per minute is high. In this case, `targetMessagesPerWorker` is ineffective because queued jobs is 0.
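The queue-based half of this distinction can be illustrated with a minimal sketch. `desiredFromQueue` is a hypothetical, simplified model (it is not the controller's actual calculation, which works from a usage ratio and the current worker count); it only shows why a target expressed per queued job yields nothing when the queue is empty.

```go
package main

import (
	"fmt"
	"math"
)

// desiredFromQueue is a hypothetical, simplified model: the number of workers
// needed so that each one holds at most targetMessagesPerWorker queued jobs.
func desiredFromQueue(queuedJobs, targetMessagesPerWorker int32) int32 {
	if targetMessagesPerWorker <= 0 {
		// Guard against division by zero; treated here as "no opinion".
		return 0
	}
	return int32(math.Ceil(float64(queuedJobs) / float64(targetMessagesPerWorker)))
}

func main() {
	// Long-running workers: a backlog builds up, so the queue drives scaling.
	fmt.Println(desiredFromQueue(30, 10)) // prints 3

	// Fast workers: the queue stays empty, so this signal says nothing,
	// regardless of how many jobs per minute are flowing through.
	fmt.Println(desiredFromQueue(0, 10)) // prints 0
}
```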
#105 (currently WIP) will update the documentation of these parameters and also provide example scenarios.
Regarding using a single configurable value and then using it to calculate the other with the below formula:
secondsToProcessOneJob = 60 / targetMessagesPerWorker
The above conversion assumes 1 minute as the desired time to process the backlog. Since `targetMessagesPerWorker` is not a per-minute value, the above formula cannot be used. `targetMessagesPerWorker` is just the target ratio that the WPA controller tries to maintain between the queued jobs (available + in-process jobs) and the current workers.
So, we still have two out of three variables that need to be specified, and the third can be calculated from them. The actual formula can be:
secondsToProcessOneJob = desiredSecondsToClearQueuedMessages / targetMessagesPerWorker
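As a rough sketch of that relationship, the function below derives `secondsToProcessOneJob` from the other two values. The function name `deriveSecondsToProcessOneJob` is hypothetical, and `desiredSecondsToClearQueuedMessages` is not an existing WPA option; it appears here only because the formula names it.

```go
package main

import (
	"errors"
	"fmt"
)

// deriveSecondsToProcessOneJob is a hypothetical helper that applies
//   secondsToProcessOneJob = desiredSecondsToClearQueuedMessages / targetMessagesPerWorker
// It returns an error when the target is not positive.
func deriveSecondsToProcessOneJob(desiredSecondsToClearQueuedMessages float64, targetMessagesPerWorker int32) (float64, error) {
	if targetMessagesPerWorker <= 0 {
		return 0, errors.New("targetMessagesPerWorker must be positive")
	}
	return desiredSecondsToClearQueuedMessages / float64(targetMessagesPerWorker), nil
}

func main() {
	// A 60-second clearing window reproduces the formula from the question.
	s, _ := deriveSecondsToProcessOneJob(60, 30)
	fmt.Println(s) // prints 2

	// A 10-minute window with the same target gives a different value,
	// which is why the window has to be specified rather than assumed.
	s, _ = deriveSecondsToProcessOneJob(600, 30)
	fmt.Println(s) // prints 20
}
```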
@sujanadiga @justjkk Related 4b76774