practo/k8s-worker-pod-autoscaler

How to achieve near realtime scheduling of pods?

Closed this issue · 3 comments

Question

How can I configure WPA to schedule new pods with the lowest latency possible?

Context

  • I'm using overprovisioning as recommended by #127, so there's always an available node to schedule a pod on.
  • I'm only using one queue, so I'm not worried about being rate limited by AWS because of frequent requests to SQS.
  • I've configured WPA to run with the following options. Specifically, I've reduced sqs-short-poll-interval to 1 so that WPA can register updates as fast as possible.
    command: [
      "/workerpodautoscaler",
      "run",
      "--resync-period=20",
      "--wpa-threads=10",
      "--sqs-short-poll-interval=1",
      "--sqs-long-poll-interval=20",
      "--k8s-api-qps=5.0",
      "--k8s-api-burst=10",
      "-v=2",
    ]

Can you offer some more color on what these values are and how WPA uses them?

  • resync-period -- sync period for the worker pod autoscaler
    • What is the WPA syncing with? Is this k8s or SQS/cloudwatch? It's k8s. From #32 (comment)
    • How does this affect the time to schedule a pod?
  • k8s-api-qps -- qps indicates the maximum QPS to the k8s api from the clients(wpa)
    • Does QPS mean queries per second?
    • What is the relationship between setting a max qps and a max burst? How are they different?
  • k8s-api-burst -- maximum burst for throttle between requests from clients(wpa) to k8s api.
    • Does this limit the number of requests made by WPA to k8s to update the replica counts on deployments?
    • What is the time frame for this throttle?
  • sqs-short-poll-interval -- the duration (in seconds) after which the next sqs api call is made to fetch the queue length
    • Am I correct in my understanding that WPA will make a request to sqs at the short poll interval, then wait for the long poll interval before making another request?
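
On the QPS and burst questions: in the Kubernetes Go client (client-go), QPS means queries per second, and QPS together with Burst configures a token bucket rate limiter. The flag descriptions quoted above match client-go's wording, so it seems likely that WPA passes these two flags to its Kubernetes client, but that is an assumption and not something confirmed in this thread. The sketch below is a simplified, hypothetical model of a token bucket (the names tokenBucket, advance, tryAccept and sendBatch are made up for illustration). It rejects requests when the bucket is empty, whereas the real client waits for a token instead.

```go
package main

import "fmt"

// tokenBucket is a simplified, hypothetical model of a QPS/burst limiter.
// It is not the client-go implementation.
type tokenBucket struct {
	qps    float64 // tokens added per second (sustained request rate)
	burst  float64 // bucket capacity (largest back-to-back batch)
	tokens float64 // tokens currently available
}

// newTokenBucket starts with a full bucket, so an initial burst is allowed.
func newTokenBucket(qps float64, burst int) *tokenBucket {
	return &tokenBucket{qps: qps, burst: float64(burst), tokens: float64(burst)}
}

// advance refills the bucket for the given number of elapsed seconds,
// never exceeding the burst capacity.
func (b *tokenBucket) advance(seconds float64) {
	b.tokens += b.qps * seconds
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
}

// tryAccept spends one token if one is available.
func (b *tokenBucket) tryAccept() bool {
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// sendBatch attempts n requests at the same instant and returns how many pass.
func sendBatch(b *tokenBucket, n int) int {
	accepted := 0
	for i := 0; i < n; i++ {
		if b.tryAccept() {
			accepted++
		}
	}
	return accepted
}

func main() {
	b := newTokenBucket(5.0, 10) // --k8s-api-qps=5.0, --k8s-api-burst=10

	fmt.Println("accepted at t=0s:", sendBatch(b, 15))
	b.advance(1) // one second later the bucket has refilled by qps tokens
	fmt.Println("accepted at t=1s:", sendBatch(b, 15))
	b.advance(10) // a long pause refills the bucket only up to burst
	fmt.Println("accepted at t=11s:", sendBatch(b, 15))
}
```

In this model, burst bounds how many requests can go out back to back, while QPS bounds the sustained rate once the bucket is drained. There is no fixed window for the throttle: tokens refill continuously at the QPS rate.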

There is more info about resync-period and sqs-short-poll-interval in this discussion: #32 (comment)

The complete SQS WPA metric logic is in this function where both short and long poll intervals are used.

This article explains the short and long polling of AWS.

  • sqs-long-poll-interval is used to reduce the number of calls to AWS for receiving messages. The condition for long polling is:
    if queueSpec.workers == 0 && queueSpec.messages == 0 && queueSpec.messagesSentPerMinute == 0 {
    When there are no workers running, the number of messages in the queue is 0, and the RPM (messages sent per minute) on the queue is zero, we long poll. There is no need to keep calling the AWS API in that state, and the same call can wait up to the long poll interval before returning. The maximum long polling wait time is 20 seconds.
  • sqs-short-poll-interval: the short poll interval is used as a sleep, so that WPA does not call the AWS API again right after a recent fetch. Every short poll interval (in seconds), it calls AWS for fresh metrics. Choose this value based on how frequently AWS refreshes those metrics.
  • The defaults for k8s-api-qps and k8s-api-burst work fine in most cases. Regardless of their values, they will never slow down scaling, but they can slow down the completion of one control loop. They do not affect the problem you are solving.

Hope this helps!

@alok87, thanks for your explanation. This is really helpful!