ixti/sidekiq-throttled

Unthrottled keys get stuck behind big queues

spazer5 opened this issue · 8 comments

Environment:

gem 'sidekiq', '6.1.2'
gem 'sidekiq-throttled', '0.13.0'

How to reproduce:

Set Sidekiq concurrency to 25.

class MockWorker
  include Sidekiq::Worker
  include Sidekiq::Throttled::Worker

  MY_OBSERVER = lambda do |strategy, *args|
    puts "@@@@ THROTTLING #{strategy} #{args}"
  end

  sidekiq_throttle({
    :observer    => MY_OBSERVER,

    :concurrency => {
      :limit => 1,
      :key_suffix => Proc.new{|acc_id| acc_id }
    }
  })

  def perform(account_id)
    puts "===> #{Time.now.to_s} Starting job (account: #{account_id})"

    sleep 10

    puts "===> #{Time.now.to_s} Finished! (account: #{account_id})"
  end
end

Then enqueue 100 jobs with the same account_id, and a single job with a different account_id:

100.times { MockWorker.perform_async(1) }
MockWorker.perform_async(2)

Behavior

You will notice that account 2's job stays enqueued for a while, up to 30-60 seconds, even though its key shouldn't be throttled at all. Basically, once the queue reaches 50-60 jobs, you start noticing latency on jobs that shouldn't be throttled.

I imagine it could be a fetch-size issue: Sidekiq or the throttler doesn't get to account 2's job until later, so it never has a chance to decide whether that job should be throttled or not.

Keep in mind I have even tried reducing the poll interval to as little as 1 second, with no luck:

Sidekiq.configure_server do |config|
  config.average_scheduled_poll_interval = 1
end

Is anyone else experiencing this?

Bump

Bump

Bump!

bump

@spazer5 Here's an old explanation from one of the authors of how throttling currently works in this library.

tl;dr:

It gets pushed back to the end of the queue it was retrieved from. And that queue is removed from the queues to poll for 2 seconds.

#52 (comment)
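
Roughly, the flow described there looks like the sketch below. This is my own illustration, not the gem's actual code; all class and method names here are made up.

class ThrottledFetchSketch
  PAUSE = 2 # seconds a queue stays out of the polling rotation after a throttled pop

  def initialize(queues, redis)
    @queues = queues
    @redis  = redis
    @paused_until = {} # queue name => Time until which the queue is skipped
  end

  # Called in a loop by each of Sidekiq's processor threads (25 in the repro above).
  def retrieve_work
    now = Time.now
    pollable = @queues.reject { |q| (@paused_until[q] || now) > now }

    pollable.each do |queue|
      job = @redis.lpop("queue:#{queue}") # take the job at the head of the queue
      next unless job

      if throttled?(job)
        @redis.rpush("queue:#{queue}", job) # push it back to the END of the queue
        @paused_until[queue] = now + PAUSE  # skip this queue for PAUSE seconds
        next
      end

      return job # an unthrottled job is handed to a worker thread
    end

    nil # nothing fetchable right now
  end

  private

  # Stand-in for the gem's concurrency/threshold checks.
  def throttled?(_job)
    false
  end
end

In this model, the account 2 job sits behind ~100 account 1 jobs: each throttled pop re-enqueues at the tail and pauses the queue for 2 seconds, which lines up with the 30-60 second latency reported above. (Note that average_scheduled_poll_interval only controls how often the scheduled/retry sets are polled, not how queues are fetched, so lowering it cannot help here.)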

It seems this behavior might be affecting throughput on some of our shared queues, so we are looking to better understand it too. PR #80, shipped in the 0.12.0 release, lets consumers define how long the queue should pause, so I assume one could set that parameter to 0 to "disable" the behavior and potentially speed up processing. I haven't tested that myself to validate it, nor looked closely at the Redis implications.
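
If you do experiment with that, the configuration would presumably look something like the following. Note that :throttled_queue_cooldown is my guess at the option key; check the diff of PR #80 for the actual name.

Sidekiq.configure_server do |config|
  # HYPOTHETICAL option key -- verify against PR #80 before relying on it.
  # 0 would disable the pause after a throttled pop (see the warning below).
  config.options[:throttled_queue_cooldown] = 0
end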

@ixti Has anything changed from the explanation above?

Just a warning for anyone who stumbles on this in the future: if you set the cooldown to 0 and all of the jobs in your queues are throttled, Sidekiq will pop throttled jobs as fast as possible and re-enqueue them (since they are throttled), which can cause high CPU load on the Redis/Sidekiq servers. I'm assuming the cooldown was added to prevent thrashing like this.

ixti commented

Yes, the cooldown was added to avoid thrashing Redis. But I'm thinking about a better dynamic (based on statistics of skips); it will be part of the 1.0.0 release.

ixti commented

I have completely removed the cooldown part in v1.0.0.alpha; I will introduce a simple way to bring it back for those who actually need it.
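
For anyone landing here later: the cooldown that was eventually reintroduced is configured roughly like this in 1.x releases, per the gem's README (verify the exact names against your installed version):

Sidekiq::Throttled.configure do |config|
  config.cooldown_period    = 2.0 # seconds a queue is excluded from polling
  config.cooldown_threshold = 100 # throttled pops from a queue before it is excluded
end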