uniquejobs:digests sorted set seems to grow forever
JeremiahChurch opened this issue · 3 comments
Describe the bug
Our prod uniquejobs:digests sorted set in Redis grew to 3GB in about 3 weeks (~5 million jobs/day; fewer than 1,000 total jobs across the queues and the dead job queue at the time of the screenshot).
Our lock TTLs max out at 6 hours; the vast majority are 5 minutes.
Expected behavior
My understanding is that the digests should be cleaned up as conditions occur (mostly when our jobs exit successfully) or, at worst, when the reaper runs.
Current behavior
The uniquejobs:digests sorted set grows until we run out of Redis RAM.
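A minimal sketch of how the growth could be quantified from a console. `digest_stats` is a hypothetical helper, not part of the gem. It assumes each member's score in the sorted set is the Unix timestamp at which the lock was taken, and that the connection object responds to `call` the way redis-client connections do.

```ruby
# Hypothetical helper: summarise the digests sorted set through any client
# that responds to `call(*command)`.
# Assumption: each member's score is the Unix time at which the lock was taken.
def digest_stats(conn, key: "uniquejobs:digests", max_age: 6 * 60 * 60, now: Time.now.to_f)
  total = conn.call("ZCARD", key)
  # Members scored at or below the cutoff are older than the longest lock TTL
  # (6 hours here), so they are candidates for being orphaned.
  cutoff = (now - max_age).to_s
  stale = conn.call("ZCOUNT", key, "-inf", cutoff)
  { total: total, stale: stale }
end

# Possible usage inside a Sidekiq 7 process (not run here):
#   Sidekiq.redis { |conn| p digest_stats(conn) }
```

If `stale` is close to `total`, most of the set is made up of digests that have outlived every configured TTL.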
Worker class
47 different jobs have a lock on them. The only locks we use are until_and_while_executing, until_executing, and until_executed; 95% of them are until_and_while_executing.
# our entire sidekiq config
require 'sidekiq'
require 'sidekiq-unique-jobs'

Sidekiq.default_job_options = { 'backtrace' => true, 'retry' => 15 }
Sidekiq.strict_args!

SidekiqUniqueJobs.configure do |config|
  config.lock_info = true
  config.lock_prefix = 'prod_uniq' # new value
  config.lock_ttl = 5.minutes # default for anything - any longer jobs should have one specified
  config.enabled = !Rails.env.test?
  config.logger_enabled = false
  config.debug_lua = false
  config.max_history = 10_000
  config.reaper = :ruby # :ruby, :lua or :none/nil
  config.reaper_count = 50 # Stop reaping after this many keys
  config.reaper_interval = 305 # Reap orphans every 5 minutes
  config.reaper_timeout = 30
end

Sidekiq.default_configuration.redis = { url: ENV['REDIS_URL'] || 'redis://localhost:6379/0', network_timeout: 3 } # relax redis timeouts a bit default is 1

Sidekiq.configure_server do |config|
  if config.queues == ['default']
    concurrency = (ENV['SIDEKIQ_CONCURRENCY'] || (Rails.env.development? ? 7 : 23)).to_i
    config.queues = %w[af,10 o,8 ws,5 r,5 s,4 t,4 sl,3 searchkick,2 sd,1 c,1]
    config.concurrency = concurrency
    config.capsule('limited') do |cap|
      cap.concurrency = Rails.env.production? ? (concurrency / 3) : 1
      cap.queues = %w[af,10 wms,4 t,4 searchkick,3 sd,1]
    end
    config.capsule('single') do |cap|
      cap.concurrency = 1
      cap.queues = %w[counters,1]
    end
  end

  config.client_middleware do |chain|
    chain.add SidekiqUniqueJobs::Middleware::Client
  end

  config.server_middleware do |chain|
    chain.add SidekiqUniqueJobs::Middleware::Server
  end

  config.logger.level = ENV.fetch('SIDEKIQ_LOG_LEVEL', Logger::INFO) if Rails.env.production?
  SidekiqUniqueJobs::Server.configure(config)
end

Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add SidekiqUniqueJobs::Middleware::Client
  end
end
Additional context
Version-wise, we're generally running the top of main: currently 8.0.6, with Sidekiq 7.1.6 and Rails 7.0.8.
This is the second or third time we've seen the issue crop up. We're not sure whether it was introduced recently or has always been there and we just hadn't noticed until now.
Failures, or jobs exiting because of an exception or other 'non-normal' exit, are less than 0.1% of all jobs run.
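As a rough back-of-the-envelope check, the numbers above can be compared against the reaper settings in the config. This sketch assumes a single reaper pass per `reaper_interval` across the whole deployment and that `reaper_count` caps the digests removed per pass (as the config comment suggests). `max_reaped_per_day` is a hypothetical helper for illustration only.

```ruby
SECONDS_PER_DAY = 86_400

# Upper bound on digests reaped per day, assuming one reaper pass per interval
# and at most `count` digests removed per pass.
def max_reaped_per_day(count:, interval:)
  (SECONDS_PER_DAY / interval) * count
end

# reaper_count = 50, reaper_interval = 305 seconds => 283 full passes per day
reaper_ceiling = max_reaped_per_day(count: 50, interval: 305)

# Upper bound on abnormal exits: 0.1% of ~5 million jobs per day
abnormal_exits = (5_000_000 * 0.001).round

puts "reaper ceiling per day: #{reaper_ceiling}"
puts "abnormal exits per day: #{abnormal_exits}"
```

Under these assumptions the reaper could remove up to about 14,150 digests a day, against at most about 5,000 abnormal exits. If that holds, abnormal exits alone would not explain unbounded growth, which would point at digests being left behind on normal completion or at the reaper not removing what it should.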
I've been through the reaper issues and found some similar ones, but seemingly nothing exact.
As always, huge love for the gem <3
Looking at the details of #637, as it seems very similar.
@JeremiahChurch I believe this has improved with cddcc08, and those changes should be on the main branch.
I have also tweaked the reaper a bit.