until_and_while_executing with separate server and client lock_args broken since 8.0.8
Roguelazer opened this issue · 7 comments
Describe the bug
After upgrading from 8.0.7 to 8.0.10, jobs with until_and_while_executing that pass a custom lock_args_method which behaves differently on the server and the client (as shown in the README) no longer work. Based on bisection, this appears to have been broken by 63e9431 (the parent commit, 7d5e40a, still works).
Expected behavior
Given a worker as shown below, run the following client code:
1.upto(2) do |id|
  UniqueBugWorker.perform_async(id)
  UniqueBugWorker.perform_async(id)
  UniqueBugWorker.perform_async(id)
  UniqueBugWorker.perform_async(id, "foo")
  UniqueBugWorker.perform_async(id, "foo", "bar")
end
We expect to see exactly six invocations.
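To make the count of six concrete, here is a minimal sketch in plain Ruby that applies the same branching as the worker's lock_args_method to the ten calls above. The helper name lock_args_for and its server: keyword are hypothetical stand-ins for the lambda and for Sidekiq.server?, so the snippet runs without Sidekiq; it only models which lock args each side would compute, not how the gem builds digests.

```ruby
# Hypothetical stand-in for the lock_args_method lambda in the worker.
# `server:` replaces Sidekiq.server? so this runs without Sidekiq loaded.
def lock_args_for(args, server:)
  server ? [args.first] : args
end

# The ten perform_async calls from the client snippet, as argument lists.
calls = (1..2).flat_map do |id|
  [[id], [id], [id], [id, "foo"], [id, "foo", "bar"]]
end

# Distinct lock args as computed on each side.
client_keys = calls.map { |args| lock_args_for(args, server: false) }.uniq
server_keys = calls.map { |args| lock_args_for(args, server: true) }.uniq

puts client_keys.size # 6
puts server_keys.size # 2
```

The six distinct client-side keys match the six expected invocations. The two distinct server-side keys match the "at most two invocations" we observe, which would be consistent with the server-side lock args leaking into the client-side phase, though that is only a guess from the numbers.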
Current behavior
Since 63e9431, we end up seeing at most two invocations (and sometimes only one, which is even more worrying). It appears that the lock digest is being reused between the until and while executing phases.
For some reason, the conflicted jobs also aren't getting rescheduled; they just get dropped on the floor.
Worse yet, we end up creating and leaking uniquejobs: locks with the correct digest, so after downgrading to 8.0.7, the jobs are locked out until the lock TTL expires.
Worker class
class UniqueBugWorker
  include Sidekiq::Worker

  sidekiq_options queue: "repro",
    lock: :until_and_while_executing,
    on_conflict: {server: :reschedule, client: :log},
    lock_args_method: ->(args) do
      if Sidekiq.server?
        [args.first]
      else
        args
      end
    end

  def perform(id, foo = nil, bar = nil)
    Rails.logger.info { "UniqueBugWorker: performing #{id} with #{foo} #{bar}" }
    sleep 1
  end
end
We seem to have been running into the same issues since the upgrade, but with until_executed (possibly with other locks as well, but I only have confirmation about issues with until_executed).
@reneklacan do you also use custom lock args?
@mhenrixon no
We are getting a similar issue with sidekiq_options lock: :until_executed, on_conflict: :reschedule. The error is SystemStackError: stack level too deep, from <internal:kernel>:185:in `loop'
As mentioned above, this appears to be an issue related to until_executing and reschedule. This also affects until_and_while_executing, as it contains the until. I think this is actually easier to reproduce with until-and-while. You can reproduce the error with the following:
class BrokenJob
  include Sidekiq::Job
  sidekiq_options lock: :until_and_while_executing, on_conflict: :reschedule
  def perform = sleep(10)
end

BrokenJob.perform_async
#=> "job_id"
BrokenJob.perform_async
# SystemStackError: stack level too deep
For :until_executing it's easier to reproduce if you just don't run the Sidekiq worker process at all (or while the queue is backed up).
From what I can tell: when the job is enqueued, it fails to get the lock (until). While rescheduling, it checks this lock again and, since it's still locked, tries to reschedule again, and so on until the stack overflows. What I think should happen here is that if the job is being rescheduled (or scheduled in the future in general?), it should skip checking the lock.
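The loop I'm describing can be sketched with a toy model. To be clear, none of these names (ToyQueue, push, skip_lock_when_rescheduling) come from sidekiq-unique-jobs; this is only an illustration of the control flow as I understand it, with a depth guard standing in for the real SystemStackError so it terminates deterministically.

```ruby
# Toy model only: none of these names come from sidekiq-unique-jobs.
class ToyQueue
  MAX_DEPTH = 100 # stand-in for the real stack limit

  attr_reader :scheduled

  def initialize(skip_lock_when_rescheduling:)
    @skip = skip_lock_when_rescheduling
    @locked = false
    @scheduled = []
    @depth = 0
  end

  # Client-side push: take the "until" lock, or reschedule on conflict.
  def push(job, rescheduling: false)
    @depth += 1
    raise "runaway reschedule loop" if @depth > MAX_DEPTH

    if rescheduling && @skip
      @scheduled << job             # proposed: enqueue without re-checking the lock
    elsif @locked
      push(job, rescheduling: true) # conflict -> reschedule -> push again
    else
      @locked = true
      @scheduled << job
    end
  end
end

# Push two jobs with the same lock and report what happened.
def try_push_twice(queue)
  queue.push(:first)
  queue.push(:second)
  :ok
rescue RuntimeError => e
  e.message
end

current_queue  = ToyQueue.new(skip_lock_when_rescheduling: false)
proposed_queue = ToyQueue.new(skip_lock_when_rescheduling: true)

current  = try_push_twice(current_queue)  # "runaway reschedule loop"
proposed = try_push_twice(proposed_queue) # :ok
```

In the toy model the rescheduled push never sees a free lock, so it re-enters itself until the guard trips. Skipping the check on reschedule breaks the cycle, at the cost of letting the second job into the schedule without holding the lock, so whether that is acceptable depends on what the lock is meant to guarantee at enqueue time.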
As an aside: interestingly, sometimes I can get this to actually run two jobs at the same time, and sometimes the second invocation returns nil and the third produces a stack error. So the locking in general is a bit racy to begin with?
That is not great if the lock is racy, as the whole point of the lock is to prevent race conditions.
To extend...
I tried updating this to remove the lock while OnConflict::Reschedule is doing perform_in, but it really doesn't matter. The gem prioritizes the class's options over the job's options, so even if you .set(lock: nil), it purposefully ignores that and still uses the lock setting from the class.
- https://github.com/mhenrixon/sidekiq-unique-jobs/blob/v8.0.10/lib/sidekiq_unique_jobs/middleware.rb#L36
- https://github.com/mhenrixon/sidekiq-unique-jobs/blob/main/lib/sidekiq_unique_jobs/options_with_fallback.rb#L23-L30
- https://github.com/mhenrixon/sidekiq-unique-jobs/blob/main/lib/sidekiq_unique_jobs/options_with_fallback.rb#L60-L62
item is the instance of the job; options is from the class itself. So you can never override the lock type.
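A minimal sketch of that precedence, as I read it. The effective_lock helper is hypothetical and is not the gem's code (see the linked lines for the real thing); it only models "class options are consulted first, the job item second".

```ruby
# Toy model, not the gem's implementation: class-level options are
# consulted before the per-job item, so the item can never win.
def effective_lock(class_options, item)
  class_options["lock"] || item["lock"]
end

class_options = { "lock" => :until_and_while_executing }

with_nil      = effective_lock(class_options, { "lock" => nil })
with_override = effective_lock(class_options, { "lock" => :while_executing })
without_class = effective_lock({}, { "lock" => :while_executing })

p with_nil      # :until_and_while_executing
p with_override # :until_and_while_executing
p without_class # :while_executing
```

Under this ordering, a per-job .set(lock: ...) only takes effect when the class declares no lock at all, which matches what I'm seeing.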