DeserializationError in on_cancellation_job causes delayed_job worker to exit
richhollis opened this issue · 2 comments
Hello!
I've come across an issue where models that on_cancellation_job depends upon no longer exist, a DeserializationError is raised but in delayed_jobs_groups_plugin's cancel method this isn't handled, so the worker terminates and cannot then be restarted until the job group is deleted.
The same issue exists for on_completion_job.
To solve this issue, I would propose something like this:
module Delayed
module JobGroups
class JobGroup < ActiveRecord::Base
def cancel
begin
Delayed::Job.enqueue(on_cancellation_job, on_cancellation_job_options || {}) if on_cancellation_job
rescue Delayed::DeserializationError => error
say "Rescued DeserializationError for Delayed::JobGroups::JobGroup on_cancellation_job, ignoring so that job can be destroyed"
say error
say (error.backtrace * "\n")
end
destroy
end
private
def complete
begin
Delayed::Job.enqueue(on_completion_job, on_completion_job_options || {}) if on_completion_job
rescue Delayed::DeserializationError => error
say "Rescued DeserializationError for Delayed::JobGroups::JobGroup on_completion_job, ignoring so that job can be destroyed"
say error
say (error.backtrace * "\n")
end
destroy
end
def say(msg)
puts msg
Rails.logger.error msg
end
end
end
end
What are your thoughts on this? Can you see any issue in taking this approach? If the underlying models have been deleted then the completion routines are never going to fire. DJ won't retry a job with DeserializationError but gracefully fails.
Happy to submit a PR if you are accepting PRs.
Thanks
Rich
Thanks for reaching out Rich. Does the following snippet represent the issue you're running into? I just want to make sure I fully understand the problem.
# Example class for an on_cancellation_job
module Delayed::JobGroups::JobGroupTestHelper
class OnCancellationJob
end
end
# Create a job group with an on_cancellation_job
job_group = Delayed::JobGroups::JobGroup.create!(on_cancellation_job: Delayed::JobGroups::JobGroupTestHelper::OnCancellationJob.new,
on_cancellation_job_options: {})
# Remove the class that has been serialized into an on_cancellation_job
Delayed::JobGroups::JobGroupTestHelper.module_eval do
remove_const 'OnCancellationJob'
end
# Trying to cancel the job group results in a failed deserialization
job_group.cancel
Hi Will. Thanks for coming back. Yes - that looks like it captures the essence of the problem. I had the problem with AR records that had been deleted but if the code behaves in the same way then this captures it.
I've been using this approach in production code for a while now and seems to be working OK.
Let me know your thoughts.
Cheers