salsify/delayed_job_groups_plugin

DeserializationError in on_cancellation_job causes delayed_job worker to exit

richhollis opened this issue · 2 comments

Hello!

I've come across an issue where models that on_cancellation_job depends upon no longer exist, a DeserializationError is raised but in delayed_jobs_groups_plugin's cancel method this isn't handled, so the worker terminates and cannot then be restarted until the job group is deleted.

The same issue exists for on_completion_job.

To solve this issue, I would propose something like this:

module Delayed
  module JobGroups
    class JobGroup < ActiveRecord::Base
      def cancel
        begin
          Delayed::Job.enqueue(on_cancellation_job, on_cancellation_job_options || {}) if on_cancellation_job
        rescue Delayed::DeserializationError => error
          say "Rescued DeserializationError for Delayed::JobGroups::JobGroup on_cancellation_job, ignoring so that job can be destroyed"
          say error
          say (error.backtrace * "\n")
        end
        destroy
      end

      private

      def complete
        begin
          Delayed::Job.enqueue(on_completion_job, on_completion_job_options || {}) if on_completion_job
        rescue Delayed::DeserializationError => error
          say "Rescued DeserializationError for Delayed::JobGroups::JobGroup on_completion_job, ignoring so that job can be destroyed"
          say error
          say (error.backtrace * "\n")
        end
        destroy
      end

      def say(msg)
        puts msg
        Rails.logger.error msg
      end
    end
  end
end

What are your thoughts on this? Can you see any issue in taking this approach? If the underlying models have been deleted then the completion routines are never going to fire. DJ won't retry a job with DeserializationError but gracefully fails.

Happy to submit a PR if you are accepting PRs.

Thanks
Rich

Thanks for reaching out Rich. Does the following snippet represent the issue you're running into? I just want to make sure I fully understand the problem.

# Example class for an on_cancellation_job
module Delayed::JobGroups::JobGroupTestHelper
    class OnCancellationJob

  end
end

# Create a job group with an on_cancellation_job
job_group = Delayed::JobGroups::JobGroup.create!(on_cancellation_job: Delayed::JobGroups::JobGroupTestHelper::OnCancellationJob.new,
                                                         on_cancellation_job_options: {})

# Remove the class that has been serialized into an on_cancellation_job
Delayed::JobGroups::JobGroupTestHelper.module_eval do
    remove_const 'OnCancellationJob'
end

# Trying to cancel the job group results in a failed deserialization
job_group.cancel

Hi Will. Thanks for coming back. Yes - that looks like it captures the essence of the problem. I had the problem with AR records that had been deleted but if the code behaves in the same way then this captures it.

I've been using this approach in production code for a while now and seems to be working OK.

Let me know your thoughts.

Cheers