chaps-io/gush

Jobs that eventually succeed will not queue downstream jobs

jdreic opened this issue · 6 comments

When a job throws an exception, Gush marks it as failed and re-raises the error. Job frameworks like Sidekiq retry failed jobs by default. When a job fails and later succeeds on retry, Gush has already marked it as failed from the first run, so downstream jobs that depend on it won't run.

It would be great if there were an option to unset the failed status when a job re-runs, so that if it succeeds, the downstream workflow can continue.

Hey @jdreic, thanks for the report. This sounds like a bug!

Hi,

The problem is in the job's state.
When a job's first run fails, it sets @failed_at to the time of the failure.
But when the job is retried, @failed_at is not reset to nil.

If you take a look at https://github.com/chaps-io/gush/blob/master/lib/gush/worker.rb#L73, when child jobs are enqueued, it checks whether the parent has succeeded.
Since the parent job still has a non-nil @failed_at, the children are not enqueued.

This is my fix:

# Override Gush::Job start!
# To ensure failed_at is reset when job is relaunched by Sidekiq
class BaseJob < Gush::Job
  def start!
    super
    @failed_at = nil
  end
end
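
To make the failure mode concrete, here is a toy model of the behaviour described above. It is only a sketch: ToyJob, PatchedToyJob and run_with_one_retry are hypothetical names, not Gush code, and it assumes a job counts as succeeded only when it has finished and has no failure timestamp.

```ruby
# Hypothetical stand-in for a job; this is NOT Gush::Job.
class ToyJob
  attr_reader :started_at, :finished_at, :failed_at

  def start!
    @started_at = Time.now
  end

  def finish!
    @finished_at = Time.now
  end

  def fail!
    @failed_at = Time.now
  end

  def failed?
    !@failed_at.nil?
  end

  # Assumption: "succeeded" means finished and not failed.
  def succeeded?
    !@finished_at.nil? && !failed?
  end
end

# Same override as the fix above, applied to the toy class.
class PatchedToyJob < ToyJob
  def start!
    super
    @failed_at = nil # forget the failure from the previous attempt
  end
end

# Simulates one failed attempt followed by a successful retry.
def run_with_one_retry(job)
  job.start!
  job.fail!   # first attempt raises
  job.start!  # the job framework retries
  job.finish! # second attempt succeeds
  job.succeeded?
end

puts run_with_one_retry(ToyJob.new)        # prints "false"
puts run_with_one_retry(PatchedToyJob.new) # prints "true"
```

Without the reset, the stale failure timestamp from the first attempt keeps succeeded? false even though the retry finished, which is why the children never get enqueued.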

Hi,

I prefer using refinements 😄

# Temporary fix: https://github.com/chaps-io/gush/issues/61
# Overrides Gush::Job start! method.
# Resets failed_at variable when Sidekiq reloads a job.
module GushJobFix
  refine Gush::Job do
    def start!
      super
      @failed_at = nil
    end
  end
end

class MyJob < Gush::Job
  using GushJobFix

  def perform
    ...
  end
end
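
One caveat worth checking before relying on the refinement version: refinements are lexically scoped, so they only apply to calls written inside the scope where `using` is active. If start! is called from code outside MyJob (for example from the worker), the refined method may not be the one that runs. A small sketch with hypothetical classes (Greeter, OutsideCaller and InsideCaller are made up for illustration):

```ruby
class Greeter
  def greet
    "original"
  end
end

module GreeterFix
  refine Greeter do
    def greet
      "refined"
    end
  end
end

# Stands in for library code that calls the method; no `using` here.
class OutsideCaller
  def self.call(greeter)
    greeter.greet
  end
end

# The refinement is active only for calls written inside this class body.
class InsideCaller
  using GreeterFix

  def self.call(greeter)
    greeter.greet
  end
end

puts OutsideCaller.call(Greeter.new) # prints "original"
puts InsideCaller.call(Greeter.new)  # prints "refined"
```

The subclass override shown earlier does not have this limitation, because it changes method lookup for every caller.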

Hey guys! Thank you for finding the culprit! Will release a fix ASAP!

@mickael-palma-argus @theo-delaune-argus @jdreic this is now released as 2.0.1, thanks again for the report and help with identifying the issue ❤️

/cc @devilankur18 @hqm42 @vadshalamov

You rock 👍
Great specs BTW 😃