chaps-io/gush

Shortcircuit workflows

linkyndy opened this issue · 16 comments

Let's say we have the following, simple, workflow:

A -> B -> C

Job A checks a value; if it is false, the whole workflow should be stopped immediately. Is this possible at the moment with Gush?

As an example, let's say we process a new user signup with the above workflow. Job A retrieves the newest user from the database. If the user has been already processed, we'd like to call the workflow complete; the subsequent steps are not necessary anymore, since the user has been already processed.

Hey @linkyndy! Currently it's not possible, but there was a suggestion about branching workflows at runtime depending on their result, but I haven't had much time to tinker with a solution for it.

Right now you could spawn a new Workflow inside A if conditions are met, but it's not a nice solution IMO

Thanks for your fast answer! Curious to see how this will develop.

EDIT: I just re-read OP's shortcircuit scenario; my original comment is irrelevant and doesn't help at all 😄

As an example, let's say we process a new user signup with the above workflow. Job A retrieves the newest user from the database. If the user has been already processed, we'd like to call the workflow complete; the subsequent steps are not necessary anymore, since the user has been already processed.

Hi @linkyndy I've just reimplemented something along those lines, this is how I went about it:

# example_workflow.rb
class ExampleWorkflow < Gush::Workflow

  def configure(user_id)
    run(FirstJob, params: { user_id: user_id })
    run(SecondJob, {
      params: { user_id: user_id },
      after: FirstJob,
    })
  end

end

# first_job.rb
class FirstJob < Gush::Job

  def perform
    user = User.find params[:user_id]

    if user.processed?
      self.fail!
      return
    end

    # continue code ...
  end

end

However, this marks flow.status as :failed. On the other hand, it doesn't run SecondJob once I call self.fail! (I found this after snooping around the source).

@pokonski is this how .fail! could be used? I could put this in the README if you like, would be glad to contribute a PR! 😄

I don't think this can be used, because fail! will stop the whole workflow 🤔

Yep! That's what I was aiming to achieve 😄 The subsequent actions would be skipped, I think, based on what @linkyndy said:

If the user has been already processed, we'd like to call the workflow complete; the subsequent steps are not necessary anymore, since the user has been already processed.

I figured, that's what he/she is also aiming for.

there was a suggestion about branching workflows at runtime depending on their result, but I haven't had much time to tinker with a solution for it.

What would this look like? Could I take a stab at this, just pseudo-code and what the usage would look like? I'll think of something later today, some sort of proposal for this.

One of the good examples had an idea to allow providing two paths, like so:

    run SomeJob,
        before_success: MainJob,
        before_failure: AlternativeJob

Though the naming is rather unfortunate because it suggests that MainJob will run before SomeJob is successful, so it needs better naming 💃

@ace-subido Marking the whole workflow as failed may be too harsh; @pokonski Indeed, it's not really clear from the DSL 😊

But coming back to the problem from the original issue by @linkyndy, one way would be to introduce a skipped state in jobs, similar to how GitLab CI works. I would be fine with accepting such a resolution.

@pokonski so something like this:

class SkippedJob < Gush::Job
  def perform
    # marks the job as 'skipped', this would also do a 'return'
    self.skip! 
  end
end

We could also do something like skip_remaining! which skips all of the other jobs too, marking the entire workflow as :skipped.

class ExampleWorkflow < Gush::Workflow
  def configure(user_id)
    run SkipRemainingJob
    run SecondJob, after: SkipRemainingJob
  end
end

class SkipRemainingJob < Gush::Job
  def perform
    # marks the job as 'skipped' and all other jobs after, marks the workflow itself as 'skipped' too
    self.skip_remaining! 
  end
end

@pokonski What do you think?

Yeah this sounds good! One clarification here:

which skips all of the other jobs too

I assume you mean skip all the jobs that were supposed to run after the job skip_remaining! is called from, right?

Yeah, probably cascade throughout job.outgoing in Gush::Worker#enqueue_outgoing_jobs and all its descendants.
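To make the cascade idea concrete, here is a rough standalone sketch of walking the outgoing edges and tagging every descendant as skipped. It does not use Gush at all: SketchJob, skip_descendants and the :skipped state are hypothetical names made up for illustration, and a real change would presumably live somewhere around Gush::Worker#enqueue_outgoing_jobs as mentioned above.

```ruby
require "set"

# Hypothetical stand-in for a job node; this is NOT Gush::Job.
class SketchJob
  attr_reader :name, :outgoing
  attr_accessor :state

  def initialize(name, outgoing = [])
    @name = name
    @outgoing = outgoing # names of the jobs that run after this one
    @state = :pending
  end
end

# Marks every job reachable from `start` through `outgoing` as :skipped.
# `jobs` maps a job name to its SketchJob. Returns the skipped names.
def skip_descendants(jobs, start)
  visited = Set.new
  queue = jobs.fetch(start).outgoing.dup

  until queue.empty?
    name = queue.shift
    next unless visited.add?(name) # nil when the job was already handled

    job = jobs.fetch(name)
    job.state = :skipped
    queue.concat(job.outgoing)
  end

  visited.to_a
end

# A -> B -> C, plus an unrelated job D
jobs = {
  "A" => SketchJob.new("A", ["B"]),
  "B" => SketchJob.new("B", ["C"]),
  "C" => SketchJob.new("C"),
  "D" => SketchJob.new("D")
}

skipped = skip_descendants(jobs, "A")
```

Note that this sketch skips a descendant even if it also depends on some other job that was not skipped; whether that is the right behaviour for diamond-shaped workflows would still need to be decided.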

I don't really see any value in this job.skip!. If I want to "skip" the current job and go to the next one, I can simply return.

My initial question was related to cancelling the entire workflow from that moment on, and dealing somehow with the remaining jobs.

If that's the case, I'll add more to the PR. Something like: skip_remaining!, which would tag everything from that point as skipped

This particular feature looks very useful for an application we are developing. I would rather be very explicit and name the methods as skip_workflow! for skipping the whole workflow and skip_descendants! for skipping the jobs which should run after the current job.

I have been using this pattern in my own workflows for a bit now and I was thinking it would be good to go over this discussion again and clear up the goals.

For me, having a state change that occurs when you want to stop executing a job and move on in the workflow is useful on its own, i.e. the 'skip' call on a job marks that job 'skipped'.
That state represents a job halting but not failing. Since it could happen anywhere in a job, the state does end up being a bit ambiguous in meaning, depending on your use case.

I use it as an indicator that the job intentionally halted without completing, due to some conditional checks on the state of a set of records, e.g. 'Invoices'. So when a record (e.g. an Invoice) whose processing job is 'skipped' is reprocessed anywhere other than the normal workflow path, it can tell whether the job finished or was halted intentionally.

This lets me avoid needing to track such a state within an Invoice record, isolating it to the job. This seems to lead to good data management and semantics in my opinion. The job state is tracked enough that there is no need to track additional state in a record processed within that job.

I think this is meaningfully distinct from 'return' in a job, as returning early does not maintain any state information that another process could look at.
Really there could be cases where you skip a job in one conditional branch, and in another conditional branch return early and continue the workflow without any state change.
I haven't used such a pattern but I think it is a reasonable idea.
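Roughly what I mean by the difference, as a standalone sketch rather than actual Gush code. SketchJob, InvoiceJob, skip! and the :skipped / :finished states here are all assumptions for illustration, and the invoices are plain hashes with made-up keys.

```ruby
# Hypothetical minimal job base class; NOT Gush::Job.
class SketchJob
  attr_reader :state

  def initialize
    @state = :pending
  end

  # Halts the job immediately and records the :skipped state.
  def skip!
    throw :halt, :skipped
  end

  # Runs perform; the job ends up :skipped if skip! was called, else :finished.
  def run
    @state = catch(:halt) do
      perform
      :finished
    end
  end
end

class InvoiceJob < SketchJob
  def initialize(invoice)
    super()
    @invoice = invoice
  end

  def perform
    skip! if @invoice[:already_billed] # intentional halt, recorded as :skipped
    return if @invoice[:total].zero?   # early return, recorded as :finished

    @invoice[:billed] = true
  end
end

skipped_job = InvoiceJob.new({ already_billed: true, total: 10 })
skipped_job.run

early_job = InvoiceJob.new({ already_billed: false, total: 0 })
early_job.run

invoice = { already_billed: false, total: 25 }
normal_job = InvoiceJob.new(invoice)
normal_job.run
```

Both the early return and the normal run end up with the same state, so only the skipped job leaves something behind that another process could inspect later.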

I think for this use case a 'skip_remaining' method would not accomplish my needs, but I do think there is still room to do both: skipping an individual job without halting the whole workflow, and also being able to mark the entirety of the remaining workflow jobs as skipped. I would be glad to push forward some ideas on that also, but this is already a long post.

If that is too long for anyone:
TL;DR There is room for both skipping an individual job and also skipping the rest of a workflow.

Look forward to having some fresh takes on these ideas!