alexhsamuel/apsis

Invalidate and rerun run and all downstream runs

Opened this issue · 2 comments

Feature request

As a simple example:

  • A, B, C & D are job runs.
  • The job run conditions imply the graph A -> B -> C -> D (X -> Y meaning Y depends on X)
  • A and then B have run; C has failed
  • You would like to reset the conditions such that neither the A nor the B run is marked as error, and reruns of A, B and C are scheduled. Nothing needs doing to D, which is waiting on C to be marked as success before it can start.
  • Ideally without clicking through manually and doing this.

You might recognise this feature from other scheduling systems.

I realise it doesn't translate cleanly to the model of dependencies where A completing is what triggers B to be scheduled as a run. (I realise Apsis supports both this and the condition model.) Still, having this as a feature for condition-based dependencies would be hugely useful.

Let me make sure I understand.

You would like to reset the conditions such that neither the A nor the B run is marked as error, and reruns of A, B and C are scheduled.

You would like to schedule reruns of A, B, and C in sequence (i.e. with the dependency relation, as if they never had run), without marking A or B as failed (or error)?

To be clear, you would have to mark A and B as failed. Otherwise, the reruns of B and C would each start immediately when you reschedule them, rather than waiting for the rerun of their dependency to complete.

I actually think the correct thing to do here is to mark A and B as failed. Presumably you are rerunning them because you fixed either the code or some state in your system. While Apsis may think they succeeded, they in fact did not, and you should fix Apsis's view of the world.
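
To illustrate why the old runs need to be marked as failed, here is a toy model, not Apsis code. It assumes a dependency is satisfied as soon as any matching upstream run is in the success state; the function name and the state strings are made up for the example.

```python
def dependency_satisfied(upstream_states):
    """Toy rule (an assumption): the dependent may start once any
    matching upstream run is in the "success" state."""
    return "success" in upstream_states


# The old run of A is still marked success and a rerun of A was just scheduled.
# The rerun of B sees the stale success and would start right away.
stale = dependency_satisfied(["success", "scheduled"])

# The old run of A was marked failed first.
# The rerun of B now has to wait for the rerun of A to succeed.
fixed = dependency_satisfied(["failure", "scheduled"])

print(stale, fixed)
```

Under that assumption, the stale success is the only thing that lets B jump ahead, which is why fixing the recorded state comes before scheduling the reruns.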

So what you want is:

  1. Gather the set of existing runs that depend transitively on your run of A.
  2. For each, if it's starting or running, kill it.
  3. For each, if it's success, mark it as failed.
  4. Schedule a rerun of each, unless there already is one scheduled or waiting.

This could be automated in a fairly short Python script that uses Apsis's API. I'll draft one for you, if you like.
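
As a rough sketch of the four steps above, here is the planning logic over an in-memory stand-in for runs. It does not call Apsis's API: the `Run` class, the state strings, and the action names (`kill`, `mark_failed`, `rerun`) are all hypothetical, and it assumes one run per job, so "already scheduled or waiting" is simply the state of that run. A real script would replace the planned actions with the corresponding API calls.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical stand-in for a run; not Apsis's data model."""
    run_id: str
    state: str  # e.g. "scheduled", "waiting", "starting", "running", "success", "failure"
    depends_on: list = field(default_factory=list)  # run_ids of direct dependencies


def transitive_dependents(runs, root_id):
    """Step 1: root_id plus every run that depends on it, directly or indirectly."""
    # Invert the depends_on edges so we can walk downstream.
    dependents = {}
    for run in runs.values():
        for dep in run.depends_on:
            dependents.setdefault(dep, []).append(run.run_id)
    seen, stack = set(), [root_id]
    while stack:
        rid = stack.pop()
        if rid in seen:
            continue
        seen.add(rid)
        stack.extend(dependents.get(rid, []))
    return seen


def plan_invalidation(runs, root_id):
    """Steps 2-4: return (action, run_id) pairs; nothing is executed here."""
    actions = []
    for rid in sorted(transitive_dependents(runs, root_id)):
        state = runs[rid].state
        if state in ("scheduled", "waiting"):
            continue  # already pending, so no rerun is needed
        if state in ("starting", "running"):
            actions.append(("kill", rid))
        elif state == "success":
            actions.append(("mark_failed", rid))
        actions.append(("rerun", rid))
    return actions


# The example from the request: A and B succeeded, C failed, D is waiting on C.
runs = {
    "A": Run("A", "success"),
    "B": Run("B", "success", ["A"]),
    "C": Run("C", "failure", ["B"]),
    "D": Run("D", "waiting", ["C"]),
}
plan = plan_invalidation(runs, "A")
for action in plan:
    print(action)
```

For the A -> B -> C -> D example this plans to mark A and B as failed, rerun A, B and C, and leave D alone.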

As a side note:

reset the conditions

Apsis assumes a condition:

  • is fairly cheap to evaluate
  • has the state model False -> True (with no backward transition)

As such, it does not really store the state of a condition, except implicitly in its internal control flow. There isn't really anything to reset.
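
A minimal sketch of what that model looks like in code, assuming a simple polling loop; `wait_for` and `ready` are illustrative names, and this is not how Apsis itself is implemented. The point is that "the condition became true" is recorded only by the loop having returned.

```python
import time


def wait_for(condition, poll_interval=0.01, timeout=1.0):
    """Poll a cheap condition until it is true, then stop checking.

    The False -> True transition is never stored anywhere; it exists
    only as the fact that this function returned True.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    return False


# Illustrative condition that becomes true on the third check.
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return calls["n"] >= 3


result = wait_for(ready)
print(result, calls["n"])
```

Once `wait_for` has returned, there is no stored flag to flip back, which is the sense in which there is nothing to reset.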