Ordering constraints for output processors
vivanishin opened this issue · 2 comments
Hi,
I have an agenda that looks like this, where trace_processor and metric_plotter are both output processors that I have written. It is important that trace_processor is called prior to metric_plotter, because the former adds new metrics (by extracting the values from a trace file obtained with mytrace_cmd and calling job_output.add_metric() in process_job_output()).
config:
    augmentations: [~~, mytrace_cmd, trace_processor, metric_plotter]
workloads:
    [...]
The order in which output processors are installed/run, however, is
non-deterministic because it is defined by set iteration order.
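To illustrate the point about set iteration, here is a minimal Python sketch. It is not WA code; the variable names are made up for the example, and it only assumes what is stated above, namely that the enabled augmentations end up in a set.

```python
# The agenda lists augmentations in a fixed order...
listed = ["mytrace_cmd", "trace_processor", "metric_plotter"]

# ...but a set keeps only membership. For strings, CPython randomizes
# hashes per process (see PYTHONHASHSEED), so the iteration order of
# this set may differ from one run to the next.
enabled = set(listed)

# Membership is the only thing that can be relied upon.
print(sorted(enabled) == sorted(listed))

# For contrast, a dict preserves insertion order, so this
# de-duplicates the names without losing the listing order.
ordered = list(dict.fromkeys(listed))
print(ordered)
```

Running the script several times and printing list(enabled) would be one way to observe the order changing between runs.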
I can work around the problem by removing metric_plotter from the agenda and just post-processing the data with wa process -p metric_plotter wa_output instead. However, being able to specify the desired behavior in a single agenda is preferable, as this is run in an automated environment.
Alternatively, I could refactor trace_processor by integrating it into mytrace_cmd and thus making it part of an instrument, not an output processor. This has the downside of having to move any plugin parameters that trace_processor has to the instrument. Suppose I also want to process traces from theirtrace_cmd; now that too should get the same set of additional parameters...
Currently, it doesn't seem possible to chain output processors with WA in the
way I imagined. Perhaps I'm missing something, or maybe there is a better way
than my approach with two output processors? I appreciate any suggestions.
What I see as a proper solution to my problem is having a way to express plugin
dependencies. Would this be a reasonable feature request?
The order in which output processors (or instruments) are invoked does not depend on the order in which they are listed in the agenda. The assumption is that they are independent, and chaining as you describe it is not supported. Adding something like that would in theory be possible but, given that our execution model is already pretty complex, it would be far from trivial. Given the fairly limited utility it would provide, we cannot commit to implementing such a feature with any kind of priority.
In your specific case, you have several options:
- Combine the two output processors into one. If processor B depends on the output from processor A, then it never makes sense to use B independently of A. If you want to be able to invoke the functionality of processor A without invoking the functionality of processor B, you can always make that a configuration parameter of the combined processor.
- Add the metrics in an earlier stage. Have trace_processor add the metrics as part of the process_job_output stage, and have metric_plotter utilise them as part of export_job_output.
- We do have a limited ordering facility for callbacks in the form of priority decorators. You could decorate trace_processor's callback with @fast (from wa import fast) to ensure it gets invoked before callbacks with "normal" priority. (Note: this feature isn't really designed for enforcing ordering between specific augmentations; its primary purpose is to increase precision for instrumentation by ensuring that timing-sensitive operations are not delayed by long ones; see https://workload-automation.readthedocs.io/en/latest/developer_information.html#prioritization).
Thanks a lot, switching to export_job_output did it for me (seemed less of a hack than priority decorators).