scipp/sciline

order of producers is neglected

Closed this issue · 7 comments

Hi sciline team,

Playing around, I learned that the order of producers is not respected in the pipeline. Instead, the order of the producers in the pipeline is determined solely from the type hints.

In many applications, it is desirable to easily change the order of producers that accept and return the same data type (e.g., a scipp Dataset). With the current restriction, one needs to change the producers' code (type hints) instead of only changing the order in the list of producers.

Best

Daniel

Hi Daniel,

there should only be a single producer for each type. If you have multiple functions that return, e.g., a Dataset, you indeed need to use distinct types for each. While this can be annoying, we chose this design to avoid ambiguities.
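To illustrate the point about distinct types, here is a minimal sketch in plain Python. It is not Sciline's implementation: `compute` is a hypothetical toy resolver, and the `RawImage`, `CleanedImage`, and `BinnedImage` types as well as the `clean` and `bin_pairs` functions are made up for this example. It only shows how the execution order can follow from type hints, so the position of a function in the list has no effect.

```python
from typing import NewType, get_type_hints

# Hypothetical domain types: one distinct type per processing stage.
RawImage = NewType("RawImage", list)
CleanedImage = NewType("CleanedImage", list)
BinnedImage = NewType("BinnedImage", list)


def clean(raw: RawImage) -> CleanedImage:
    # Drop negative (invalid) pixel values.
    return CleanedImage([v for v in raw if v >= 0])


def bin_pairs(cleaned: CleanedImage) -> BinnedImage:
    # Sum neighbouring pixels in pairs.
    return BinnedImage([sum(cleaned[i:i + 2]) for i in range(0, len(cleaned), 2)])


def compute(target, providers, params):
    """Toy resolver: select the provider by return type, then resolve its inputs."""
    if target in params:
        return params[target]
    by_return = {}
    for func in providers:
        ret = get_type_hints(func)["return"]
        if ret in by_return:
            # Two functions returning the same type would be ambiguous.
            raise ValueError(f"more than one provider for {ret}")
        by_return[ret] = func
    func = by_return[target]
    kwargs = {
        name: compute(tp, providers, params)
        for name, tp in get_type_hints(func).items()
        if name != "return"
    }
    return func(**kwargs)


params = {RawImage: RawImage([1, -2, 3, 4])}
# The list order is irrelevant; both calls resolve clean before bin_pairs.
result_a = compute(BinnedImage, [clean, bin_pairs], params)
result_b = compute(BinnedImage, [bin_pairs, clean], params)
```

In this sketch, adding a second function that also returns `BinnedImage` raises an error instead of silently picking one of them, which mirrors the ambiguity argument above.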

Can you provide an example where you want to use the order of providers to select which one gets used?

Hi @jl-wynen,

Many thanks for your prompt reply!

In the meantime, I also found your design decision clearly stated in the docs.

I fully agree that from a design point of view and also for an established analysis routine, your concept is well-chosen and also safe.

In our case, we often handle, e.g., images from scientific CCDs. After reading the data, we have many functions/producers that all work on the same data type before reducing/transforming the images. Sticking to the examples in your docs, we are talking about, e.g., cleaning, subtracting, scaling, ROI selection, and binning of the data.

During beamtime, one might play a bit with the order of these steps, e.g. (clean, bin) <-> (bin, clean), but also simply just drop one of the producers.

Maybe you have some concepts for such kinds of requirements. When initially thinking about designing our own sciline-like project, we considered using the order of producers in a list as the definition of the pipeline order.

Relying on the order is straightforward when you have a linear chain. But what about providers that take multiple inputs?
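As a rough sketch of what an order-based pipeline looks like (plain Python, not a Sciline feature; `run_chain`, `clean`, `scale`, and `subtract` are hypothetical names for this example), single-input steps can simply be folded over the data in list order. A step with a second input has no natural place in such a list.

```python
from functools import reduce


def clean(image):
    # Drop negative (invalid) pixel values.
    return [v for v in image if v >= 0]


def scale(image):
    # Multiply every pixel by a fixed factor.
    return [2 * v for v in image]


def subtract(image, background):
    # Needs a second input, so it does not fit a one-argument chain.
    return [v - b for v, b in zip(image, background)]


def run_chain(steps, data):
    # Apply single-input steps one after another, in list order.
    return reduce(lambda acc, step: step(acc), steps, data)


data = [1, -2, 3]
both_steps = run_chain([clean, scale], data)  # clean first, then scale
scale_only = run_chain([scale], data)         # dropping a step is trivial

try:
    run_chain([clean, subtract], data)
    chain_accepts_two_inputs = True
except TypeError:
    # subtract() is called with the image only; the background is missing.
    chain_accepts_two_inputs = False
```

Reordering or dropping steps is easy here, but only because every step takes exactly one argument.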

You also need to consider whether the providers are actually able to consume data that has been processed in different ways. A simple example is

(clean, bin) <-> (bin, clean)

Here, the clean function needs to be able to handle both event and binned data (if bin is a scipp-like binning). Or data binned in different ways.
I'm not saying that there are no cases where it is easily possible to insert or reorder steps, but this is in general not the case.
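The (clean, bin) versus (bin, clean) case can be sketched with type hints in plain Python. Everything here is an assumption made for illustration: `Events` and `Histogram` are hypothetical types, `clean` only understands event data, and `chain_is_consistent` is a toy check, not part of Sciline or scipp.

```python
from typing import NewType, get_type_hints

# Hypothetical types: a list of event positions and a list of bin counts.
Events = NewType("Events", list)
Histogram = NewType("Histogram", list)


def clean(events: Events) -> Events:
    # Keep only events inside the valid range [0, 4).
    return Events([e for e in events if 0.0 <= e < 4.0])


def bin_events(events: Events) -> Histogram:
    # Count events in four unit-width bins covering [0, 4).
    counts = [0, 0, 0, 0]
    for e in events:
        counts[int(e)] += 1
    return Histogram(counts)


def chain_is_consistent(steps):
    """Check that each step accepts exactly what the previous step returns."""
    for producer, consumer in zip(steps, steps[1:]):
        produced = get_type_hints(producer)["return"]
        accepted = [
            tp for name, tp in get_type_hints(consumer).items() if name != "return"
        ]
        if accepted != [produced]:
            return False
    return True


clean_then_bin = chain_is_consistent([clean, bin_events])
bin_then_clean = chain_is_consistent([bin_events, clean])
histogram = bin_events(clean(Events([0.5, 1.2, 1.9, 7.0, -1.0])))
```

In this sketch, swapping the two steps fails the check because `clean` would receive a `Histogram` rather than `Events`. Making the swap valid would require a `clean` that can also handle binned data.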

Thanks for this clarification. In the end, I wanted to be sure about your design philosophy so we can discuss its applicability for our needs.

Diving deeper into your extensive documentation and examples, I also found ess-notebooks and saw that many workflows are still coded in notebooks and plain functions. So I guess this could also be our approach: work in notebooks as usual, and once certain workflows have been established, they get their more rigid sciline pipeline.

The ess-notebooks repo hasn't been touched in a while. But yes, 'powerusers' will probably write workflows without Sciline, too. At least for exploratory stuff. But we want production workflows to be written with Sciline so that we can establish provenance (#92) and allow less experienced users to configure parameters without having to know much about the actual implementation of the workflow.

I'd like to link to some real-life examples of Sciline use, which may be clearer than the slightly abstract documentation:

Thanks a lot. These examples are already very close to our workflows.