Allow Mapping (Batch-Mode) over multiple data parameters.
Opened this issue · 2 comments
For tool parameters with `type="data" multiple="true"`, Galaxy provides an interface for reducing a list collection over these parameters. Many more options should be available; the easiest and most essential of these is that you should be able to map a list over these parameters (so run N jobs, each with a single input from the supplied list).
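For reference, the parameter in question looks something like this in a tool's XML (a minimal sketch; the name, format, and label are illustrative, not from a real tool):

```xml
<!-- minimal sketch; name, format, and label are illustrative -->
<param name="inputs" type="data" format="txt" multiple="true"
       label="Input datasets" />
```

Selecting a list collection for this parameter currently reduces it: all elements are fed to a single job. The request is an additional mapped mode that schedules one job per list element.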
Once that is done there is still more work to do. You should be able to map multiple lists over these parameters (so supply two lists of size N and run N jobs, each with the matching two datasets from the two supplied lists) (I no longer think this is a good idea - see note below). You should be able to map over the outer lists of a nested collection and reduce the inner ones (so if you have a list of samples where each list element is a list of replicates, and you have a concatenation tool, you should be able to concatenate the replicates (reduce the replicates) and build a list of merged samples (map the samples) from that tool).
Each of these modes of operation described above can be worked around by modifying the tool itself, but this is definitely a hack and the GUI should have a common set of language and UI for describing these operations.
The hacks to work around these limitations include...
Allowing both mapping and reduction of simple lists can be accomplished by replacing the `type="data" multiple="true"` parameter with a `conditional` that has that same parameter as one path and a simple (non-multiple) data parameter as the second path. Modifying that conditional so the second path isn't just a data parameter but a `repeat` parameter with a minimum repeat number of 1 allows the second use case above: mapping multiple lists. Adding another case with a `type="data_collection" collection_type="list"` parameter to the tool allows the map-over-the-outer-list, reduce-the-inner-list operation described above. Wrapping that parameter in a repeat would allow you to do that with multiple `list:list`s. The last operation would also work for `list:list:list` if you wanted to map the outer two list depths but reduce the innermost one. If you wanted to instead reduce the inner two lists and map over the outer one, you could add yet another conditional case with a `list:list` input.
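Put together, the workaround described above looks roughly like the following tool XML (a hypothetical sketch; all parameter names, labels, and formats are illustrative, not from a real tool):

```xml
<!-- Hypothetical sketch of the conditional workaround; all names,
     labels, and formats are illustrative. -->
<conditional name="input_mode">
  <param name="mode" type="select" label="Input mode">
    <option value="reduce">All selected datasets in one job (reduce)</option>
    <option value="map">One job per dataset (map a list)</option>
    <option value="paired_lists">Multiple matched lists</option>
    <option value="nested">Map outer list, reduce inner lists</option>
  </param>
  <when value="reduce">
    <!-- the original multi-input parameter -->
    <param name="inputs" type="data" format="txt" multiple="true" />
  </when>
  <when value="map">
    <!-- a simple, non-multiple data parameter; mapping a list over it
         runs one job per element -->
    <param name="input" type="data" format="txt" />
  </when>
  <when value="paired_lists">
    <!-- a repeat with min="1" lets the user map several matched lists -->
    <repeat name="input_repeat" title="Input" min="1">
      <param name="input" type="data" format="txt" />
    </repeat>
  </when>
  <when value="nested">
    <!-- mapping a list:list over this parameter maps the outer list and
         reduces each inner list within its job -->
    <param name="input_list" type="data_collection"
           collection_type="list" format="txt" />
  </when>
</conditional>
```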
I've evolved on this issue; I actually don't think the tool form should supply advanced selection for reducing multiple layers of nesting - for instance, reducing the inner two lists of a `list:list:list` and mapping over the outer list. The GUI is too complicated for that, tracking the backend is difficult, and it would complicate the APIs. We have a better approach now that is more explicit and easier to understand (though a bit more work): have the user flatten the inner two levels of the list with the Apply Rules tool or some other collection operation that we could potentially add.
The thing we still definitely need, though, is to be able to map a collection over a multi-input element. Right now there is one collection button for a multi-input element, and that button reduces the collection (treats it as a set of datasets). There should instead be two buttons - one that does that and one that operates like the collection button on single inputs - and creates a job per dataset in the collection (and similar mapping semantics). There is no workaround for that and it is repeatedly requested.
@mvdbeek do you agree with this? I suspect it is a conclusion you reached quicker than me.
That's a good summary, I agree completely.
> There should instead be two buttons - one that does that and one that operates like the collection button on single inputs - and creates a job per dataset in the collection (and similar mapping semantics). There is no workaround for that and it is repeatedly requested.
<3, that would avoid this terrible conditional that asks if you want to reduce or not.