nfdi4plants/ARC-specification

Same Input/Output over the ARC


Let's assume an input/output called Sample_1.

  1. Am I allowed to reference Sample_1 multiple times across different assays?
  • Assay_1: the output of the first table is Sample_1.
  • Assay_2: the output of the first table is Sample_1.

Is this valid? If not, should it be? This could represent a form of "pooling" the inputs of the tables from Assay_1 and Assay_2 into the same sample.

  2. Am I allowed to reference Sample_1 multiple times across different studies?

Same example.

Both points (1+2) should be allowed in my opinion.

IMO, this and #66 point strongly toward the proper use of PIDs, which in an ARC are basically (or virtually) given by the URL of the (referenced) file. I would suggest enforcing as little as possible, especially as long as there is no automated control or warning. But even then, if I were to reuse samples with the same name from another ARC, I'd have to rename them, which I would not want to do.

In other words, it would probably be best not to reference a file (e.g. in a dataset) just by its name, but by its (absolute or relative) path (./dataset/filename.csv, ../anotherAssay/dataset/filename).
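As a minimal sketch of path-based referencing, the hypothetical helper below turns a reference written relative to the referencing assay's directory into a path relative to the ARC root, so that the same file gets the same identifier no matter where it is referenced from. The directory layout (`assays/<name>/dataset/...`) and treating a leading `/` as "relative to the ARC root" are assumptions for illustration, not part of the specification.

```python
import posixpath


def resolve_reference(referencing_dir: str, reference: str) -> str:
    """Resolve a file reference into a path relative to the ARC root.

    Hypothetical helper. `referencing_dir` is the ARC-root-relative directory
    of the assay/study that contains the reference. A reference starting with
    "/" is assumed to be relative to the ARC root itself.
    """
    if reference.startswith("/"):
        base, reference = "", reference.lstrip("/")
    else:
        base = referencing_dir
    # Collapse "./" and "../" segments into one canonical path.
    resolved = posixpath.normpath(posixpath.join(base, reference))
    # A path that climbs above the ARC root cannot name a file inside the ARC.
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError(f"reference escapes the ARC root: {reference!r}")
    return resolved
```

With this, `../Assay_2/dataset/filename.csv` referenced from `assays/Assay_1` and `./dataset/filename.csv` referenced from `assays/Assay_2` both resolve to `assays/Assay_2/dataset/filename.csv`, i.e. the path itself acts as the identifier and no renaming is needed.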

Being able to reference a file exactly also highlights one strength of the ARC over pure ISA.
One could consider disallowing any inputs/outputs that do not reference an existing file (e.g. source names could point to descriptor files), but that would again be a hard requirement, which we should probably avoid.
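One way to get the automated control without a hard requirement is a check that only warns. The sketch below is hypothetical (no such validator is defined in the ARC specification); it assumes references have already been resolved to ARC-root-relative paths and reports those that do not point to an existing file, emitting a Python warning for each instead of raising an error.

```python
import warnings
from pathlib import Path


def check_references(arc_root: Path, references: list[str]) -> list[str]:
    """Return the references that do not point to an existing file.

    Hypothetical soft check: each missing reference triggers a warning,
    but nothing is rejected, so plain names like "Sample_1" stay allowed.
    """
    missing = []
    for ref in references:
        if not (arc_root / ref).is_file():
            missing.append(ref)
            warnings.warn(f"input/output does not reference an existing file: {ref}")
    return missing
```

Because the check only warns, a plain sample name such as `Sample_1` is still accepted; the warning merely points out that it is not backed by a file.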

Is this sufficiently covered by the following rule?

Source Names, Sample Names, Extract Names and Labeled Extract Names MUST be unique across an ARC. If two of these entities with the same name exist in the same ARC, they are considered the same entity.
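Under this rule, the scenario from the original question resolves to "pooling": Sample_1 is a single entity, and it collects the inputs from every table that outputs it. The sketch below illustrates that reading with a deliberately simplified, hypothetical model in which each annotation-table row is just an (input name, output name) pair; real ARC annotation tables carry much more. The input names `Source_A` and `Source_B` are made up for the example.

```python
from collections import defaultdict

# Hypothetical, simplified table row: (input name, output name).
Row = tuple[str, str]


def merge_by_name(tables: dict[str, list[Row]]) -> dict[str, set[str]]:
    """Map each output name to all inputs that produce it, across all tables.

    Equal names are treated as the same entity ARC-wide, so an output named
    in several assays or studies gathers the inputs of all of them.
    """
    producers: dict[str, set[str]] = defaultdict(set)
    for rows in tables.values():
        for input_name, output_name in rows:
            producers[output_name].add(input_name)
    return dict(producers)


tables = {
    "Assay_1": [("Source_A", "Sample_1")],
    "Assay_2": [("Source_B", "Sample_1")],
}
print(merge_by_name(tables))  # Sample_1 is produced by both Source_A and Source_B
```

The same merge applies unchanged if the tables come from different studies, since the rule makes names unique across the whole ARC rather than per assay or per study.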

Will close for now. Feel free to reopen if ambiguity issues arise.