
DSL2: Genotyping on multiple SNP sets in one run?

TCLamnidis opened this issue · 1 comments

It might be nice to be able to genotype on multiple SNP sets in a single run. I'm specifically thinking of pileupcaller here, not sure how it would apply to other genotypers, but:

Currently, the reference sheet takes one pileupcaller_{bed,snp} per reference. That means that if one wanted to genotype on two sets of positions, they would need to run the entire pipeline twice, or duplicate a row in the reference sheet just for that additional genotyping. Now, since the latter option will not fly with the ref-sheet validation, one would have to "fake" an entire new reference, thus duplicating all the processing, just for the extra genotypes.

Maybe we can turn the pileupcaller_bed/snp columns into list columns, e.g. multiple files separated by ;, that would then get split into separate channel elements with the same meta, and thus only duplicate the genotyping step?

Something like:


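      // ch_ref (hypothetical name) emits [ meta, "set1.snp;set2.snp" ]
      ch_ref.flatMap { meta, snps ->
          snps.split(';').collect { snp -> [ meta, file(snp) ] }
      }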


These could then either be passed to genotyping separately, each producing its own genotypes, or be concatenated into one superset first?
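
Since the bed and snp files come in pairs, both list columns would probably need to be split together and matched up by position. A rough plain-Groovy sketch of that step, outside of any channel logic. The helper name `expandSnpSets`, the `[meta, bed, snp]` layout, and the example reference and file names are all made up for illustration and are not taken from the pipeline:

```groovy
// Hypothetical helper: the name and the row layout are assumptions, not pipeline code.
List expandSnpSets(Map meta, String bedColumn, String snpColumn) {
    // Split each ';'-separated column and drop surrounding whitespace.
    List<String> beds = bedColumn.split(';').collect { it.trim() }
    List<String> snps = snpColumn.split(';').collect { it.trim() }
    // Assumption: the i-th bed file belongs with the i-th snp file.
    assert beds.size() == snps.size() : "bed and snp lists differ in length"
    // One [meta, bed, snp] element per SNP set, all sharing the same meta.
    return [beds, snps].transpose().collect { bed, snp -> [meta, bed, snp] }
}

// Example row with two SNP sets for one (made-up) reference.
def rows = expandSnpSets([id: 'hs37d5'], '1240k.bed;ho.bed', '1240k.snp;ho.snp')
rows.each { println it }
```

Inside the pipeline the same closure body could presumably sit in a `flatMap`, so that only the genotyping step sees the extra elements. Keeping the meta identical means the outputs would need the SNP set name in their file names to avoid collisions.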