update logic to handle multiple data files as input for reads qc
The Berkeley schema will allow multiple runs from the same library, which will be analyzed together. This breaks the assumptions embedded in the code about how to handle an input array of length 1 versus length > 1: the code currently assumes that if length = 1 the file is interleaved, and if length > 1 it is not. We need to update the wdl/input.json so that it can handle the following use cases:
1 interleaved file
1+n interleaved files
2 files, one for read 1 & one for read 2
2+n files (even counts only), as read 1 / read 2 pairs
My suggestion is to update the format of the input JSON to look something like this:
"rqcfilter.input_files": [{"interleaved": "interleaved_1.fastq.gz"}, {"interleaved": "interleaved_2.fastq.gz"}]
"rqcfilter.input_files": [{"read_1": "interleaved_1_R1.fastq.gz", "read_2": "interleaved_1_R2.fastq.gz"}, {"read_1": "interleaved_2_R1.fastq.gz", "read_2": "interleaved_2_R2.fastq.gz"}]
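To show how the proposed format removes the ambiguity, here is a rough Python sketch of a validator that classifies each entry by its keys instead of guessing from the array length. The function name `classify_inputs` is hypothetical, and rejecting a mix of interleaved and paired entries in one list is my assumption, not something decided in this issue.

```python
def classify_inputs(input_files):
    """Classify a list of input entries in the proposed format.

    Returns ("interleaved", [path, ...]) or
            ("paired", [(read_1_path, read_2_path), ...]).
    Hypothetical helper; not part of the existing workflow.
    """
    if not input_files:
        raise ValueError("input_files must contain at least one entry")

    interleaved = []
    pairs = []
    for index, entry in enumerate(input_files):
        keys = set(entry)
        if keys == {"interleaved"}:
            interleaved.append(entry["interleaved"])
        elif keys == {"read_1", "read_2"}:
            # Each entry carries its own mate, so an odd file count
            # cannot occur for paired input.
            pairs.append((entry["read_1"], entry["read_2"]))
        else:
            raise ValueError(
                f"entry {index} has unexpected keys: {sorted(keys)}"
            )

    # Assumption: one run list is either all interleaved or all paired.
    if interleaved and pairs:
        raise ValueError("cannot mix interleaved and paired entries")

    if interleaved:
        return "interleaved", interleaved
    return "paired", pairs
```

With this shape, all four use cases above reduce to "one or more entries of a single kind", and the "even only" rule for paired files is enforced by the structure itself rather than by a length check.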