stats_summary.txt has duplicated input lines
cnluzon opened this issue · 0 comments
Input results get duplicated when the same Input is used for different IPs (which is usually the case). It seems that, for a given barcode, the Input stats appear on as many lines as the number of times that Input appears in groups.tsv.
I need to look at it in more detail, but I think that is the cause (or at least related to it).
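To illustrate the suspected cause, here is a minimal sketch, not the actual pipeline code. It assumes the third column of groups.tsv names the Input (as in the rows further down) and uses hypothetical helper names `input_names` and `unique_in_order`. Collecting Inputs row by row yields one entry per appearance, while deduplicating before reporting would yield one entry per Input.

```python
from collections import Counter

# The groups.tsv rows from this report, tab-separated.
GROUPS_TSV = """\
H3K27m3_SL_CTR\tpooled\tIN_SL_CTR\tgroup3\tmini
H3K27m3_SL_CTR\t1\tIN_SL_CTR\tgroup3\tmini
H3K27m3_2i_CTR\t1\tIN_2i_CTR\tgroup3\tmini
H3K27m3_2i_CTR\t2\tIN_2i_CTR\tgroup3\tmini
"""


def input_names(groups_text):
    """Return the Input name of every row (assumption: it is the third column)."""
    names = []
    for line in groups_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.append(line.split()[2])
    return names


def unique_in_order(items):
    """Drop repeats while keeping the order of first appearance."""
    return list(dict.fromkeys(items))


names = input_names(GROUPS_TSV)    # one entry per groups.tsv row
counts = Counter(names)            # how often each Input is referenced
unique = unique_in_order(names)    # one entry per distinct Input

print(counts)
print(unique)
```

If the stats are written once per entry of `names` rather than once per entry of `unique`, that would explain the repeated lines.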
How to reproduce it with the current test data (minute-testdata-0.9
Duplicate testdata-H3K4m3_R{1,2}.fastq.gz and name the copies something else: testdata-H3K27m3_R{1,2}.fastq.gz
Then append the following to the end of libraries.tsv:
H3K27m3_SL_CTR 1 CATGCTTA testdata-H3K27m3
H3K27m3_SL_CTR 2 GCACATCT testdata-H3K27m3
H3K27m3_2i_CTR 1 GGTCCAGA testdata-H3K27m3
H3K27m3_2i_CTR 2 GTATAACA testdata-H3K27m3
And to groups.tsv:
H3K27m3_SL_CTR pooled IN_SL_CTR group3 mini
H3K27m3_SL_CTR 1 IN_SL_CTR group3 mini
H3K27m3_2i_CTR 1 IN_2i_CTR group3 mini
H3K27m3_2i_CTR 2 IN_2i_CTR group3 mini
You'll get duplicated entries for the Input in stats_summary.txt.
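A quick way to confirm the duplication could look like the sketch below. The helper name `duplicated_lines` and the sample rows are hypothetical (the real column layout of stats_summary.txt is not shown here); the function simply reports every non-empty line that occurs more than once.

```python
from collections import Counter


def duplicated_lines(lines):
    """Map each non-empty line that occurs more than once to its count."""
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return {line: n for line, n in counts.items() if n > 1}


# Hypothetical rows standing in for stats_summary.txt content.
sample = [
    "IN_SL_CTR_rep1\t1000\n",
    "H3K27m3_SL_CTR_rep1\t500\n",
    "IN_SL_CTR_rep1\t1000\n",
]

dupes = duplicated_lines(sample)
print(dupes)
```

On a real run, the same function can be given an open file handle for stats_summary.txt instead of the sample list, since it only iterates over lines.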
As a side note, the Input rows get interleaved; I think it would be nicer to order the rows per FASTQ file.