Include read demultiplexing barcode in the demultiplexed FASTQ read ID
cnluzon opened this issue · 1 comments
It could be useful for full reproducibility to also keep each read's barcode in the demultiplexed FASTQ file header, since we submit the demultiplexed files to GEO. Of course, we know the target barcode, but not necessarily the actual barcode the read had (i.e. it can be one, or up to max_barcode_mismatches, mismatches away from the original).
This is not a high priority, but I assume it's a small change: in the remove_contamination rule, change the -u parameter from -u {config[umi_length]} to -u {config[umi_length] + config[barcode_length]} (or just take the barcode length from libraries.tsv at first), and then adjust --rename '{{id}}_{{r1.cut_prefix}} {{comment}}' accordingly. If we keep the UMI in the same place, it should not affect the rest of the processing.
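To make the proposal concrete, here is a minimal Python sketch of what the read header would look like if the removed prefix covered both the UMI and the barcode. It only mimics the effect of the '{id}_{cut_prefix} {comment}' template and is not how Cutadapt implements it. The read, the lengths (6 and 8), the assumption that the UMI comes first and the barcode directly after it, and the helper names split_prefix and rename_header are all made up for illustration.

```python
from typing import Tuple


def split_prefix(sequence: str, umi_length: int, barcode_length: int) -> Tuple[str, str]:
    """Split a read into the removed prefix (UMI + barcode) and the remainder.

    Assumes the UMI sits at the 5' end, immediately followed by the barcode.
    """
    n = umi_length + barcode_length
    return sequence[:n], sequence[n:]


def rename_header(header: str, cut_prefix: str) -> str:
    """Mimic the effect of the template '{id}_{cut_prefix} {comment}'."""
    read_id, _, comment = header.partition(" ")
    # rstrip() avoids a trailing space when the header has no comment
    return f"{read_id}_{cut_prefix} {comment}".rstrip()


# Hypothetical read: 6 nt UMI, 8 nt barcode, then the insert
header = "read1 1:N:0:1"
sequence = "ACGTAC" + "TTGGCCAA" + "GATTACAGATTACA"

prefix, rest = split_prefix(sequence, umi_length=6, barcode_length=8)
new_header = rename_header(header, prefix)
print(new_header)  # read1_ACGTACTTGGCCAA 1:N:0:1
```

Because the UMI stays at the start of the appended string, a downstream step that reads the first umi_length characters after the underscore would still find it there.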
Removing the barcode within the remove_contamination rule is unfortunately not possible because then the demultiplexing rule won’t work anymore (because it needs the barcode).
I have just added a {match_sequence} placeholder which can be used to insert the matched sequence (that is, the barcode as it is in the read). That would then be done within the demultiplexing rule. This feature was requested by other Cutadapt users a while ago, so it’s good I had a reason to finally implement it. See marcelm/cutadapt#437.
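Once the matched sequence is in the header, the number of mismatches against the target barcode can be recovered later. The following Python sketch shows one way this might look. It assumes a hypothetical header layout of '{id}_{umi}_{match_sequence} {comment}' and counts substitutions only (Hamming distance), which is a simplification if indels are allowed during matching. The header, the target barcode and the helper names are illustrative, not taken from the pipeline.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def matched_barcode(header: str) -> str:
    """Extract the last '_'-separated field of the read ID.

    Assumes the matched barcode was appended last to the read ID.
    """
    read_id = header.split(" ", 1)[0]
    return read_id.rsplit("_", 1)[1]


# Hypothetical header: read ID, UMI, then the barcode as found in the read
header = "read1_ACGTAC_TTGGCGAA 1:N:0:1"
target_barcode = "TTGGCCAA"

observed = matched_barcode(header)
mismatches = hamming_distance(observed, target_barcode)
print(observed, mismatches)  # TTGGCGAA 1
```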