Improve qualimap execution time
Description of feature
The qualimap tool sorts the input BAM file by read name and does so in a single-threaded manner. On a dataset that I'm using, the total execution time of the tool is around 85 mins. However, if we perform the sorting via samtools and then call qualimap on the sorted file (with the additional flag --sorted), the execution time drops to around 33 mins. Combined with the 7 mins it took for sorting (with 16 threads), the total execution time for the workflow reduces to less than half its original time.
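A rough sketch of the two-step approach, for illustration only. It assumes the rnaseq mode of qualimap, paired-end data, and that both samtools and qualimap are on the PATH; the helper name build_commands, the ".name_sorted.bam" suffix and the file names in the usage note are made up here and are not part of the pipeline.

```python
import subprocess
import sys


def build_commands(bam, gtf, outdir, threads=16, paired_end=True):
    """Build the two command lines: name sort with samtools, then qualimap.

    Hypothetical helper for illustration; the output naming is an assumption.
    """
    stem = bam[:-4] if bam.endswith(".bam") else bam
    sorted_bam = stem + ".name_sorted.bam"

    # -n sorts by read name, -@ sets the number of additional sorting threads
    sort_cmd = ["samtools", "sort", "-n", "-@", str(threads),
                "-o", sorted_bam, bam]

    qualimap_cmd = ["qualimap", "rnaseq", "-bam", sorted_bam,
                    "-gtf", gtf, "-outdir", outdir]
    if paired_end:
        qualimap_cmd.append("-pe")
    # --sorted tells qualimap the input is already name sorted,
    # so it skips its own single-threaded sort
    qualimap_cmd.append("--sorted")

    return sort_cmd, qualimap_cmd


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python sketch.py <input.bam> <annotation.gtf> <outdir>
    for cmd in build_commands(sys.argv[1], sys.argv[2], sys.argv[3]):
        subprocess.run(cmd, check=True)
```

For example, build_commands("sample.bam", "genes.gtf", "qc") yields a samtools sort -n call writing sample.name_sorted.bam, followed by a qualimap rnaseq call on that file ending in --sorted.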
If this change is fine, I could go ahead and implement it. Thanks!
I think that change is perfectly fine!
There is even a name sort already happening for the UMI deduplication route, at least for the transcriptome alignments:
if (params.with_umi) {
    process {
        withName: 'NFCORE_RNASEQ:RNASEQ:SAMTOOLS_SORT' {
            ext.args   = '-n'
            ext.prefix = { "${meta.id}.umi_dedup.transcriptome" }
            publishDir = [
                path: { params.save_align_intermeds || params.save_umi_intermeds ? "${params.outdir}/${params.aligner}" : params.outdir },
                mode: params.publish_dir_mode,
                pattern: '*.bam',
                saveAs: { params.save_align_intermeds || params.save_umi_intermeds ? it : null }
            ]
        }

// Name sort BAM before passing to Salmon
SAMTOOLS_SORT (
    BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_TRANSCRIPTOME.out.bam,
    ch_fasta.map { [ [:], it ] }
)
Thanks @MatthiasZepper. I'll work on it.