nf-core/rnaseq

Improve qualimap execution time


Description of feature

The qualimap tool sorts the input BAM file by read name, and it does so single-threaded. On a dataset that I'm using, the total execution time of the tool is around 85 mins. However, if we sort with samtools first and then call qualimap on the result (with the additional flag --sorted), qualimap's execution time drops to around 33 mins. Adding the 7 mins the sort took (with 16 threads), the total comes to about 40 mins, which is less than half the original time for this step.
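As a rough sketch of what I have in mind, the config side of the change could look something like the fragment below. The `SAMTOOLS_SORT_QUALIMAP` alias, the `QUALIMAP_RNASEQ` selector, and the `name_sorted` prefix are placeholders for illustration rather than the pipeline's confirmed names, and it assumes both modules pass `ext.args` through to the command line:

```groovy
// Sketch only: selectors and prefix are assumptions, not the pipeline's actual config.
process {
    // Name-sort the BAM with samtools before handing it to qualimap
    withName: 'SAMTOOLS_SORT_QUALIMAP' {              // hypothetical process alias
        ext.args   = '-n'                             // samtools sort by read name
        ext.prefix = { "${meta.id}.name_sorted" }     // hypothetical output prefix
        cpus       = 16                               // thread count used in my timing test
    }
    // Tell qualimap the input is already name-sorted so it skips its own sort
    withName: 'QUALIMAP_RNASEQ' {                     // assumed process name
        ext.args = '--sorted'
    }
}
```

The sort process would then need to be called on the genome BAM right before qualimap in the workflow.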

If this change is fine, I could go ahead and implement it. Thanks!

I think that change is perfectly fine!

There is even a name sort already happening for the UMI deduplication route, at least for the transcriptome alignments:

    if (params.with_umi) {
        process {
            withName: 'NFCORE_RNASEQ:RNASEQ:SAMTOOLS_SORT' {
                ext.args   = '-n'
                ext.prefix = { "${meta.id}.umi_dedup.transcriptome" }
                publishDir = [
                    path: { params.save_align_intermeds || params.save_umi_intermeds ? "${params.outdir}/${params.aligner}" : params.outdir },
                    mode: params.publish_dir_mode,
                    pattern: '*.bam',
                    saveAs: { params.save_align_intermeds || params.save_umi_intermeds ? it : null }
                ]
            }
        }
    }

            // Name sort BAM before passing to Salmon
            SAMTOOLS_SORT (
                BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_TRANSCRIPTOME.out.bam,
                ch_fasta.map { [ [:], it ] }
            )

Thanks @MatthiasZepper. I'll work on it.