Sambamba should be able to convert SAM->BAM and sort at the same time like samtools
moldach opened this issue · 1 comments
Sambamba has a number of functions which are reported to be quicker than their Samtools counterparts.
I'm trying to exchange the following functions (view, sort, markdup & index) in our pipeline for faster alternatives.
While most of the functions appear faster in my benchmarks, the use of view raises some concern. The critical difference between Samtools and Sambamba seems to be the first step in the pipeline: samtools sort both sorts and converts SAM -> BAM (the typical job of view).
I'm wondering whether this is also possible with Sambamba, as it appears to be the bottleneck.
Old Pipeline (Samtools & Picard)
# Convert SAM to BAM & Sort
./samtools-1.3.1/samtools sort -@ 8 -o proband_bwaMEM_sort.bam proband_bwaMEM.sam
# Markdups
java -Xmx4G -jar picard.jar MarkDuplicates \
I=proband_bwaMEM_sort.bam \
O=proband_bwaMEM_sort_dedupped.bam \
# Samtools index
./samtools-1.3.1/samtools index proband_bwaMEM_sort_dedupped.bam;
Sambamba Implementation
# sam -> bam
sambamba-0.8.0-linux-amd64-static view -S proband_bwaMEM.sam \
-f bam \
-t 8 \
-o proband_tmp.bam
# sort
sambamba-0.8.0-linux-amd64-static sort \
-t 32 \
proband_tmp.bam \
-o proband_bwaMEM_sort_sambamba.bam
# MarkDuplicates
sambamba-0.8.0-linux-amd64-static markdup \
-t 32 \
--overflow-list-size 800000 \
proband_bwaMEM_sort_sambamba.bam \
# Sambamba index
sambamba-0.8.0-linux-amd64-static index \
-t 32 \
proband_bwaMEM_sort_dedupped_sambamba.bam
| Samtools/Picard | Time | Sambamba | Time |
| --- | --- | --- | --- |
| View/Sort | 01:22:35 | View | 01:21:58 |
| | | Sort | 00:46:01 |
| Markdup | 04:16:02 | Markdup | 00:48:04 |
| Index | 00:32:59 | Index | 00:15:48 |
| Total | 06:11:46 | Total | 03:11:57 |
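To make the per-step comparison in the table above easier to read, here is a small bash sketch that converts the reported HH:MM:SS timings to seconds and prints ratios. The helper names `to_seconds` and `ratio` are made up for illustration; the timings are copied from the table as reported.

```shell
#!/usr/bin/env bash
# Convert an HH:MM:SS duration to seconds.
# The 10# prefix forces base 10 so that values like "08" are not read as octal.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Print the ratio of two HH:MM:SS durations with two decimals.
ratio() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", a / b }'
}

# Timings copied from the benchmark table.
echo "markdup: Picard / Sambamba   = $(ratio 04:16:02 00:48:04)"
echo "index:   Samtools / Sambamba = $(ratio 00:32:59 00:15:48)"

# First step: Sambamba view + sort versus the single samtools sort call.
sambamba_first=$(( $(to_seconds 01:21:58) + $(to_seconds 00:46:01) ))
samtools_first=$(to_seconds 01:22:35)
awk -v a="$sambamba_first" -v b="$samtools_first" \
  'BEGIN { printf "convert+sort: Sambamba / Samtools = %.2f\n", a / b }'
```

With these figures, markdup comes out roughly 5.3x faster and index roughly 2.1x faster with Sambamba, while the separate view and sort steps together take about 1.5x as long as the single samtools sort call.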
Although the overall time for markdup and index is greatly improved, I found while experimenting with the number of cores for sambamba view and sambamba sort (8, 16, & 32 cores) that their speed, even at the optimum number of cores, was slower than the single samtools command:
samtools sort -@ 8 -o proband_bwaMEM_sort.bam proband_bwaMEM.sam
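For anyone wanting to repeat the core-count comparison described above, a minimal bash sketch of such a sweep could look like the following. The helper `elapsed_seconds` and the output names `proband_sort_t<N>.bam` are hypothetical; only the `-t` and `-o` options already used in this post are assumed, and the loop is skipped when the binary is not on the PATH.

```shell
#!/usr/bin/env bash
# Run a command and print its wall-clock time in whole seconds.
# Relies on bash's built-in SECONDS counter. The command's stdout is sent to
# stderr so that only the elapsed time is captured by $(...).
elapsed_seconds() {
  local start=$SECONDS
  "$@" 1>&2
  echo $(( SECONDS - start ))
}

# Binary name as used in this post; adjust to your installation.
SAMBAMBA=sambamba-0.8.0-linux-amd64-static

# Hypothetical sweep over the thread counts tried above (8, 16, 32).
# Output file names are made up for illustration.
if command -v "$SAMBAMBA" >/dev/null 2>&1; then
  for threads in 8 16 32; do
    secs=$(elapsed_seconds "$SAMBAMBA" sort -t "$threads" \
      proband_tmp.bam -o "proband_sort_t${threads}.bam")
    echo "sort -t ${threads}: ${secs}s"
  done
fi
```

Whole-second resolution is coarse, but it should be adequate for steps that run for tens of minutes as in the table.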
This is not a bug. You can see the same in earlier speed tests. But thanks for reporting.