Why are bwa and BWA-MEME results inconsistent?
yukaiquan opened this issue · 4 comments
Dear developer:
bwa: Version: 0.7.17-r1188
BWA-MEME:v1.0.5
bwa stat of bam:
338883556 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
448166 + 0 supplementary
0 + 0 duplicates
330144737 + 0 mapped (97.42% : N/A)
338435390 + 0 paired in sequencing
169217695 + 0 read1
169217695 + 0 read2
322879394 + 0 properly paired (95.40% : N/A)
329460362 + 0 with itself and mate mapped
236209 + 0 singletons (0.07% : N/A)
5641738 + 0 with mate mapped to a different chr
2394586 + 0 with mate mapped to a different chr (mapQ>=5)
BWA-MEME stat of bam:
338883548 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
448158 + 0 supplementary
0 + 0 duplicates
330144743 + 0 mapped (97.42% : N/A)
338435390 + 0 paired in sequencing
169217695 + 0 read1
169217695 + 0 read2
322879718 + 0 properly paired (95.40% : N/A)
329460388 + 0 with itself and mate mapped
236197 + 0 singletons (0.07% : N/A)
5641548 + 0 with mate mapped to a different chr
2394572 + 0 with mate mapped to a different chr (mapQ>=5)
Hi yukaiquan,
Thank you for trying out and reporting the issue.
There is some randomness in BWA, BWA-MEM2, and BWA-MEME because the chunk size changes with the number of threads.
For example, per-chunk (batch) statistics are used for paired-end mapping, so different chunk boundaries can lead to slightly different alignments.
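As a toy illustration of why chunk boundaries matter (this is not the aligner's code; `chunk_means` is a hypothetical helper, and the numbers stand in for a per-batch statistic such as an insert-size estimate), the sketch below averages the same input per chunk. Changing the chunk size changes the per-chunk values even though the input is identical.

```shell
# Toy sketch: print the mean of each consecutive chunk of n values read from stdin.
# chunk_means is a hypothetical name used only for this illustration.
chunk_means() {
    # $1 = chunk size (number of values per chunk)
    awk -v n="$1" '
        { sum += $1; cnt++ }
        # A full chunk has been read: report its mean and start the next chunk.
        cnt == n { printf "%.1f\n", sum / cnt; sum = 0; cnt = 0 }
        # Report the last, possibly shorter, chunk.
        END { if (cnt > 0) printf "%.1f\n", sum / cnt }
    '
}

# Same values, two different chunk sizes:
#   printf '100\n200\n300\n400\n' | chunk_means 2
#   printf '100\n200\n300\n400\n' | chunk_means 4
```

With a fixed chunk size the boundaries no longer depend on the thread count, which is what makes two runs comparable.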
Have you tried comparing the output using a fixed chunk size?
- You can set a fixed chunk size using the -K option:
# Perform alignment with BWA-MEME, add -7 option
bwa-meme mem -7 -Y -K 100000000 -t <num_threads> <input.fasta> <input_1.fastq> -o <output_meme.sam>
# Below runs alignment with BWA-MEM2, without -7 option
bwa-meme mem -Y -K 100000000 -t <num_threads> <input.fasta> <input_1.fastq> -o <output_mem2.sam>
# Compare output SAM files
diff <output_mem2.sam> <output_meme.sam>
# To diff large SAM files use https://github.com/unhammer/diff-large-files
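A plain diff may report differences in the SAM header even when every alignment is identical, because the @PG header line can record the command line and the two commands above differ in the -7 flag and the output name. A minimal sketch for comparing only the alignment records, assuming POSIX tools are available (`sam_records_diff` is a hypothetical helper name, not part of BWA-MEME):

```shell
# Sketch only: sam_records_diff is a hypothetical helper name.
# Compares the alignment records of two SAM files and ignores header lines.
# Exit status follows diff: 0 when the records match, 1 when they differ.
sam_records_diff() {
    a_body=$(mktemp) || return 2
    b_body=$(mktemp) || { rm -f "$a_body"; return 2; }
    # SAM header lines start with '@'; drop them so that header fields such
    # as the recorded command line do not show up as differences.
    grep -v '^@' "$1" > "$a_body"
    grep -v '^@' "$2" > "$b_body"
    diff "$a_body" "$b_body"
    status=$?
    rm -f "$a_body" "$b_body"
    return "$status"
}

# Example use with the placeholder names from the commands above:
#   sam_records_diff <output_mem2.sam> <output_meme.sam> && echo "records identical"
```

This writes two temporary copies, so for very large SAM files the diff-large-files tool mentioned above may still be the better choice.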
Thanks!
Hi quito418:
Thank you very much for your patient explanation. The results are consistent after adding -K.
Can the index be loaded only once when comparing thousands of samples in batches? Reading the index takes a lot of time.
Thanks!
Glad to hear it worked :)
At the moment, we have not developed a way to load the index once and reuse it across runs.
Below are my suggestions that can be applied now:
- Use the Linux disk cache. By default, a file that you read or write is cached in RAM. Hence, if you run BWA-MEME sequentially on the same Linux machine, the next time the index is read it will be loaded from memory (which has 3-5 GB/sec I/O speed).
- Use a RAM disk. For example, you can put the index files in /dev/shm (~40 GB is required for the indexes at runtime). This is similar to the first method.
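Both suggestions can be sketched in a few lines of shell. This is a minimal sketch under two assumptions: the index files sit next to the reference and share its path as a filename prefix, and the machine has enough free RAM to hold them. The function names and the default directory are hypothetical.

```shell
# Sketch only: warm_index_cache and stage_index_in_ram are hypothetical helper
# names. Assumption: all index files share the reference path as a prefix.

# Method 1: read every index file once so that Linux keeps it in the disk cache.
warm_index_cache() {
    ref_prefix="$1"
    cat "$ref_prefix"* > /dev/null
}

# Method 2: copy the index files to a RAM-backed directory and print the
# new prefix to pass to the aligner.
stage_index_in_ram() {
    ref_prefix="$1"
    ram_dir="${2:-/dev/shm/bwa_meme_index}"   # hypothetical default location
    mkdir -p "$ram_dir" || return 1
    cp "$ref_prefix"* "$ram_dir"/ || return 1
    printf '%s/%s\n' "$ram_dir" "$(basename "$ref_prefix")"
}

# Example use (placeholders as in the commands earlier in this thread):
#   ref=$(stage_index_in_ram <input.fasta>)
#   bwa-meme mem -7 -Y -K 100000000 -t <num_threads> "$ref" <input_1.fastq> -o <output_meme.sam>
```

Files in /dev/shm stay in memory until they are deleted, so remove the staged copy once the batch has finished.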
Thanks!
Best wishes!