Grice-Lab/AlignerBoost

MQ now most commonly 250

Opened this issue · 4 comments

Hi,

thanks for an interesting approach.

I tried this on a normal BAM which I had aligned with BWA.

I ran alignerboost with

java -Xmx50g -jar ~/NAS01/programs/AlignerBoost/AlignerBoost.jar run filterPE -in xx_R1_s.bam -out xx_R1_s_alignerboost.bam --sort-method coordinate -v

@RG ID:xx_R1 SM:xx_R1.stuff @PG ID:bwa PN:bwa VN:0.7.15-r1142-dirty SM:xx_R1.stuff /home/bioinformatics/NAS01/bioinformatics/seqres/ZR/bwa0X/SBKv7.fasta xx_R1.fastq.gz xx_R2.fastq.gz CL:bwa mem -t 16 -M -R @RG @PG ID:AlignerBoost PN:AlignerBoost VN:v1.6.2 CL:-in xx_R1_s.bam -out xx_R1_s_alignerboost.bam --sort-method coordinate -v

On assessing with qualimap, I found the original BAM had an average MQ of ~46, and the AlignerBoost bam had an average MQ of 246. Most Alignerboost MQs are now 250 as far as I have looked.

Can you please explain this, is this behaviour expected, and are users expected to filter BAMs by custom AlignerBoost BAM tags ? Why not just correct the MQ directly ?

Thanks,
Colin

Hi Colin,

Thank you for being interested in AlignerBoost. The main difference before
and after filtering is actually the "MAX_MAPQ" values pre-defined by both
software.

The "MAX_MAPQ" values are required for (nearly) uniquely mapped reads,
which should have a phred-scale score of infinite or very large (mapping
error ~= 0). However, SAM/BAM specification doesn't allow scores > 255, and
score 255 indicates not available. Thus different aligners use different
MAX_MAPQ scores to replace the infinity, while conservative tools (such as
BWA) uses a smaller MAX_MAPQ (BWA-MEM = 60) due to its empirical nature.
AlignerBoost bench-marked (unpublished) some different MAX_MAPQ thresholds,
and a much higher MAX_MAPQ still makes difference for results, so we picked
250.

So for a better comparison, you need to first compare the proportion of <60
vs. = 60 for BWA-MEM and < 250 vs. = 250 for AlignerBoost. Than compare the
distribution of the remaining. Please let me know your results, than you!

Best
Qi

On Thu, Oct 13, 2016 at 9:46 AM, colindaven notifications@github.com
wrote:

Hi,

thanks for an interesting approach.

I tried this on a normal BAM which I had aligned with BWA.

I ran alignerboost with

java -Xmx50g -jar ~/NAS01/programs/AlignerBoost/AlignerBoost.jar run
filterPE -in xx_R1_s.bam -out xx_R1_s_alignerboost.bam --sort-method
coordinate -v

@rg ID:xx_R1 SM:xx_R1.stuff
@pg ID:bwa PN:bwa VN:0.7.15-r1142-dirty SM:xx_R1.stuff
/home/bioinformatics/NAS01/bioinformatics/seqres/ZR/bwa0X/SBKv7.fasta
xx_R1.fastq.gz xx_R2.fastq.gz CL:bwa mem -t 16 -M -R @rg
@pg ID:AlignerBoost PN:AlignerBoost VN:v1.6.2 CL:-in xx_R1_s.bam -out
xx_R1_s_alignerboost.bam --sort-method coordinate -v

On assessing with qualimap, I found the original BAM had an average MQ of
~46, and the AlignerBoost bam had an average MQ of 246. Most Alignerboost
MQs are now 250 as far as I have looked.

Can you please explain this, is this behaviour expected, and are users
expected to filter BAMs by custom AlignerBoost BAM tags ? Why not just
correct the MQ directly ?

Thanks,
Colin


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#1, or mute the thread
https://github.com/notifications/unsubscribe-auth/AG-ecUwYtFlNyqj9Howzf0e-_HLa4JLfks5qzjYngaJpZM4KV6BI
.

Qi Zheng, Ph.D.
Bioinformatician of Dermatology, Perelman School of Medicine,University of
Pennsylvania
421 Curie Blvd
BRB II/III 1046-7
Philadelphia, PA 19104 http://goo.gl/maps/F4FHR
Office: +1 (215)-898-0173

Hi Collin,

Forgot to mention another problem. AlignerBoost doesn't help (at all) for
paired-end (PE) read mapping aligned by BWA-MEM. To date, BWA-MEM doesn't
allow reporting > 1 alignments for PE-mapping, and there is nothing to
improve or "boost" by AlignerBoost. Please see the published article for
details. You can still run it, but the result won't even change. It is
recommended to use the "best practice" pipeline by using the README and
config files in examples. For PE-mapping, we recommand bowtie2 &
AlignerBoost. for SE-mapping, most aligners are OK.

Best
Qi

On Thu, Oct 13, 2016 at 10:12 AM, Zheng Qi e00011027@gmail.com wrote:

Hi Colin,

Thank you for being interested in AlignerBoost. The main difference before
and after filtering is actually the "MAX_MAPQ" values pre-defined by both
software.

The "MAX_MAPQ" values are required for (nearly) uniquely mapped reads,
which should have a phred-scale score of infinite or very large (mapping
error ~= 0). However, SAM/BAM specification doesn't allow scores > 255, and
score 255 indicates not available. Thus different aligners use different
MAX_MAPQ scores to replace the infinity, while conservative tools (such as
BWA) uses a smaller MAX_MAPQ (BWA-MEM = 60) due to its empirical nature.
AlignerBoost bench-marked (unpublished) some different MAX_MAPQ thresholds,
and a much higher MAX_MAPQ still makes difference for results, so we picked
250.

So for a better comparison, you need to first compare the proportion of
<60 vs. = 60 for BWA-MEM and < 250 vs. = 250 for AlignerBoost. Than compare
the distribution of the remaining. Please let me know your results, than
you!

Best
Qi

On Thu, Oct 13, 2016 at 9:46 AM, colindaven notifications@github.com
wrote:

Hi,

thanks for an interesting approach.

I tried this on a normal BAM which I had aligned with BWA.

I ran alignerboost with

java -Xmx50g -jar ~/NAS01/programs/AlignerBoost/AlignerBoost.jar run
filterPE -in xx_R1_s.bam -out xx_R1_s_alignerboost.bam --sort-method
coordinate -v

@rg ID:xx_R1 SM:xx_R1.stuff
@pg ID:bwa PN:bwa VN:0.7.15-r1142-dirty SM:xx_R1.stuff
/home/bioinformatics/NAS01/bioinformatics/seqres/ZR/bwa0X/SBKv7.fasta
xx_R1.fastq.gz xx_R2.fastq.gz CL:bwa mem -t 16 -M -R @rg
@pg ID:AlignerBoost PN:AlignerBoost VN:v1.6.2 CL:-in xx_R1_s.bam -out
xx_R1_s_alignerboost.bam --sort-method coordinate -v

On assessing with qualimap, I found the original BAM had an average MQ of
~46, and the AlignerBoost bam had an average MQ of 246. Most Alignerboost
MQs are now 250 as far as I have looked.

Can you please explain this, is this behaviour expected, and are users
expected to filter BAMs by custom AlignerBoost BAM tags ? Why not just
correct the MQ directly ?

Thanks,
Colin


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#1, or mute the thread
https://github.com/notifications/unsubscribe-auth/AG-ecUwYtFlNyqj9Howzf0e-_HLa4JLfks5qzjYngaJpZM4KV6BI
.

Qi Zheng, Ph.D.
Bioinformatician of Dermatology, Perelman School of Medicine,University of
Pennsylvania
421 Curie Blvd
BRB II/III 1046-7
Philadelphia, PA 19104 http://goo.gl/maps/F4FHR
Office: +1 (215)-898-0173

Qi Zheng, Ph.D.
Bioinformatician of Dermatology, Perelman School of Medicine,University of
Pennsylvania
421 Curie Blvd
BRB II/III 1046-7
Philadelphia, PA 19104 http://goo.gl/maps/F4FHR
Office: +1 (215)-898-0173

Thanks for the very clear help and notes.

Ok, bwa mem would have been my main use case, but bowtie2 is important too, i.e. for single ended reads and maybe short reads like miRNA. Another important use case would be RNA-seq (mainly Paired end) mapping with STAR. In my experience RNA-seq mappers don't do a good job with the admittedly very complex job of assigning MQ to read (fragments).

Do you have any comment on RNA-seq correction ?

Thanks,
Colin

Hi Colin,

You are absolutely correct that current RNA-seq aligners don't do as good
(as optimizing their default options) as the DNA-seq aligners. In fact STAR

Basically you can get > 99% precision and > 98% sensitivity if you set your
options correctly in the config file. Remember, RNA-seq inserts (cDNA
fragments) are often short ( < 100 bp), and thus the trimming step is
recommended. Fill the config file with the correct 3'-adapter and
5'-adapter sequences of your kit of choice. Do use the correct adapter
sequences (reverse-complement of the primers).

Best
Qi

On Fri, Oct 14, 2016 at 7:41 AM, colindaven notifications@github.com
wrote:

Thanks for the very clear help and notes.

Ok, bwa mem would have been my main use case, but bowtie2 is important
too, i.e. for single ended reads and maybe short reads like miRNA. Another
important use case would be RNA-seq (mainly Paired end) mapping with STAR.
In my experience RNA-seq mappers don't do a good job with the admittedly
very complex job of assigning MQ to read (fragments).

Do you have any comment on RNA-seq correction ?

Thanks,
Colin


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#1 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AG-ecRofaFlQv3AluaJ56sUF7s_kpK4iks5qz2pVgaJpZM4KV6BI
.

Qi Zheng, Ph.D.
Bioinformatician of Dermatology, Perelman School of Medicine,University of
Pennsylvania
421 Curie Blvd
BRB II/III 1046-7
Philadelphia, PA 19104 http://goo.gl/maps/F4FHR
Office: +1 (215)-898-0173