jiantao/Tangram

MOSAIK bam files for Tangram

Closed this issue · 2 comments

RPSeq commented

Jiantao,

I've been running Tangram on some test sequences aligned with MOSAIK using the '-sref' option (and the hg19 + moblist MEI reference). With this option, MOSAIK produces two BAM files: one with reads that aligned to the special references, and one with all other alignments. For Tangram, should I use only the special.bam file, or should I merge the files and use all the aligned reads?

Thanks again,

Ryan

Hi Ryan,

For Tangram, use the standard output BAM file, not special.bam; you don't
need to merge the files. When aligning, MOSAIK also attempts to align every
read to the 'special' reference sequences. These sequences do not appear in
the BAM header, and no read is listed as mapped to them in the position
field of its BAM record. However, reads that did map to a mobile element
are still included in the standard BAM file, placed at the same coordinates
as their uniquely mapped mate (rather than at an arbitrary MEI position
somewhere in the genome), and the ZA tag at the end of the BAM record notes
that the alignment hit one of these sequences. For example, if you grep the
BAM records for L1 (viewed as text, for instance via samtools view), you
should see that some reads have L1 in the ZA tag.

When Tangram reads this BAM file, it looks for two signals in the read
pairs: fragment-length discordance, and pairs where one mate is uniquely
mapped and the other maps to an MEI, which it identifies from the ZA tag
and from the two reads appearing together in the BAM file. It also attempts
to split-read map reads across MEI breakpoints, and it collates all of
these signals.
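As a rough illustration of the ZA-tag check described above (the "grep for L1" suggestion), here is a minimal Python sketch that scans SAM-formatted text lines, such as the output of samtools view, and collects the names of reads whose ZA tag mentions a given element name. The function name, the sample records, and the ZA values in them are made up for illustration; the sketch does not assume anything about the internal layout of MOSAIK's ZA tag and simply does a substring match, as grep would.

```python
def reads_with_za_hit(sam_lines, element="L1"):
    """Return names of reads whose ZA tag value contains `element`.

    `sam_lines` is any iterable of SAM text lines. This is a plain
    substring match (like grep), so "L1" would also match a longer
    name that happens to contain "L1".
    """
    hits = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        # Optional tags start at the 12th column (index 11).
        for tag in fields[11:]:
            if tag.startswith("ZA:Z:") and element in tag[5:]:
                hits.append(fields[0])
                break
    return hits


# Hypothetical records; the ZA values are placeholders, not real MOSAIK output.
sample = [
    "@HD\tVN:1.0\tSO:coordinate",
    "read1\t99\tchr1\t1000\t60\t50M\t=\t1200\t250\t*\t*\tRG:Z:grp\tZA:Z:placeholder;L1;placeholder",
    "read2\t147\tchr1\t1200\t60\t50M\t=\t1000\t-250\t*\t*\tRG:Z:grp\tZA:Z:placeholder",
]

print(reads_with_za_hit(sample))  # prints ['read1']
```

To try this on real data, the lines could come from the text output of samtools view run on the standard BAM file, for example by iterating over sys.stdin.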

I hope this helps.

Alistair Ward

In reply to Ryan Smith's message of Wed, May 13, 2015 at 1:16 PM.

RPSeq commented

Thanks for the info; everything seems to be working properly with my data.