are soft-clipped bases counted for the depth?
Closed this issue · 7 comments
Hi,
I think that the tittle is self-explanatory :).
Thank you in advance
No, they are not.
Thank you very much :)
I had this question because I'm trying to compare the counts I get from bam-readcounts with the counts of samtools view.
1 1 A 79 =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 A:79:59.72:27.34:59.72:35:44:0.00:0.00:0.00:0:0.00:0.00:0.00 C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00
samtools view -F 0x4 -c HLA_mapped01.bam "1:1-1"
89
samtools view -F 0x4 HLA_mapped01.bam "1:1-1" | cut -f1 | sort -u | wc -l
85
Any clue about why I'm getting 10 more alignments in samtools or 6 more reads in comparison with the results using bam-readcount?
Thank you again,
bam-readcount excludes secondary, qc failed and duplicate reads from the resulting counts. A corresponding view command would be: samtools view -F 0x1796
If I use your flag I get fair less alignments...:
samtools view -F 0x1796 -c HLA_mapped01.bam "1:1-1"
35
I think I goofed and the value should be 1796 in decimal. Hopefully either samtools view -F 1796
or equivalently samtools view -F 0x704
would be closer.
Ok now I'm getting closer values to almost all bam-readcount counts.. :). So you're applying " read fails platform/vendor quality checks" filter in the sam flag. Could you clarify for me what does this mean and why are you applying it? I've never used it before :)
Thank you again
Glad to know that we're a bit closer. Please keep in mind that bam-readcount is reporting per-base counts and as such may not match samtools view output if there are, for example, gaps in the alignments at a given position or you start to apply per-base quality cutoffs. I don't think the gap issue is relevant for the view commands you used above since the position is the first position of the chromosome.
I haven't seen much use of the "read fails platform/vendor quality checks" flag in the wild, but when I have seen it used it was indicating reads that failed Illumina's "chastity" filter. Since these are generally lower quality reads, we've typically ignored them in the past as being low quality. I believe most, if not all, sequencing vendors do not even include such reads in their fastq/BAM files.