genome/bam-readcount

Reason for low coverage output for deepseq data

YashaNazirB opened this issue · 5 comments

Hi

I am running the following code for my deepseq data analysis to extract BAM readcounts. Although my expected coverage is somewhat between 100,000 to 350,000 however, after I run the following code, my output shows very low coverage for deepseq genomic positions which is around 9000 max. Why am I getting this low coverage? I even defined the -d to be 1000000000.

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam |awk -F ":|\t|=" 'BEGIN {OFS = "\t"}; {print $1, $2, $3 , $4, $21 , $35, $49 , $63}' > BRC_UNGKO_stim.txt

According to this issue, specifying a region (and reference) is a workaround for this problem: #3 (Counting every base in the entire reference is going to be somewhat slow anyway, so splitting may sometimes be a good idea)

What happens if you run bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr1:1-999999999 (or actually substitute the length of the chr)

Do you have a small example bam file that you can post, along with a command that reproduces the problem?

Commands that I tried are as follows:

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637 |awk -F ":|\t|=" 'BEGIN {OFS = "\t"}; {print $1, $2, $3 , $4, $21 , $35, $49 , $63}' > BRC_UNGKO_stimtest.txt

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637 > BRC_UNGKO_stimtest1.txt

Following is the link to the bam file:
https://drive.google.com/file/d/1X4qbqChdAIXDXtcKnbUOHJ44sGmBdePP/view?usp=sharing

I am using mouse mm9 genome as the reference and the coordinates of interest for deepseq analysis are chr12:114663740-114595637.

Best regards,
Yasha

Okay, I can reproduce this error. Thanks for the bug report - we'll look into it.

$ samtools index aligned_sorted_UNGKO_stim.bam

$ docker run -v /tmp:/tmp -v /Users/cmiller/annotations:/annotations --entrypoint /bin/bash -it mgibio/bam-readcount

$ bam-readcount -w 0 -f /annotations/tmp.fa aligned_sorted_UNGKO_stim.bam chr1:3603112-3603112
Minimum mapping quality is set to 0
chr1	3603112	c	10	=:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	C:9:0.00:35.33:0.00:9:0:0.00:0.01:40.67:9:0.99:150.00:0.99	G:1:0.00:32.00:0.00:1:0:0.00:0.02:106.00:1:0.99:150.00:0.99	T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00

# bam-readcount -w 0 -f /annotations/tmp.fa aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637  >out.rct
Minimum mapping quality is set to 0
[E::sam_itr_next] Null iterator

@apldx - can you take a look and see what's up? FWIW, the tmp.fa in there is just mm9 with "chr" prefixes added. I dropped bam and fasta to use on the local filesystem here for debugging purposes: /storage1/fs1/timley/Active/aml_ppg/analysis/tmp/bam_readcount/issue_85/