Reason for low coverage output for deepseq data

Question

Reason for low coverage output for deepseq data

YashaNazirB opened this issue 3 years ago · 5 comments

Hi

I am running the following code for my deepseq data analysis to extract BAM readcounts. Although my expected coverage is somewhat between 100,000 to 350,000 however, after I run the following code, my output shows very low coverage for deepseq genomic positions which is around 9000 max. Why am I getting this low coverage? I even defined the -d to be 1000000000.

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam |awk -F ":|\t|=" 'BEGIN {OFS = "\t"}; {print $1, $2, $3 , $4, $21 , $35, $49 , $63}' > BRC_UNGKO_stim.txt

Answer 1 · 2021-09-25T04:44:45.000Z

According to this issue, specifying a region (and reference) is a workaround for this problem: #3 (Counting every base in the entire reference is going to be somewhat slow anyway, so splitting may sometimes be a good idea)

What happens if you run bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr1:1-999999999 (or actually substitute the length of the chr)

Answer 2 · 2021-09-26T22:22:23.000Z

Hey Chris, I tried selecting a defined region but it did not generate any output. Output was 0 bytes. How should I proceed?

…

-Yasha

________________________________ From: Chris Miller ***@***.***> Sent: Saturday, September 25, 2021 12:44 AM To: genome/bam-readcount ***@***.***> Cc: Yasha Nazir Butt ***@***.***>; Author ***@***.***> Subject: Re: [genome/bam-readcount] Reason for low coverage output for deepseq data (#85) [EXTERNAL] According to this issue, specifying a region (and reference) is a workaround for this problem: #3<#3> (Counting every base in the entire reference is going to be somewhat slow anyway, so splitting may sometimes be a good idea) What happens if you run bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr1:1-999999999 (or actually substitute the length of the chr) — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#85 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AM3AU2XQP2OXQZ5WH5KROI3UDVHUPANCNFSM5EW7UH7A>. Triage notifications on the go with GitHub Mobile for iOS<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

Answer 3 · 2021-09-26T22:24:17.000Z

Do you have a small example bam file that you can post, along with a command that reproduces the problem?

Answer 4 · 2021-09-26T23:28:33.000Z

Commands that I tried are as follows:

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637 |awk -F ":|\t|=" 'BEGIN {OFS = "\t"}; {print $1, $2, $3 , $4, $21 , $35, $49 , $63}' > BRC_UNGKO_stimtest.txt

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637 > BRC_UNGKO_stimtest1.txt

Following is the link to the bam file:
https://drive.google.com/file/d/1X4qbqChdAIXDXtcKnbUOHJ44sGmBdePP/view?usp=sharing

I am using mouse mm9 genome as the reference and the coordinates of interest for deepseq analysis are chr12:114663740-114595637.

Best regards,
Yasha

Answer 5 · 2021-09-27T02:59:48.000Z

Okay, I can reproduce this error. Thanks for the bug report - we'll look into it.

$ samtools index aligned_sorted_UNGKO_stim.bam

$ docker run -v /tmp:/tmp -v /Users/cmiller/annotations:/annotations --entrypoint /bin/bash -it mgibio/bam-readcount

$ bam-readcount -w 0 -f /annotations/tmp.fa aligned_sorted_UNGKO_stim.bam chr1:3603112-3603112
Minimum mapping quality is set to 0
chr1	3603112	c	10	=:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	C:9:0.00:35.33:0.00:9:0:0.00:0.01:40.67:9:0.99:150.00:0.99	G:1:0.00:32.00:0.00:1:0:0.00:0.02:106.00:1:0.99:150.00:0.99	T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00

# bam-readcount -w 0 -f /annotations/tmp.fa aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637  >out.rct
Minimum mapping quality is set to 0
[E::sam_itr_next] Null iterator

@apldx - can you take a look and see what's up? FWIW, the tmp.fa in there is just mm9 with "chr" prefixes added. I dropped bam and fasta to use on the local filesystem here for debugging purposes: /storage1/fs1/timley/Active/aml_ppg/analysis/tmp/bam_readcount/issue_85/