COMBINE-lab/salmon

After using SAMtools to convert bam to fastq, the salmon quantification mapping rate is super low: is it normal?

KOBE24DUNK opened this issue · 0 comments

Hi,

Thank you for helping me with my issue. I'm not sure whether such a low mapping rate is reasonable in this case: the FASTQ files (R1 and R2) were converted back from the BAM files.

I tried both hg38 and hg19 for this dataset (only BAM files are available to me, and they were aligned to the hg19 genome), and I got a similarly low mapping rate with both.
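
One thing that could be checked on FASTQ files regenerated from a BAM is whether R1 and R2 list their mates in the same order, since the samtools documentation asks for name-collated input before `samtools fastq` when mates are written to separate files. Whether that is the cause here is an assumption; nothing in the log confirms it. A minimal Python sketch of such a check follows; the function names `read_names` and `count_mismatched_mates` are made up for illustration and are not part of salmon or SAMtools.

```python
import gzip
from itertools import islice, zip_longest


def read_names(path):
    """Yield the read name of every 4-line FASTQ record (plain text or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only header lines carry the read name
            fields = line[1:].split()  # drop '@' and any trailing comment
            if not fields:
                continue  # tolerate a blank line at the end of the file
            name = fields[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # strip the mate suffix, if present
            yield name


def count_mismatched_mates(r1_path, r2_path, limit=100_000):
    """Count positions, among the first `limit` records, where R1 and R2 names differ.

    A record that exists in only one file is also counted as a mismatch.
    """
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    return sum(1 for a, b in islice(pairs, limit) if a != b)
```

Calling `count_mismatched_mates("R1.fastq", "R2.fastq")` (hypothetical file names) and getting 0 would suggest the mates are in sync, at least within the first `limit` records; a large count would suggest the two files are not in the same order.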

The salmon log looks like this:

[2024-01-27 01:09:31.030] [jointLog] [info] setting maxHashResizeThreads to 20
[2024-01-27 01:09:31.030] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
[2024-01-27 01:09:31.030] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2024-01-27 01:09:31.030] [jointLog] [info] Setting consensusSlack to selective-alignment default of 0.35.
[2024-01-27 01:09:31.030] [jointLog] [info] parsing read library format
[2024-01-27 01:09:31.030] [jointLog] [info] There is 1 library.
[2024-01-27 01:09:31.032] [jointLog] [info] Loading pufferfish index
[2024-01-27 01:09:31.033] [jointLog] [info] Loading dense pufferfish index.
[2024-01-27 01:09:33.435] [jointLog] [info] done
[2024-01-27 01:09:33.508] [jointLog] [info] Index contained 252,048 targets
[2024-01-27 01:09:36.263] [jointLog] [info] Number of decoys : 0
[2024-01-27 01:09:41.237] [jointLog] [info] Automatically detected most likely library type as IU

[2024-01-27 01:10:28.189] [fileLog] [info] 
At end of round 0
==================
Observed 3600210 total fragments (3600210 in most recent round)

[2024-01-27 01:10:28.188] [jointLog] [info] Computed 179,584 rich equivalence classes for further processing
[2024-01-27 01:10:28.188] [jointLog] [info] Counted 446,871 total reads in the equivalence classes 
[2024-01-27 01:10:28.202] [jointLog] [warning] 0.197488% of fragments were shorter than the k used to build the index.
If this fraction is too large, consider re-building the index with a smaller k.
The minimum read size found was 20.


[2024-01-27 01:10:28.202] [jointLog] [info] Number of mappings discarded because of alignment score : 18,226,670
[2024-01-27 01:10:28.202] [jointLog] [info] Number of fragments entirely discarded because of alignment score : 762,980
[2024-01-27 01:10:28.202] [jointLog] [info] Number of fragments discarded because they are best-mapped to decoys : 0
[2024-01-27 01:10:28.202] [jointLog] [info] Number of fragments discarded because they have only dovetail (discordant) mappings to valid targets : 82,701
[2024-01-27 01:10:28.219] [jointLog] [warning] Only 446871 fragments were mapped, but the number of burn-in fragments was set to 5000000.
The effective lengths have been computed using the observed mappings.

[2024-01-27 01:10:28.219] [jointLog] [info] Mapping rate = 12.4124%

[2024-01-27 01:10:28.219] [jointLog] [info] finished quantifyLibrary()
[2024-01-27 01:10:28.224] [jointLog] [info] Starting optimizer
[2024-01-27 01:10:28.368] [jointLog] [info] Marked 0 weighted equivalence classes as degenerate
[2024-01-27 01:10:28.376] [jointLog] [info] iteration = 0 | max rel diff. = 174.042
[2024-01-27 01:10:28.453] [jointLog] [info] iteration 11, adjusting effective lengths to account for biases
[2024-01-27 01:10:52.990] [jointLog] [info] Computed expected counts (for bias correction)
[2024-01-27 01:10:52.990] [jointLog] [info] processed bias for 0.0% of the transcripts
[2024-01-27 01:10:55.410] [jointLog] [info] processed bias for 10.0% of the transcripts
[2024-01-27 01:10:57.934] [jointLog] [info] processed bias for 20.0% of the transcripts
[2024-01-27 01:11:00.570] [jointLog] [info] processed bias for 30.0% of the transcripts
[2024-01-27 01:11:03.092] [jointLog] [info] processed bias for 40.0% of the transcripts
[2024-01-27 01:11:05.416] [jointLog] [info] processed bias for 50.0% of the transcripts
[2024-01-27 01:11:07.798] [jointLog] [info] processed bias for 60.0% of the transcripts
[2024-01-27 01:11:10.207] [jointLog] [info] processed bias for 70.0% of the transcripts
[2024-01-27 01:11:12.614] [jointLog] [info] processed bias for 80.0% of the transcripts
[2024-01-27 01:11:15.076] [jointLog] [info] processed bias for 90.0% of the transcripts
[2024-01-27 01:11:17.845] [jointLog] [info] processed bias for 100.0% of the transcripts
[2024-01-27 01:11:18.538] [jointLog] [info] iteration = 100 | max rel diff. = 3.26654
[2024-01-27 01:11:19.295] [jointLog] [info] iteration = 200 | max rel diff. = 0.0770813
[2024-01-27 01:11:20.053] [jointLog] [info] iteration = 300 | max rel diff. = 0.194297
[2024-01-27 01:11:20.809] [jointLog] [info] iteration = 400 | max rel diff. = 0.0224582
[2024-01-27 01:11:21.021] [jointLog] [info] iteration = 429 | max rel diff. = 0.00843674
[2024-01-27 01:11:21.067] [jointLog] [info] Finished optimizer
[2024-01-27 01:11:21.067] [jointLog] [info] writing output 
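
To make the numbers in the log easier to read, here is a small Python sketch that recomputes the reported mapping rate from the fragment counts and shows how the discarded fragments compare with the total. The four log lines are copied from above with timestamps removed. Treating the remainder as "fragments not covered by any of these counters" is plain arithmetic, and reading it as fragments with no reported mapping at all is an assumption.

```python
import re

# Lines copied from the salmon log above (timestamps and tags removed).
LOG = """\
Observed 3600210 total fragments (3600210 in most recent round)
Counted 446,871 total reads in the equivalence classes
Number of fragments entirely discarded because of alignment score : 762,980
Number of fragments discarded because they have only dovetail (discordant) mappings to valid targets : 82,701
"""


def grab(pattern, text):
    """Return the first captured integer, ignoring thousands separators."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1).replace(",", ""))


total = grab(r"Observed (\d+) total fragments", LOG)
mapped = grab(r"Counted ([\d,]+) total reads", LOG)
low_score = grab(r"entirely discarded because of alignment score : ([\d,]+)", LOG)
dovetail = grab(r"only dovetail \(discordant\) mappings to valid targets : ([\d,]+)", LOG)


def pct(count):
    """Express a fragment count as a percentage of all observed fragments."""
    return 100.0 * count / total


mapping_rate = pct(mapped)
# Fragments not covered by the three counters above (assumed to be disjoint).
unaccounted = total - mapped - low_score - dovetail

print(f"mapping rate:             {mapping_rate:.4f}%")
print(f"discarded (align. score): {pct(low_score):.2f}%")
print(f"discarded (dovetail):     {pct(dovetail):.2f}%")
print(f"not covered by counters:  {unaccounted} ({pct(unaccounted):.2f}%)")
```

The first printed value matches the `Mapping rate = 12.4124%` line, so the rate is simply the mapped fragments (446,871) divided by the observed fragments (3,600,210).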

Is this normal for FASTQ files that were converted from BAM?
Thanks a lot!

Originally posted by @KOBE24DUNK in #906