How to use --read-geometry to specify the start and end of both cDNA reads in a pair?
Hi Salmon team,
First of all, thank you so much for building such fantastic lightweight alignment and quantification tools, "Salmon & AlevinFry".
I have a question regarding the use of the --read-geometry parameter in salmon alevin:
I have paired-end scRNA-seq data in which a barcode [16] and a UMI [10] are attached to cDNA read 1. I am trying to use salmon, but I am confused about how to specify the read geometry parameter.
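To make the layout concrete, here is a minimal Python sketch of how the ranges would follow from those lengths. It assumes that [16] and [10] are lengths in bases, that geometry positions are 1-based and inclusive, and that read 1 is laid out as barcode, then UMI, then cDNA. The helper names `geometry_from_lengths` and `span_length` are made up for illustration and are not part of salmon.

```python
def geometry_from_lengths(bc_len, umi_len):
    """Build geometry strings for read 1, assuming the layout
    barcode -> UMI -> cDNA and 1-based, inclusive positions."""
    bc_start, bc_end = 1, bc_len
    umi_start, umi_end = bc_end + 1, bc_end + umi_len
    cdna_start = umi_end + 1
    return {
        "bc": f"1[{bc_start}-{bc_end}]",
        "umi": f"1[{umi_start}-{umi_end}]",
        # Only the read 1 part; how to add read 2 is the open question here.
        "read1": f"1[{cdna_start}-end]",
    }


def span_length(start, end):
    """Number of bases covered by an inclusive range start-end."""
    return end - start + 1


geom = geometry_from_lengths(16, 10)
print(geom)
print(span_length(17, 27))  # length of the UMI range used in the commands below
```

Under these assumptions, a 10-base UMI would occupy 1[17-26] and the cDNA would start at position 27, whereas the range 1[17-27] used in the commands below covers 11 bases.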
I tried two ways and got different statistics each time, so a clarification from you would be very helpful.
Run1 command: salmon alevin -i hg38_splici_idx_RL_75/ -p 16 -l A --sketch -1 Read1.fastq.gz -2 Read2.fastq.gz -o output --tgMap transcriptome_splici_fl70_t2g.tsv --noDedup --bc-geometry 1[1-16] --umi-geometry 1[17-27] --read-geometry 1[28-end],2[1-end]
Output:
[2024-09-11 16:17:20.192] [jointLog] [info] Number uniquely mapped : 4856777
[2024-09-11 16:17:20.390] [jointLog] [info] Computed 0 rich equivalence classes for further processing
[2024-09-11 16:17:20.390] [jointLog] [info] Counted 0 total reads in the equivalence classes
[2024-09-11 16:17:20.390] [jointLog] [info] Selectively-aligned 10466950 total fragments out of 159997960
[2024-09-11 16:17:20.390] [jointLog] [info] Number of fragments discarded because they are best-mapped to decoys : 0
Run2 command: salmon alevin -i hg38_splici_idx_RL_75/ -p 16 -l A --sketch -1 Read1.fastq.gz -2 Read2.fastq.gz -o output --tgMap transcriptome_splici_fl70_t2g.tsv --noDedup --bc-geometry 1[1-16] --umi-geometry 1[17-27] --read-geometry 1[28-end] 2[1-end]
Output:
[2024-09-11 16:39:31.848] [jointLog] [info] Number uniquely mapped : 53335563
[2024-09-11 16:39:31.964] [jointLog] [info] Computed 0 rich equivalence classes for further processing
[2024-09-11 16:39:31.964] [jointLog] [info] Counted 0 total reads in the equivalence classes
[2024-09-11 16:39:31.965] [jointLog] [info] Selectively-aligned 134356089 total fragments out of 159997960
[2024-09-11 16:39:31.965] [jointLog] [info] Number of fragments discarded because they are best-mapped to decoys : 0
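To compare the two runs on the same scale, the mapping rates can be computed from the "Selectively-aligned" lines in the two logs. This is plain arithmetic on the logged numbers; the variable names are chosen here for illustration.

```python
# Total fragments and selectively-aligned fragments, copied from the logs above.
total_fragments = 159_997_960
aligned = {"Run1": 10_466_950, "Run2": 134_356_089}

# Percentage of fragments that were selectively aligned in each run.
rates = {run: 100 * n / total_fragments for run, n in aligned.items()}

for run, rate in rates.items():
    print(f"{run}: {rate:.2f}% of fragments selectively aligned")
```

That works out to roughly 6.5% for Run1 versus roughly 84% for Run2.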
Could you please help me understand which form is correct, i.e. in which run Salmon reads both mates with the specified start and end? In Run2 the number of uniquely mapped reads is ~53 million, versus ~4.9 million in Run1. Also, I get "Counted 0 total reads in the equivalence classes" in both cases; is that normal?
Thank you!