Issue with dedup UMI reads
Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
It is related to salmon (bulk mode), when run on BAM files deduplicated with umi_tools.
Describe the bug
After deduplicating the BAM files with umi_tools (the BAMs had been coordinate-sorted beforehand), salmon emits the following warning: WARNING: Detected suspicious pair ---
The names are different:
read1 : XXXXXXXXX-YYYY:YYYY
read2 : XXXXXXXXX-ZZZZ:ZZZZ
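A minimal sketch of the kind of check that produces this warning. It assumes that salmon, for paired-end alignments, reads records two at a time and expects both records of a pair to share the same read name (QNAME). Salmon's real implementation is in C++ and may differ. The helper names `qname` and `find_suspicious_pairs` and the toy records are hypothetical. With coordinate-sorted input, mates sit at their own genomic positions, so consecutive records often belong to different fragments:

```python
from typing import Iterable, List, Tuple


def qname(sam_line: str) -> str:
    """Return the QNAME (first tab-separated field) of a SAM record."""
    return sam_line.split("\t", 1)[0]


def find_suspicious_pairs(sam_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read alignment records two at a time (as if they were mates) and
    return the (read1, read2) name pairs that do not match.

    Header lines (starting with '@') and blank lines are skipped.
    A trailing unpaired record, if any, is ignored.
    """
    records = [l for l in sam_lines if l.strip() and not l.startswith("@")]
    mismatches = []
    # Step through the records in consecutive pairs: (0, 1), (2, 3), ...
    for i in range(0, len(records) - 1, 2):
        n1, n2 = qname(records[i]), qname(records[i + 1])
        if n1 != n2:
            mismatches.append((n1, n2))
    return mismatches


if __name__ == "__main__":
    # Toy coordinate-sorted records: mates of readA and readB are interleaved.
    coord_sorted = [
        "@HD\tVN:1.6\tSO:coordinate",
        "readA\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
        "readB\t99\tchr1\t150\t60\t50M\t=\t400\t300\t*\t*",
        "readA\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
        "readB\t147\tchr1\t400\t60\t50M\t=\t150\t-300\t*\t*",
    ]
    print(find_suspicious_pairs(coord_sorted))
    # prints [('readA', 'readB'), ('readA', 'readB')]
```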
To Reproduce
The run used the following options:
{
    "salmon_version": "1.10.1",
    "targets": "../index/hg38/salmon/gencode.v45.transcripts.fa",
    "libType": "A",
    "seqBias": [],
    "gcBias": [],
    "posBias": [],
    "threads": "16",
    "dumpEq": [],
    "numBootstraps": "50",
    "alignments": "/media/storage/work/iiglesia/rnaseq/RESULTS/BAM_STAR_dedup/17932763_S28_Aligned_dedup.sorted.bam",
    "output": "./aligned_salmon/17932763_S28",
    "geneMap": "../index/hg38/salmon/mart_export.txt",
    "gencode": []
}
When I run salmon on the same sample without deduplication and sorting, it works fine.
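This points, tentatively, at the sort order rather than the deduplication itself: coordinate sorting moves the two mates of a fragment apart, while the unsorted aligner output keeps them together. The sketch below illustrates how adjacency can be restored by grouping records by read name while keeping first-seen order, which is roughly what `samtools collate` does (`samtools sort -n` reaches the same adjacency by fully sorting on read name). The helper `collate_by_name` is hypothetical and works in memory on SAM text only; real BAM files would be regrouped with samtools.

```python
from typing import Dict, Iterable, List


def collate_by_name(sam_lines: Iterable[str]) -> List[str]:
    """Return records reordered so that all records sharing a QNAME are
    adjacent, preserving the order in which each name is first seen.

    Header lines ('@...') are kept at the top in their original order.
    In-memory illustration only; not suitable for real BAM files.
    """
    header: List[str] = []
    groups: Dict[str, List[str]] = {}  # dicts keep insertion order (3.7+)
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
        elif line.strip():
            name = line.split("\t", 1)[0]
            groups.setdefault(name, []).append(line)
    return header + [rec for recs in groups.values() for rec in recs]


if __name__ == "__main__":
    # Toy coordinate-sorted records (QNAME, FLAG, RNAME, POS only).
    coord_sorted = [
        "readA\t99\tchr1\t100",
        "readB\t99\tchr1\t150",
        "readA\t147\tchr1\t300",
        "readB\t147\tchr1\t400",
    ]
    for rec in collate_by_name(coord_sorted):
        fields = rec.split("\t")
        print(fields[0], fields[1])
    # prints readA 99, readA 147, readB 99, readB 147 (one per line)
```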
Specifically, please provide at least the following information:
- Which version of salmon was used? v1.10.1
- How was salmon installed (compiled, downloaded executable, through bioconda)? compiled
- Which reference (e.g. transcriptome) was used? gencode.v45.transcripts.fa
- Which read files were used? BAM files
- Which program options were used? See the options listed above.
What I expect
I would like to know how to run salmon on BAM files deduplicated with umi_tools, or how to otherwise account for UMI deduplication when quantifying with salmon.
Desktop (please complete the following information):
- OS: Ubuntu
- Version: 22.04