COMBINE-lab/salmon

Issue with UMI-deduplicated reads


Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
Salmon (bulk mode), when it is run on BAM files that were deduplicated with umi_tools.

Describe the bug
After the BAM files are sorted by coordinate and then deduplicated with umi_tools, salmon reports the following warning:

WARNING: Detected suspicious pair ---
The names are different:
read1 : XXXXXXXXX-YYYY:YYYY
read2 : XXXXXXXXX-ZZZZ:ZZZZ
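A simplified model of why this warning might appear, assuming (not confirmed in this report) that salmon's alignment-based mode consumes paired-end records two at a time in file order and expects both records of a pair to share a read name. Under that assumption, coordinate sorting separates mates, so consecutive records no longer match. The function `find_suspicious_pairs` and the read names `fragA`/`fragB`/`fragC` are hypothetical and do not reproduce salmon's actual code.

```python
from typing import List, Tuple


def find_suspicious_pairs(read_names: List[str]) -> List[Tuple[str, str]]:
    """Return (read1, read2) name pairs that do not match.

    Simplified model (assumption, not salmon's actual implementation):
    paired-end records are read two at a time in file order, and each
    consecutive pair is expected to carry the same query name.
    """
    suspicious = []
    # Walk the records in non-overlapping windows of two.
    for i in range(0, len(read_names) - 1, 2):
        read1, read2 = read_names[i], read_names[i + 1]
        if read1 != read2:
            suspicious.append((read1, read2))
    return suspicious


# Aligner output order: mates of each fragment are adjacent.
name_grouped = ["fragA", "fragA", "fragB", "fragB", "fragC", "fragC"]

# Coordinate-sorted order: mates are separated by other alignments.
coord_sorted = ["fragA", "fragB", "fragA", "fragC", "fragB", "fragC"]

print(find_suspicious_pairs(name_grouped))          # []
print(find_suspicious_pairs(coord_sorted))          # [('fragA', 'fragB'), ('fragA', 'fragC'), ('fragB', 'fragC')]
# Grouping records by name puts the mates back next to each other.
print(find_suspicious_pairs(sorted(coord_sorted)))  # []
```

In this model, only the record order matters: the same reads pass the check when mates are adjacent and fail it once sorting by position interleaves them.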

To Reproduce
Salmon was run with the following parameters:

{
  "salmon_version": "1.10.1",
  "targets": "../index/hg38/salmon/gencode.v45.transcripts.fa",
  "libType": "A",
  "seqBias": [],
  "gcBias": [],
  "posBias": [],
  "threads": "16",
  "dumpEq": [],
  "numBootstraps": "50",
  "alignments": "/media/storage/work/iiglesia/rnaseq/RESULTS/BAM_STAR_dedup/17932763_S28_Aligned_dedup.sorted.bam",
  "output": "./aligned_salmon/17932763_S28",
  "geneMap": "../index/hg38/salmon/mart_export.txt",
  "gencode": []
}

When I run salmon on the same sample without deduplication and coordinate sorting, it works correctly.

Specifically, please provide at least the following information:

  • Which version of salmon was used? v1.10.1
  • How was salmon installed (compiled, downloaded executable, through bioconda)? compiled
  • Which reference (e.g. transcriptome) was used? gencode.v45.transcripts.fa
  • Which read files were used? BAM files (coordinate-sorted, then deduplicated with umi_tools)
  • Which program options were used? See the parameters above.

What I expect
I would like to know how to run salmon on BAM files deduplicated with umi_tools, or how otherwise to account for UMI deduplication in salmon.

Desktop (please complete the following information):

  • OS: Ubuntu
  • Version: 22.04