epi2me-labs/wf-single-cell

Wrong strandness assumed for multiome v1?

Closed this issue · 12 comments

Operating System

Other Linux (please specify below)

Other Linux

Ubuntu 18.04

Workflow Version

v1.0.1-ga6a1b69

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

~/nextflow-23.12.0-edge-all run epi2me-labs/wf-single-cell
--fastq fastq/
--kit_name multiome
--kit_version v1
--expected_cells 5000
--ref_genome_dir /home/ubuntu/cellranger/GRCh38-2020-A/
--sample $SAMPLE
-c openstack.cfg
-profile standard

Workflow Execution - CLI Execution Profile

None

What happened?

Hello, we used the pipeline to map 10x multiome v1 ONT data. It seems that majority of reads were aligned assuming wrong strand that results in poor mapping of exon-exon junction since minimap2 tries to find GT-AG dinucleotides on a wrong strand.
Example below shows two ends of single intron (and that is how majority of introns looks like). The gene is on + strand, all reads colored in red (read strand is + according to IGV) missed right splicing sites (that are canonical GT-AG) and are mapped to intron with CT-AC sites that appears to be canonical sites but on the opposite strand. Only few reads in blue (read strand is -) are mapped correctly. I'm sure that mapping of red reads is wrong because:

  1. it implies two mismatches in left part (TG in all reads instead of genomic AA)
  2. However these sequences match exactly the right exon
  3. the alignment contradict annotation.

That is bit confusing that reads with plus strand (red) were mapped assuming minus strand (according to the dinucleotides used). But strand confusion is the only explanation I can see for these results.

image

Relevant log output

got no logs, sorry

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

Hi @iaaka

I'm sorry that you're having issues with the workflow. Indeed the minmap2 mapping stage is looking for splice sites on the wrong strand with multiome data, I think that is the cause of your problem.

I'm currently working on a fix for this, and will get it out ASAP and I'll let you know when then it is.

Hi,

It seems we might have encountered this bug for 3prime v3 data as well. When re-analyzing our data (previously processed with Sicelore) using wf-single-cell, we found that only 1.7% of spliced reads match the reference annotations (compared to 88.3% for Sicelore). Manual inspection of the aligned reads with matching identifiers between the two programs indicated that the spliced reads aligned with wf-single-cell had introns in different positions and with the GT-AG splice junctions on the opposite strands compared to our previous results.

Hi @mkabza

Yes this issue affects 3prime data also, thanks for the update. Full length 3prime and multiome reads were oriented in the reverse strand. This has now been fixed so that all full length reads should be in the mRNA sense orientation, and so minimpa2 should now look for the splice sites on the correct strand.

If yo'd like to try it out you can by adding -r prerelease to you command.

Thanks again

Hi @nrhorner ,

It seems that the issue still persists for our data - there are some differences in the output BAM file (e.g. the Attrition section of the report shows lower fraction of reads assigned to genes, but higher assigned to transcripts), but the percentage of spliced reads matching the reference annotations remains extremely low and manual inspection of the data indicates that the aligned reads have, for the most part, the same intron positions as in the unpatched version of the workflow (on the opposite strand compared to Sicelore results).

Thanks for getting back to me @mkabza
I will look into this. Would it be possible for you to share some affected data, wither fastq or a resulting BAM file?

Hi @nrhorner,

This is the data we've been analyzing: https://www.ncbi.nlm.nih.gov/sra?term=SRX22560608

Hi @nrhorner, we also run our mutiome data through updated pipeline and as far as I can see problem persists for us as well

@iaaka Are you using the v1.1.0 released yesterday?

We used 1.0.3 with -r prerelease specified, is it different from v1.1.0?

The release note (https://github.com/epi2me-labs/wf-single-cell/releases/tag/v1.1.0), contains entries regarding the orientation of reads.

I see, we just followed this suggestion

If yo'd like to try it out you can by adding -r prerelease to you command.

so, it v1.1.0 different from prerelease? will it make sence to try it?

Hi @nrhorner,
The v1.1.0 version of the workflow works perfectly for our data. Thanks!