ksahlin/isONclust

Stranded vs unstranded sequencing

wenzelm opened this issue · 2 comments

Dear Kristoffer,

I'm very impressed with isONclust's performance - thank you for designing this much-needed tool!

Out of curiosity, I'm evaluating its utility for clustering reads from ONT genomic amplicon data instead of transcriptome data. We've sequenced a 9 kb amplicon from a viral genome and suspect a mixed infection, possibly with large structural variation among strains. I realise isONclust is not designed for this type of data, but I was curious to see how it performs anyway.

isONclust splits my amplicon dataset into two clusters of roughly 50% abundance each. After some initial excitement, I noticed that the clusters correspond to read orientation (forward vs. reverse strand). So it looks like isONclust expects stranded data and won't cluster reverse-strand reads with forward-strand reads. I guess this behaviour makes sense for identifying antisense transcripts, but it's not appropriate for unstranded data.
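To see why read orientation alone can produce two equal-sized clusters, here is a toy illustration in Python. It is not isONclust's algorithm; it only assumes a method that compares k-mers (or minimizers) in the orientation the reads were sequenced in. A read and its reverse complement then share almost no k-mers, but share all of them once one is reverse-complemented.

```python
import random

# Toy illustration only (not isONclust's algorithm): compare k-mer sets of a
# read and its reverse complement, both taken in the orientation as sequenced.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    forward_read = "".join(rng.choice("ACGT") for _ in range(2000))
    # The same molecule sequenced from the other strand.
    reverse_read = reverse_complement(forward_read)

    k = 15
    as_sequenced = jaccard(kmers(forward_read, k), kmers(reverse_read, k))
    reoriented = jaccard(kmers(forward_read, k),
                         kmers(reverse_complement(reverse_read), k))
    print(f"forward vs reverse read (as sequenced): {as_sequenced:.3f}")
    print(f"forward vs reoriented reverse read:     {reoriented:.3f}")
```

The first similarity is close to zero and the second is exactly 1.0, so a strand-unaware comparison sees the two strands of the same amplicon as unrelated sequences.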

I realise that PacBio IsoSeq and ONT direct RNA data are stranded, but ONT cDNA libraries are typically unstranded (?). Could isONclust be modified to accept unstranded data? Or can unstranded data be reoriented before running isONclust?

Thanks and best wishes,
Marius

Hi Marius,

Thank you!

I believe unstranded ONT cDNA data can be reoriented using pychopper before running isONclust (make sure it is the latest version of pychopper though).
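For a single amplicon like the 9 kb viral target above, a simpler orientation signal is available than the cDNA library primers that pychopper looks for. The Python sketch below is not pychopper's method. It assumes a reference sequence for the amplicon is on hand and reads are plain 4-line FASTQ. Each read is kept as-is if it shares more k-mers with the reference than its reverse complement does; otherwise it is reverse-complemented, and its quality string is reversed so every base keeps its own quality value. The helpers `orient_read`, `orient_reads`, `read_fastq` and `write_fastq` are hypothetical names for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical helper sketch: orient unstranded amplicon reads against an
# assumed reference sequence by comparing k-mer hits on both strands.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def reverse_complement(seq: str) -> str:
    """Reverse complement (IUPAC codes other than N are not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> set:
    """All k-mers of seq as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_read(read: Read, ref_kmers: set, k: int) -> Tuple[Read, bool]:
    """Return the read in reference orientation and whether it was flipped."""
    name, seq, qual = read
    seq = seq.upper()
    rc = reverse_complement(seq)
    fwd_hits = len(kmer_set(seq, k) & ref_kmers)
    rev_hits = len(kmer_set(rc, k) & ref_kmers)
    if rev_hits > fwd_hits:
        # Reverse the qualities too, so each base keeps its own quality value.
        return (name, rc, qual[::-1]), True
    return (name, seq, qual), False


def orient_reads(reads: Iterable[Read], reference: str,
                 k: int = 15) -> Iterator[Tuple[Read, bool]]:
    """Yield (oriented read, was_flipped) for every read."""
    ref_kmers = kmer_set(reference.upper(), k)
    for read in reads:
        yield orient_read(read, ref_kmers, k)


def read_fastq(path: str) -> Iterator[Read]:
    """Minimal parser for 4-line FASTQ records (no multi-line records)."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break
            seq = handle.readline().rstrip()
            handle.readline()  # '+' separator line
            qual = handle.readline().rstrip()
            yield header[1:], seq, qual


def write_fastq(reads: Iterable[Read], path: str) -> None:
    """Write reads as 4-line FASTQ records."""
    with open(path, "w") as handle:
        for name, seq, qual in reads:
            handle.write(f"@{name}\n{seq}\n+\n{qual}\n")


if __name__ == "__main__":
    # Tiny in-memory demonstration with a made-up reference.
    reference = "ACGTTGCATGCCATAGGCTAGCTTACGGATCCAGTCAGT"
    fwd = ("read1", reference[5:30], "I" * 25)
    rev = ("read2", reverse_complement(reference[10:35]),
           "ABCDEFGHIJKLMNOPQRSTUVWXY")
    for (name, seq, _qual), flipped in orient_reads([fwd, rev], reference, k=11):
        print(name, "flipped" if flipped else "kept", seq)
```

On real data, usage would look like `write_fastq((r for r, _ in orient_reads(read_fastq("reads.fastq"), ref_seq)), "oriented.fastq")`, where the file names and `ref_seq` (the amplicon reference as a string) are placeholders. With structurally divergent strains, the k-mer vote should still pick the right strand as long as a reasonable fraction of each read matches the reference, but that is an assumption worth checking on the actual data.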

Pinging @bsipos who is the developer of pychopper and knows a lot in general about your inquiry.

Best,
Kristoffer

Hi Kristoffer,

Thank you, I'll have a look at pychopper and other solutions to re-orient unstranded data.

Best wishes,
Marius