Read to Isoforms Mapping
Hi,
I wonder where I can find the read-to-isoform mapping information from the tool?
Thank you!
Hi @shizhuom, Isosceles does not perform a one-to-one mapping of reads to isoforms. Instead, it uses the expectation-maximization (EM) algorithm to assign reads to isoforms, based on the maximum likelihood estimates of the transcripts' relative expression levels. This means that ambiguous long reads can contribute to the quantification of multiple full-length transcripts, much like how RSEM or Kallisto handle ambiguous short-read data. In practice, we find that ambiguous truncated reads are pervasive in nanopore single-cell data, e.g., from droplet-based protocols that anchor from the 5' or 3' ends.
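To make the idea concrete, here is a minimal Python sketch of EM-based fractional read assignment. It is not Isosceles's actual implementation: the function name `em_abundances`, the toy isoforms `A` and `B`, and the read counts are all made up for illustration, and the model is simplified (no transcript-length or read-truncation terms).

```python
def em_abundances(compat, n_iter=200):
    """Toy EM: estimate relative isoform abundances from read compatibility.

    compat: list of sets, one per read, holding the (hypothetical) isoform
    names that read is compatible with. Lengths and truncation are ignored.
    """
    isoforms = sorted(set().union(*compat))
    # Start from a uniform guess over all isoforms seen in the data.
    theta = {t: 1.0 / len(isoforms) for t in isoforms}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in isoforms}
        # E-step: split each read across its compatible isoforms
        # in proportion to the current abundance estimates.
        for cset in compat:
            total = sum(theta[t] for t in cset)
            for t in cset:
                expected[t] += theta[t] / total
        # M-step: new abundances are the normalized expected read counts.
        theta = {t: expected[t] / len(compat) for t in isoforms}
    return theta


# 6 reads unique to A, 2 unique to B, 4 ambiguous between A and B.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
theta = em_abundances(reads)
```

In this toy case the estimates converge to roughly 0.75 for `A` and 0.25 for `B`, so each ambiguous read ends up contributing about three quarters of a count to `A` and one quarter to `B` rather than being mapped to a single isoform.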
We believe this approach is why Isosceles and Bambu (which also uses EM) outperform other methods in the quantification benchmarks in our recent paper. Before applying the EM algorithm, Isosceles groups reads into Transcript Compatibility Counts (TCCs), which are sets of reads that are compatible with the same isoforms. Isosceles does provide the counts and the compatibility matrix for these TCCs, which are available in R as a SummarizedExperiment object using the `bam_to_tcc` function.
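As a rough illustration of what TCC counts and a compatibility matrix look like, here is a small Python sketch. It is only a conceptual model, not the output format of `bam_to_tcc`: the helper `reads_to_tccs`, the read IDs, and the isoform names are hypothetical.

```python
from collections import Counter


def reads_to_tccs(read_compat):
    """Group reads by the exact set of isoforms they are compatible with.

    read_compat: dict mapping a read ID to an iterable of compatible
    isoform names (all names here are hypothetical).
    Returns (tccs, isoforms, tcc_counts, matrix), where matrix[i][j] is 1
    if TCC i is compatible with isoform j and 0 otherwise.
    """
    # Reads sharing the same isoform set fall into the same TCC.
    counts = Counter(frozenset(isos) for isos in read_compat.values())
    tccs = sorted(counts, key=lambda s: sorted(s))
    isoforms = sorted(set().union(*tccs))
    tcc_counts = [counts[s] for s in tccs]
    matrix = [[1 if iso in s else 0 for iso in isoforms] for s in tccs]
    return tccs, isoforms, tcc_counts, matrix


read_compat = {
    "r1": ["A"],
    "r2": ["A", "B"],
    "r3": ["B", "A"],
    "r4": ["B"],
    "r5": ["A"],
}
tccs, isoforms, tcc_counts, matrix = reads_to_tccs(read_compat)
```

Note that after this grouping only the per-TCC counts and the TCC-by-isoform matrix are kept; the EM step then works on those rather than on individual reads.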