Best Step for Running SQANTI3 Filtering on Isoforms
Hello, and thank you for developing this fantastic tool!
I have generated a GTF file using export_gtf(se_transcript) after running bam_to_tcc() in "de_novo_loose" mode, followed by tcc_to_transcript(). I then applied SQANTI3 for filtering the isoforms in the GTF file. However, I wonder if it might be more effective to apply SQANTI3 filtering at an earlier step, prior to the EM quantification, to ensure that artifact isoforms are not included in the quantification process.
Would you be able to advise on the most appropriate step to run SQANTI3 filtering in this workflow?
Kind regards,
Sofia
Hi @skudashev, thanks for the kind words. To answer your question: Isosceles is designed to be fairly stringent in the de novo transcripts it proposes (e.g., they must have reproducibly detectable splice sites that connect to known splice sites), so the output of a single run is meant to be high quality. If you are observing many artifacts from de novo transcripts, or low read mapping, make sure you are using the --junc-bed parameter with minimap2 (as outlined in the best practices section of the main README and documentation). This is necessary because, unlike other methods, Isosceles doesn't provide post-hoc correction of spliced alignment blocks.
That said, you can certainly run Isosceles as you suggest: first to generate a GTF of de novo transcripts, and then to quantify a filtered set of those transcripts. In our paper, we also achieved top F1 scores by providing a de novo GTF file from IsoQuant or StringTie2 as input to Isosceles.
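For the intermediate step of that two-pass approach, here is a minimal sketch of subsetting the exported GTF to the isoforms that passed filtering, so the result can be used for the second, quantification-only run. It is not part of Isosceles or SQANTI3: the function names `filter_gtf_lines` and `filter_gtf_file` are hypothetical, and it assumes you have already collected the transcript IDs that SQANTI3 kept into a Python set. If your SQANTI3 run already writes a filtered GTF, that file can be used directly and this step is unnecessary.

```python
import re

# Matches the transcript_id attribute in the ninth (attribute) column of a GTF record.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def filter_gtf_lines(gtf_lines, keep_ids):
    """Yield comment lines and the GTF records whose transcript_id is in keep_ids.

    Records without a transcript_id attribute (e.g. gene-level records)
    are dropped; this is a simplifying assumption of the sketch.
    """
    keep_ids = set(keep_ids)
    for line in gtf_lines:
        if line.startswith("#"):
            # Pass header/comment lines through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            # Skip blank or malformed lines.
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match and match.group(1) in keep_ids:
            yield line


def filter_gtf_file(in_path, out_path, keep_ids):
    """Write a filtered copy of the GTF at in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_gtf_lines(src, keep_ids))
```

For example, `filter_gtf_file("de_novo.gtf", "de_novo.filtered.gtf", kept_ids)` (placeholder file names) would write only the transcript and exon records of the retained isoforms.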