comprna/RATTLE

How to proceed with the output after polishing the data?

yeroslaviz opened this issue · 6 comments

I'm not sure I really understand the goal of this tool.

From what I have run so far, I have created a "clean" fastq file of the consensus transcripts from multiple clusters. I don't quite get how I can find out how many clusters I have. The intermediate files have different numbers of rows, which don't really fit together.
The last file after polishing has 700 rows. Does this mean that I have 175 clusters?

And now:

What are these clusters? How can I continue with them?
Should this fastq be analyzed like a normal fastq file, as if it came straight out of the sequencing machine?

How can I match the clusters to genes or transcripts?

Thanks,

Assa

Hi Assa,

The output of the polish step is the transcriptome. The cluster IDs from the cluster and error correction steps are the same, and the polish step generates one transcript for each cluster.

"The last file after polishing has 700 rows. Does this mean that I have 175 clusters?"
Yes. The clusters' IDs and numbers are provided in each header.
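
The 700 rows / 4 = 175 arithmetic can be checked directly. The following is a minimal sketch, assuming the polished output is a strict four-line-per-record FASTQ (no wrapped sequence lines); the function names and the demo headers are made up for illustration and say nothing about the real header layout.

```python
import io


def fastq_headers(handle):
    """Yield the header (without the leading '@') of each 4-line FASTQ record."""
    for index, line in enumerate(handle):
        if index % 4 == 0:  # lines 0, 4, 8, ... are record headers
            header = line.rstrip("\n")
            if not header.startswith("@"):
                raise ValueError(f"line {index + 1} is not a FASTQ header: {header!r}")
            yield header[1:]


def count_records(handle):
    """Count FASTQ records, i.e. the number of header lines."""
    return sum(1 for _ in fastq_headers(handle))


# Tiny made-up example: two records, eight lines.
demo = io.StringIO(
    "@example_header_1\nACGT\n+\nKKKK\n"
    "@example_header_2\nTTGA\n+\nKKKK\n"
)
print(count_records(demo))  # 8 lines / 4 = 2 records
```

For a real file, `with open(path) as handle: print(count_records(handle))` does the same thing, and printing the values from `fastq_headers` shows the per-cluster information stored in each header.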

Thanks,
Eileen

Thanks for the fast response.

What can I now do with these clusters?

Does it help to re-map them (e.g. with minimap2) to a reference?
How can I gain further information from this output?

As this was only a test run with only two samples, we don't really have different replicates or conditions yet, but the organism is C. elegans.

If I understand the tool correctly, each cluster should give back one transcript, or at least a group of transcripts unique to this cluster. (Can one transcript be found in two or more clusters?)

This seems to me to be a very complicated method to do a differential expression analysis.

Creating the transcriptome for two different samples definitely won't produce the same consensus reads for each cluster (or gene bag, as you call it). So how can you compare them without re-mapping?

The logical next step to me is to map the consensus transcriptome to a reference genome, but it wouldn't have the depth of the original fastq file, as a lot of the reads are gone and there are no real qualities in the fastq anymore (they are all "K").
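
As a rough sketch of that mapping step, the snippet below builds and runs a minimap2 call from Python. It assumes minimap2 is installed and on the PATH; `-a` (SAM output), `-x splice` (the long-read spliced-alignment preset) and `-t` (threads) are standard minimap2 options, while the file names and function names are placeholders, not anything RATTLE produces or prescribes.

```python
import shutil
import subprocess


def minimap2_command(reference_fasta, transcripts_fastq, threads=4):
    """Build a minimap2 command for spliced alignment of transcripts to a genome."""
    return [
        "minimap2",
        "-a",             # write SAM instead of the default PAF
        "-x", "splice",   # preset for long-read spliced alignment
        "-t", str(threads),
        reference_fasta,
        transcripts_fastq,
    ]


def run_alignment(reference_fasta, transcripts_fastq, sam_path, threads=4):
    """Run minimap2 and write its SAM output to sam_path."""
    if shutil.which("minimap2") is None:
        raise RuntimeError("minimap2 was not found on PATH")
    command = minimap2_command(reference_fasta, transcripts_fastq, threads)
    with open(sam_path, "w") as sam_file:
        subprocess.run(command, stdout=sam_file, check=True)


# Placeholder file names, for illustration only.
print(" ".join(minimap2_command("genome.fa", "transcriptome.fq")))
```

Because the qualities are constant, they carry no information for the aligner here; the alignment depends on the consensus sequences themselves.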

I'm not really sure what you mean by mapping to annotation. Blasting each transcript? That's also a lot of effort.
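
One possible reading of "mapping to annotation", sketched here under assumptions: the consensus transcripts are aligned to annotated reference transcript sequences with minimap2 (which writes PAF by default), and each consensus transcript is then assigned to the reference transcript it matches best. The function name and the demo lines are invented; only the PAF column positions (query name in column 1, target name in column 6, number of matching bases in column 10) come from the PAF format.

```python
def best_hits(paf_lines):
    """Map each query name to (target name, matching bases) of its best PAF hit."""
    best = {}
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        query_name = fields[0]    # PAF column 1: query sequence name
        target_name = fields[5]   # PAF column 6: target sequence name
        matches = int(fields[9])  # PAF column 10: number of matching bases
        if query_name not in best or matches > best[query_name][1]:
            best[query_name] = (target_name, matches)
    return best


# Made-up PAF lines: one query with two candidate targets, one with a single hit.
demo_paf = [
    "query_1\t1000\t0\t1000\t+\ttarget_A\t1200\t100\t1100\t900\t1000\t60",
    "query_1\t1000\t0\t1000\t+\ttarget_B\t1100\t50\t1050\t950\t1000\t60",
    "query_2\t800\t0\t800\t-\ttarget_C\t900\t0\t800\t780\t800\t60",
]
for query, (target, matches) in best_hits(demo_paf).items():
    print(query, target, matches)
```

Picking the hit with the most matching bases is a deliberately simple rule; ties and near-identical isoforms would need more care.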

To me, the tool is missing a way to map each transcript to a gene, a transcript, or something similar in order to quantify the results.
Sorry for being negative, but I would like to understand how to gain as much knowledge from this tool as possible.

Thanks,
Assa

Thanks for the very elaborate answer.

I must admit that I started working with the tool mainly because the description also says "quantification". I know I don't need a reference-free transcriptome, but I was hoping to be able to do a quantification based on a reference-free transcriptome.
If I understand you correctly, a "simple" quantification would be easier and faster using a reference-dependent method such as NanoCount or bambu.

But the suggestion with BUSCO seems interesting and I'll give it a go to check the results.

Thanks again.

Assa