JBerthelier/PiRATE

TE content (%)

Lukanyo12 opened this issue · 3 comments

Hello Jeremy,

Since I have decided to use a single TE library (one that excludes the unknown repeats from the PASTEC classification) for the TEannot step, how can I determine the TE content of my genome, e.g. Class I, Class II, and total TE content? Also, how can I determine the proportion of unknown repeats in the genome?
From the PASTEC output, I will separate the elements into Class I, Class II, and unknown repeats. However, the unknown repeats will be kept out of the TE library used for the TEannot step.
Best regards
Lukanyo

Dear Lukanyo,

Someone asked a similar question regarding statistics in #49.

Yes, your library is totally fine: using only Class I and Class II is going to be more accurate.

If you want to estimate the percentage of unknown repeats, you can rerun the annotation with a library containing Class I + Class II + unknown, and then compare the two percentages.
The simple idea is:
(Percentage with Class I + Class II + unknown) - (Percentage with Class I + Class II) = percentage of unknown

Keep in mind that unknown repeats may largely correspond to false positives, so interpret this figure carefully.
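
To make the subtraction concrete, here is a minimal sketch in Python. It is not part of PiRATE; the function names and the numbers in the example are purely illustrative assumptions, and the two percentages are whatever genome coverage values you obtain from your own two TEannot runs.

```python
def coverage_percentage(annotated_bp: int, genome_bp: int) -> float:
    """Percentage of the genome covered by annotations (hypothetical helper)."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return 100.0 * annotated_bp / genome_bp


def unknown_percentage(pct_with_unknown: float, pct_without_unknown: float) -> float:
    """Estimate the share of the genome attributable to unknown repeats.

    pct_with_unknown:    coverage from the run using Class I + Class II + unknown
    pct_without_unknown: coverage from the run using Class I + Class II only
    """
    return pct_with_unknown - pct_without_unknown


if __name__ == "__main__":
    # Illustrative numbers only, not real results.
    with_unknown = 35.0
    without_unknown = 30.0
    print(unknown_percentage(with_unknown, without_unknown))
```

The result is only a rough estimate, since adding unknown consensus sequences to the library can also change how some regions are assigned between the two runs.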

Best,

Jeremy

You're the best, man; I much appreciate the time you take to respond to our many questions. One last question: would it be a problem if I also submit my SINEs and MITEs to PASTEC classification, just to be more accurate? I concatenated the elements detected in step 1 before clustering with CD-HIT-est (excluding the elements detected by SINE-finder and MITE hunter). Afterwards, I concatenated the CD-HIT-est output with the elements detected by SINE-finder and MITE hunter before classification with PASTEC. Is that a problem?

Dear Drsurch13,

Yes, sure, this is the best way to do it. It will help you highlight potential false positives identified by SINE-finder / MITE hunter.
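
For the concatenation step itself, a minimal sketch in Python could look like the following. The function name and the file names in the usage note are hypothetical, not PiRATE outputs; the only thing the code takes care of is that each input ends with a newline, so that a FASTA header never gets glued to the last sequence line of the previous file.

```python
from pathlib import Path
from typing import Iterable


def concatenate_fasta(inputs: Iterable[str], output: str) -> None:
    """Write all input FASTA files, in order, into one output file."""
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            if not text:
                continue  # skip empty files
            out.write(text)
            # Guarantee a line break between consecutive files.
            if not text.endswith("\n"):
                out.write("\n")
```

For example, `concatenate_fasta(["cdhit_output.fasta", "sine_finder.fasta", "mite_hunter.fasta"], "library_for_pastec.fasta")` would build one library file to give to PASTEC, assuming those three (hypothetical) files exist.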

Best

Jeremy