clemgoub/dnaPipeTE

Combining dnaPipeTE and DeepTE outputs

heidihyang opened this issue · 3 comments

Hi Clément,

I wanted to increase the number of classified TEs from the dnaPipeTE sequences, so I used DeepTE, following a suggestion you made in another discussion. I want to combine the data from both programs, and I was wondering if you have any recommendations for doing so. Do I just need to run the blast section of the pipeline for all of the DeepTE sequences, since I would need the blast_reads.counts and other files to make the new graphs? Thanks in advance!

Best,
Heidi

Hello Heidi!

The simplest is to:
1- Create a single library in which each contig keeps exactly one annotation (either from RepeatMasker or from DeepTE). You may have to decide on a rule for choosing which annotation to retain for a given contig. Make sure there are no duplicated contigs.
2- Make sure that the DeepTE annotations are in the RepeatMasker format. The headers in your new library should be of the form >TEname#Subclass/superfamily. The list of recognized Subclass/superfamily labels is in the file new_list_of_RM_superclass_colors_sortedOK in the dnaPipeTE folder, or you can find it here
3- Use your newly annotated library in a new run of dnaPipeTE with the custom library option: -RM_lib custom_library.fasta
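
Steps 1 and 2 could be sketched in Python roughly as follows. This is only an illustration under assumptions, not part of dnaPipeTE: the contig names, the "prefer RepeatMasker, fall back to DeepTE" rule, and the choice to leave out contigs that neither tool annotated are all hypothetical, and the labels are assumed to have already been translated to Subclass/superfamily strings that appear in the recognized list.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of contig name -> sequence (input order kept)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep the first token of the header and drop any existing '#label'.
            name = line[1:].split()[0].split("#")[0]
            if name in records:
                raise ValueError(f"duplicated contig: {name}")
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(chunks) for n, chunks in records.items()}


def build_library(contigs, rm_labels, deepte_labels):
    """Return FASTA text with one '>name#Subclass/superfamily' header per contig.

    Example rule (an assumption): keep the RepeatMasker label when there is
    one, otherwise use the DeepTE label. Contigs with no label are left out.
    """
    lines = []
    for name, seq in contigs.items():
        label = rm_labels.get(name) or deepte_labels.get(name)
        if label is None:
            continue
        lines.append(f">{name}#{label}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


# Hypothetical contigs and labels, for illustration only.
contigs = parse_fasta(
    ">comp1_c0_seq1 len=8\nACGTACGT\n"
    ">comp2_c0_seq1\nGGGG\nCCCC\n"
    ">comp3_c0_seq1\nTTTT\n"
)
rm_labels = {"comp1_c0_seq1": "LTR/Gypsy"}
deepte_labels = {"comp1_c0_seq1": "LTR/Copia", "comp2_c0_seq1": "DNA/hAT"}

library = build_library(contigs, rm_labels, deepte_labels)
print(library, end="")
```

The resulting text can be written to custom_library.fasta. Because parse_fasta raises an error on a repeated contig name, duplicates are caught before the library is built.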

Let me know if you need more help!

Cheers,

Clément

Hi Clément,

Thank you for the info, sounds good! Would I combine it with the species lib in RepeatMasker? Also, DeepTE has some classifications that are MITEs of certain TEs (e.g., a DNA hAT MITE). How should I classify these in the correct format? One last question (sorry, kind of new to all of this): is there a way to bypass the Trinity step, since I already have contigs from the previous runs? I have limited computational allowance on my cluster and want to conserve it if possible. Thanks!

Best,
Heidi

Hi Heidi!

No problem at all, I'm glad you reached out with questions!

  • You can indeed combine it with RepeatMasker's libraries (it can actually be interesting to compare a run with them and a run without)
  • For a MITE to be labeled correctly in dnaPipeTE, its header needs to be of the form TENAME#MITE/MITE. To keep the DNA/hAT info, you can rename it TE_XX_DNA_hAT#MITE/MITE (as long as #MITE/MITE is present, it will be recognized as a MITE)
  • You should be able to bypass the Trinity step if it has run completely, by re-running the same command with the same output folder. Let me know if it doesn't do what you want!
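
The MITE renaming above could be automated with something like the sketch below. It assumes the DeepTE classification has already been split into parts such as DNA, hAT, MITE; how to get those parts from the DeepTE output is not covered here, and the function name and TE names are hypothetical.

```python
def rm_header(te_name, parts):
    """Build a RepeatMasker-style header from a TE name and its label parts.

    MITEs get '#MITE/MITE', with the remaining parts (e.g. DNA, hAT) moved
    into the name so the information is not lost. Other elements get
    '#Subclass/superfamily' from the first two parts.
    """
    parts = list(parts)
    if not parts:
        raise ValueError(f"no classification for {te_name}")
    if "MITE" in parts:
        kept = [p for p in parts if p != "MITE"]
        return ">" + "_".join([te_name] + kept) + "#MITE/MITE"
    return f">{te_name}#" + "/".join(parts[:2])


# Hypothetical names, for illustration only.
print(rm_header("TE_01", ["DNA", "hAT", "MITE"]))
print(rm_header("TE_02", ["LTR", "Gypsy"]))
```

Non-MITE labels built this way should still be checked against the list of recognized Subclass/superfamily labels.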

Cheers,

Clément