Combining dnaPipeTE and DeepTE outputs
heidihyang opened this issue · 3 comments
Hi Clément,
I wanted to increase the number of classified TEs from the dnaPipeTE sequences, so I used DeepTE per one of your suggestions in one of the other discussions. I want to combine the data from both programs, and I was wondering if you have any recommendations for doing so. Do I just need to run the blast section of the pipeline for all of the DeepTE sequences, since I would need the blast_reads.counts and other files to make the new graphs? Thanks in advance!
Best,
Heidi
Hello Heidi!
The simplest approach is to:
1- Create a single library in which each contig keeps exactly one annotation (either from RepeatMasker or from DeepTE). You may have to decide on a rule for choosing which annotation to keep when a contig has both. Make sure there are no duplicated contigs.
2- Make sure that the DeepTE annotations are in the RepeatMasker format. The headers in your new library should be of the form >TEname#Subclass/superfamily. The list of recognized Subclass/superfamily labels is in the file new_list_of_RM_superclass_colors_sortedOK in the dnaPipeTE folder, or you can find it here
3- Use your newly annotated library in a new run of dnaPipeTE with the option for a custom library: -RM_lib custom_library.fasta
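Steps 1 and 2 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not part of dnaPipeTE or DeepTE: the function names, the label map, and the rule "RepeatMasker wins, DeepTE fills the gaps" are all hypothetical, and it assumes you have already loaded the contig sequences and both sets of annotations into dictionaries keyed by contig name.

```python
# Illustrative DeepTE-style label -> RepeatMasker-style "Class/superfamily" map.
# This table is an assumption; check every target label against
# new_list_of_RM_superclass_colors_sortedOK before relying on it.
DEEPTE_TO_RM = {
    "ClassI_LTR_Gypsy": "LTR/Gypsy",
    "ClassI_LTR_Copia": "LTR/Copia",
    "ClassII_DNA_hAT_nMITE": "DNA/hAT",
}


def choose_annotation(contig, rm_annot, deepte_annot):
    """Example rule (hypothetical): keep the RepeatMasker annotation when
    there is one, otherwise use the translated DeepTE label, else None."""
    if contig in rm_annot:
        return rm_annot[contig]
    return DEEPTE_TO_RM.get(deepte_annot.get(contig, ""))


def build_library(sequences, rm_annot, deepte_annot):
    """Return FASTA text with one '>name#Class/superfamily' record per contig.

    sequences is a dict (contig name -> sequence), so a contig cannot appear
    twice. Contigs with no usable annotation are left out of the library,
    which is a choice made for this sketch.
    """
    lines = []
    for contig, seq in sequences.items():
        label = choose_annotation(contig, rm_annot, deepte_annot)
        if label is None:
            continue
        lines.append(f">{contig}#{label}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy data, made up for the example.
    sequences = {"contig1": "ACGT", "contig2": "GGCC", "contig3": "TTAA"}
    rm_annot = {"contig1": "LTR/Gypsy"}
    deepte_annot = {
        "contig1": "ClassI_LTR_Copia",
        "contig2": "ClassII_DNA_hAT_nMITE",
    }
    print(build_library(sequences, rm_annot, deepte_annot), end="")
```

With the toy data, contig1 keeps its RepeatMasker label, contig2 gets the translated DeepTE label, and contig3 is dropped because neither tool annotated it. Swap in whatever rule fits your data in choose_annotation.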
Let me know if you need more help!
Cheers,
Clément
Hi Clément,
Thank you for the info, sounds good! Would I combine it with the species lib in RepeatMasker? Also, DeepTE has some classifications that are MITEs of certain TEs (e.g. a DNA hAT MITE). How should I classify these in the correct format? One last question (sorry, kind of new to all of this): is there a way to bypass the Trinity step, since I already have contigs from the previous runs? I have limited computational allowance on my cluster and want to conserve it if possible. Thanks!
Best,
Heidi
Hi Heidi!
No problem at all, I'm glad you reached out with questions!
- You can indeed combine it with RepeatMasker's libraries (it can actually be interesting to compare a run with them and one without).
- For a MITE to be labeled correctly in dnaPipeTE, you need a header of the form TENAME#MITE/MITE. To keep the DNA/hAT info, you may rename it TE_XX_DNA_hAT#MITE/MITE (as long as #MITE/MITE is present, it will be recognized as a MITE).
- You should be able to bypass the Trinity step if it has run completely, by re-running the same command with the same output folder. Let me know if it doesn't do what you want!
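The MITE renaming can be automated with a small helper. This is a sketch only: mite_header is a hypothetical function, not something shipped with dnaPipeTE or DeepTE, and it assumes you already know each element's name and its original class.

```python
import re


def mite_header(te_name, original_class=None):
    """Build a '>NAME#MITE/MITE' header (hypothetical helper).

    If original_class is given (for example 'DNA/hAT'), it is folded into
    the name with underscores so the information is not lost.
    """
    if original_class:
        # Replace anything that is not a letter or digit with '_'.
        suffix = re.sub(r"[^A-Za-z0-9]+", "_", original_class).strip("_")
        te_name = f"{te_name}_{suffix}"
    return f">{te_name}#MITE/MITE"


if __name__ == "__main__":
    print(mite_header("TE_XX", "DNA/hAT"))  # >TE_XX_DNA_hAT#MITE/MITE
```

Only the part after the # matters for the label, so the name before it can carry whatever extra information you want to keep.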
Cheers,
Clément