oushujun/EDTA

No LTR and no SINE found on split genome


First of all, thank you for providing such an excellent tool for TE annotation. I’m currently using EDTA v2.2.1 to annotate transposable elements for a large genome, GCA_014155895.2 (~16G). Due to its size, I’ve split the genome by chromosomes into 9 parts, each processed separately with EDTA. Below is the command I’m using for each split:

perl ../EDTA/EDTA.pl --genome new.part_002.fasta --species others --step all --overwrite 0 --sensitive 0 --anno 0 --evaluate 0 --u 1.3e-8 --threads 32 --force 1
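
For reference, the per-chromosome split described above can be sketched roughly as follows. This is only an illustration, not the exact procedure used here: it assumes one FASTA record per output file, and the function name `split_fasta` and the `new.part_` prefix with a three-digit counter are hypothetical choices made to mirror file names such as new.part_002.fasta.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir, prefix="new.part_"):
    """Write each FASTA record to its own file and return the paths written.

    Assumption: every record (chromosome or scaffold) becomes one part.
    The genome is streamed line by line so a very large file is never
    held in memory as a whole.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A new header starts a new part: close the previous file.
                if out is not None:
                    out.close()
                path = out_dir / f"{prefix}{len(written) + 1:03d}.fasta"
                out = open(path, "w")
                written.append(path)
            # Lines before the first header (if any) are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return written


# Example call (hypothetical paths):
# parts = split_fasta("genome.fasta", "parts")
```

Each resulting file would then be passed to `--genome` in the command above. If a genome contains many small unplaced scaffolds, grouping them into one part instead of one file each is probably more practical.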

I encountered the following issues:

  1. Issue with LTR detection: When running EDTA on chromosome 2 (new.part_002.fasta, 2.0G), I received an error, and no LTR elements were detected. Could you please advise on why this may be happening for this specific chromosome?
The start time is: 2024-10-09 21:29:56 
My job ID is: 15283037 
The total cores is: 64 
The hosts is: 
i05r3n18:64

#########################################################
##### Extensive de-novo TE Annotator (EDTA) v2.2.1  #####
##### Shujun Ou (shujun.ou.1@gmail.com)             #####
#########################################################

Parameters: --genome new.part_002.fasta --species others --step all --overwrite 0 --sensitive 0 --anno 0 --evaluate 0 --u 1.3e-8 --threads 32 --force 1


Wed Oct  9 21:30:14 CST 2024	Dependency checking:
				All passed!

Wed Oct  9 21:31:50 CST 2024	The longest sequence ID in the genome contains 87 characters, which is longer than the limit (13)
				Trying to reformat seq IDs...
				Attempt 1...
Wed Oct  9 21:32:24 CST 2024	Seq ID conversion successful!

Wed Oct  9 21:32:24 CST 2024	Obtain raw TE libraries using various structure-based programs: 

Wed Oct  9 21:32:24 CST 2024	EDTA_raw: Check dependencies, prepare working directories.

Wed Oct  9 21:32:49 CST 2024	Start to find LTR candidates.

Wed Oct  9 21:32:49 CST 2024	Identify LTR retrotransposon candidates from scratch.

Out of memory!
Out of memory!
cat: new.part_002.fasta.mod.harvest.combine.scn: No such file or directory
cat: new.part_002.fasta.mod.finder.combine.scn: No such file or directory
grep: new.part_002.fasta.mod.retriever.scn: No such file or directory
Argument "" isn't numeric in numeric gt (>) at /work/home/acbirxa1yd/miniconda3/envs/EDTA2/share/LTR_retriever/LTR_retriever line 380.

ERROR: No candidate is found in the file(s) you specified.

awk: fatal: cannot open file `new.part_002.fasta.mod.pass.list' for reading: No such file or directory
Warning: LOC list - is empty.

	perl rename_LTR_skim.pl target_sequence.fa LTR_retriever.defalse


Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
	Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
	Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019
	
mv: cannot stat 'new.part_002.fasta.mod.LTR.intact.fa.ori.dusted.cln.cln': No such file or directory
mv: cannot stat 'new.part_002.fasta.mod.LTR.intact.fa.ori.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'new.part_002.fasta.mod.LTR.intact.raw.fa.anno.list': No such file or directory
ERROR: No such file or directory at /work/home/acbirxa1yd/renhongbin/EDTA/util/output_by_list.pl line 39.

	perl filter_gff3.pl file.gff3 file.list > new.gff3

Wed Oct  9 21:35:32 CST 2024	Warning: The LTR result file has 0 bp!

Wed Oct  9 21:35:32 CST 2024	Start to find SINE candidates.

Thu Oct 10 03:26:20 CST 2024	Finish finding SINE candidates.

Thu Oct 10 03:26:20 CST 2024	Start to find LINE candidates.

Thu Oct 10 03:26:20 CST 2024	Existing result file new.part_002.fasta.mod-families.fa found!
				Will keep this file without rerunning this module.
				Please specify --overwrite 1 if you want to rerun this module.

Thu Oct 10 03:26:30 CST 2024	Finish finding LINE candidates.

Thu Oct 10 03:26:30 CST 2024	Start to find TIR candidates.

Thu Oct 10 03:26:30 CST 2024	Identify TIR candidates from scratch.

Species: others
Thu Oct 10 16:34:43 CST 2024	Finish finding TIR candidates.

Thu Oct 10 16:34:43 CST 2024	Start to find Helitron candidates.

Thu Oct 10 16:34:43 CST 2024	Existing result file new.part_002.fasta.mod.Helitron.intact.raw.fa found!
				Will keep this file without rerunning this module.
				Please specify --overwrite 1 if you want to rerun this module.

Thu Oct 10 16:34:43 CST 2024	Finish finding Helitron candidates.

Thu Oct 10 16:34:43 CST 2024	Execution of EDTA_raw.pl is finished!

Thu Oct 10 16:34:43 CST 2024	Obtain raw TE libraries finished.
				All intact TEs found by EDTA: 
					new.part_002.fasta.mod.EDTA.intact.raw.fa 
					new.part_002.fasta.mod.EDTA.intact.raw.gff3

Thu Oct 10 16:34:43 CST 2024	Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library: 

Thu Oct 10 16:35:50 CST 2024	EDTA advance filtering finished.

Thu Oct 10 16:35:50 CST 2024	Perform EDTA final steps to generate a non-redundant comprehensive TE library.

				Skipping the RepeatModeler results (--sensitive 0).
				Run EDTA.pl --step final --sensitive 1 if you want to add RepeatModeler results.

				Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.

Thu Oct 10 16:37:06 CST 2024	EDTA final stage finished! You may check out:
				The final EDTA TE library: new.part_002.fasta.mod.EDTA.TElib.fa
The end time is: 2024-10-10 16:37:06

Warning: No sequences were masked
  2. Issue with SINE detection: For other chromosome parts, while LTR elements were detected, no SINE elements were found during the annotation process. Is there something that could be affecting SINE detection across these chromosomes?
The start time is: 2024-09-25 16:01:12 
My job ID is: 14944128 
The total cores is: 32 
The hosts is: 
g06r4n15:32

#########################################################
##### Extensive de-novo TE Annotator (EDTA) v2.2.1  #####
##### Shujun Ou (shujun.ou.1@gmail.com)             #####
#########################################################

Parameters: --genome new.part_006.fasta --species others --step all --overwrite 0 --sensitive 0 --anno 0 --evaluate 0 --u 1.3e-8 --threads 32 --force 1

Wed Sep 25 16:01:14 CST 2024	Dependency checking:
				All passed!

Wed Sep 25 16:02:34 CST 2024	The longest sequence ID in the genome contains 61 characters, which is longer than the limit (13)
				Trying to reformat seq IDs...
				Attempt 1...
Wed Sep 25 16:03:01 CST 2024	Seq ID conversion successful!

Wed Sep 25 16:03:01 CST 2024	Obtain raw TE libraries using various structure-based programs: 

Wed Sep 25 16:03:01 CST 2024	EDTA_raw: Check dependencies, prepare working directories.

Wed Sep 25 16:03:22 CST 2024	Start to find LTR candidates.

Wed Sep 25 16:03:22 CST 2024	Identify LTR retrotransposon candidates from scratch.

Thu Sep 26 09:26:36 CST 2024	Finish finding LTR candidates.

Thu Sep 26 09:26:36 CST 2024	Start to find SINE candidates.

cp: cannot stat 'new.part_006.fasta.mod.SINE.raw.fa': No such file or directory
Error: SINE results not found!

cat: new.part_006.fasta.mod.TIR.intact.raw.bed: No such file or directory
cat: new.part_006.fasta.mod.Helitron.intact.raw.bed: No such file or directory
cp: cannot stat '../new.part_006.fasta.mod.EDTA.raw/new.part_006.fasta.mod.RM2.fa': No such file or directory

Thu Sep 26 09:26:37 CST 2024	Obtain raw TE libraries finished.
				All intact TEs found by EDTA: 
					new.part_006.fasta.mod.EDTA.intact.raw.fa 
					new.part_006.fasta.mod.EDTA.intact.raw.gff3

Thu Sep 26 09:26:37 CST 2024	Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library: 

Thu Sep 26 09:34:02 CST 2024	EDTA advance filtering finished.

Thu Sep 26 09:34:02 CST 2024	Perform EDTA final steps to generate a non-redundant comprehensive TE library.

				Skipping the RepeatModeler results (--sensitive 0).
				Run EDTA.pl --step final --sensitive 1 if you want to add RepeatModeler results.

				Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.

Thu Sep 26 10:28:10 CST 2024	EDTA final stage finished! You may check out:
				The final EDTA TE library: new.part_006.fasta.mod.EDTA.TElib.fa
The end time is: 2024-09-26 10:28:10
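
To make the failure points in the two logs above easier to spot, here is a minimal sketch of a log scanner. It only searches for strings that literally appear in these logs ("Out of memory!", "No such file or directory", "ERROR:" / "Error:", and "Existing result file ... found!"). The function name `scan_log` and the category names are hypothetical and are not part of EDTA.

```python
import re

# Patterns copied from messages seen in the logs above; the category
# names on the left are illustrative labels, not EDTA terminology.
FAILURE_PATTERNS = {
    "out_of_memory": re.compile(r"^Out of memory!"),
    "missing_file": re.compile(r"No such file or directory"),
    "error": re.compile(r"^\s*ERROR:|^Error:"),
    "reused_result": re.compile(r"Existing result file .* found!"),
}


def scan_log(lines):
    """Return {category: [(line_number, text), ...]} for matching log lines."""
    hits = {name: [] for name in FAILURE_PATTERNS}
    for number, line in enumerate(lines, start=1):
        for name, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                hits[name].append((number, line.rstrip("\n")))
    return hits


# Example call (hypothetical log file name):
# with open("edta_part_002.log") as handle:
#     for category, found in scan_log(handle).items():
#         print(category, len(found))
```

On the chromosome 2 log this would flag the two "Out of memory!" lines printed right after the LTR step starts, which may point to a memory limit on the node rather than to a problem with the sequence itself. It would also flag the LINE and Helitron steps that kept existing result files because of `--overwrite 0`.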

If you need further information or logs, I’d be happy to provide them. I appreciate your time and help with these issues.

Thank you again for your continued support and for developing such a valuable tool!

Best regards,
rr

Any luck?