tk2/RetroSeq

Uninitialized value error post PE parsing

biobenkj opened this issue · 9 comments

When I run RetroSeq in the align mode, it gets to PE alignment parsing and then breaks. Not sure what the error means (uninitialized value before assignment?).

Input: perl bin/retroseq.pl -discover -bam ../bwa/Mtbcosmid.sorted.bam -eref ../bwa/retroseqTNlib.tab -output ../bwa/Mtbcosmidtest.candidates.tab -align

Output:
RetroSeq: A tool for discovery and genotyping of transposable elements from short read alignments

Version: 1.41
Author: Thomas Keane (thomas.keane@sanger.ac.uk)

Reading -eref file: ../bwa/retroseqTNlib.tab

Min anchor quality: 20
Min percent identity: 80
Min length for hit: 36

Opening BAM (../bwa/Mtbcosmid.sorted.bam) and getting initial set of candidate mates....
Reading chromosome: pRD12F9
1075 candidate reads remain to be found after first pass....
Reading chromosome: pRD12F9
Parsing PE alignments....
Use of uninitialized value $lastLine in string ne at bin/retroseq.pl line 587.
Alignment did not complete correctly

Any insight you could provide would be great!

tk2 commented

I suspect that your exonerate alignment did not complete. That line checks that the exonerate output ends with "-- completed exonerate analysis", which exonerate prints when it has finished fully. Did it maybe run over a time or memory limit on your machine?
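
The check described above can be reproduced by hand on any exonerate output file you are able to keep. This is a minimal sketch, not the code in retroseq.pl: the function name `exonerate_completed` is hypothetical, and the file passed to it is whatever output you captured yourself. It also assumes (from the wording of the warning, not from reading the script) that the "uninitialized value" message appears when the output is empty, so that there is no last line to compare.

```shell
# Sketch: report whether an exonerate output file ends with the completion marker.
# The marker text is the one quoted in the comment above; the function name is
# hypothetical and is not part of RetroSeq or exonerate.
exonerate_completed() {
  out_file=$1
  # A missing or empty file has no last line at all.
  if [ ! -s "$out_file" ]; then
    echo "empty or missing: $out_file"
    return 2
  fi
  last_line=$(tail -n 1 "$out_file")
  if [ "$last_line" = "-- completed exonerate analysis" ]; then
    echo "complete: $out_file"
  else
    # Output exists but stops early, e.g. the process was killed part-way.
    echo "truncated: $out_file (last line: $last_line)"
    return 1
  fi
}
```

Return status 0 means the marker was found, 1 means the output exists but stops early, and 2 means there was no output at all, which would point at exonerate failing to start rather than being interrupted.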

Hi. I have the same problem. If anyone finds a solution, please share it.
I ran the tool on only one chromosome to test it, so there should be no memory or time limit problem.
Thanks

So the way around this, @tk2 and @kenza12, is to download the latest version of exonerate (https://www.ebi.ac.uk/about/vertebrate-genomics/software/exonerate) [2.4.0], recompile, and run RetroSeq again. There must be some issue with 2.2.0...

I am using exonerate 2.4, but still ran into the same problem. I think there is a memory issue associated with the use of --bestn. I have more than 100 GB of memory. Not sure how to solve this.

I tried two different versions of exonerate (2.4 and 2.22) with the samples used in the tutorial. I get the same error with both versions. Below is the tail end of my output.

"
Reading chromosome: GL000225.1
Reading chromosome: GL000192.1
Reading chromosome: NC_007605
Reading chromosome: hs37d5
Using reference TE locations to assign discordant mates...
Screening for hits to: Alu
Screening for hits to: L1HS
Use of uninitialized value $lastLine in string ne at retroseq.pl line 509.
Alignment did not complete correctly
Parsing PE alignments....
"

I used the tutorial commands with updated paths to my files.

Is this issue going to be fixed?

Hi @tk2
I'm getting the same error with both exonerate 2.2.0 and 2.4.0:

...
649922 candidate reads remain to be found after first pass....
Reading chromosome: chr1
...
Parsing PE alignments....
Use of uninitialized value $lastLine in string ne at /home/newmanlab/dwesche/programs/RetroSeq/bin/retroseq.pl line 509.
Alignment did not complete correctly

Here's the run command:
retroseq.pl -discover -align -bam /my/bam/file.bam -eref /my/eref/file.txt -output ./outfile.txt

Are there any new insights on this?
Thanks!

I also have this problem, and my exonerate is 2.4.0. Does anyone have a solution?

tk2 commented

Hi - I just re-ran the NA12878 data from the wiki page and it completes just fine. The underlying cause is usually that exonerate ran out of memory. If you were running on a compute farm, can you check whether the process hit the memory limit?

I'm happy to have a look at specific examples if you can provide me with test data.
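
One generic way to look for the memory-limit case mentioned above is to inspect how a process ended. This is a sketch under stated assumptions: `run_and_report` is a hypothetical helper, not part of RetroSeq. It relies on the bash/dash convention that a command killed by signal N is reported with exit status 128+N, so 137 corresponds to SIGKILL, the signal the Linux out-of-memory killer sends (batch schedulers often use it too). RetroSeq launches exonerate itself, so wrapping retroseq.pl may only show the script's own exit status. The helper is most informative around an exonerate command run by hand, and scheduler logs remain the more direct source.

```shell
# Sketch: run a command and describe how it ended.
# run_and_report is a hypothetical helper, not part of RetroSeq or exonerate.
run_and_report() {
  status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif [ "$status" -gt 128 ]; then
    # bash and dash report "killed by signal N" as exit status 128+N.
    echo "killed by signal $((status - 128))"
  else
    echo "exited with status $status"
  fi
  return "$status"
}
```

To use it, prefix the command you want to examine with `run_and_report`. A report of "killed by signal 9" is consistent with a memory or scheduler kill but does not prove one on its own.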

Hi, I had the same problem and this is certainly not a memory issue. I ran RetroSeq on 55 samples and only one sample (referred to as 'bad' sample) produced this error repeatedly. Each of my samples had 16 chromosomes. When I split the bam file of the 'bad' sample into 16 bam files (one file per chromosome) and then ran the analysis, RetroSeq worked.
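
The per-chromosome workaround described above could be scripted roughly as follows. This is a sketch, not a tested pipeline: it assumes samtools is installed and the BAM is coordinate-sorted and indexed, and `split_cmds` is a hypothetical helper name. The function only prints one `samtools view` command per reference name it reads on standard input, so the commands can be reviewed before anything is run. It assumes paths and reference names contain no spaces or shell metacharacters.

```shell
# Sketch: print one "samtools view" command per reference name read from stdin.
# split_cmds is a hypothetical helper; it prints commands and runs nothing itself.
split_cmds() {
  bam=$1      # input BAM (must be indexed for region queries)
  prefix=$2   # output prefix; each file becomes PREFIX.CHROM.bam
  while IFS= read -r chrom; do
    case "$chrom" in
      ''|'*') continue ;;   # skip blank lines and the "*" row for unplaced reads
    esac
    printf 'samtools view -b -o %s.%s.bam %s %s\n' "$prefix" "$chrom" "$bam" "$chrom"
  done
}
```

The reference names can come from `samtools idxstats sample.bam | cut -f1`. Piping that into `split_cmds sample.bam sample` prints the commands, and piping the result on to `sh` runs them. Each resulting BAM would then be passed to retroseq.pl separately, as in the report above.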