Problem at RepeatMasker stage
PatrickCKennedy opened this issue · 2 comments
Dear Clément,
Thank you for creating dnaPipeTE.
I am currently running the programme on my local computer, using the following commands:
sudo docker run --platform linux/amd64 -it -v ~/Project:/mnt clemgoub/dnapipete:latest
python3 dnaPipeTE.py \
-input /mnt/data/SAMPLE_R1.fastq.gz \
-output /mnt/Patrick_28Nov2023a \
-sample_size 1000 \
-sample_number 2 \
-RM_t 0.25 \
-cpu 8
(I have set -sample_size extremely low here as a practice run: I was running into issues and want to iron them out before running the full sample size.)
The Trinity steps seem to run fine, but then it hits a snag when it comes to the RepeatMasker stages:
#######################################
### REPEATMASKER to anotate contigs ###
#######################################
RepeatMasker version 4.1.3
Search Engine: NCBI/RMBLAST [ 2.11.0+ ]
Using Master RepeatMasker Database: /opt/RepeatMasker/Libraries/RepeatMaskerLib.h5
Title : Dfam
Version : 3.6
Date : 2022-04-12
Families : 19,025
Species/Taxa Search:
Homo sapiens [NCBI Taxonomy ID: 9606]
Lineage: root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;
Eumetazoa;Bilateria;Deuterostomia;Chordata;
Craniata <chordates>;Vertebrata <vertebrates>;
Gnathostomata <vertebrates>;Teleostomi;Euteleostomi;
Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;
Mammalia;Theria <mammals>;Eutheria;Boreoeutheria;
Euarchontoglires;Primates;Haplorrhini;Simiiformes
1339 families in ancestor taxa; 49 lineage-specific families
Building general libraries in: /opt/RepeatMasker/Libraries/CONS-Dfam_3.6/general
RepeatMasker::createLib(): Error invoking /opt/rmblast/bin/makeblastdb on file /opt/RepeatMasker/Libraries/CONS-Dfam_3.6/general.working/is.lib.
Traceback (most recent call last):
File "dnaPipeTE.py", line 698, in <module>
RepeatMasker(config['DEFAULT']['RepeatMasker'], args.RepeatMasker_library, args.RM_species, args.cpu, args.output_folder, args.RM_threshold)
File "dnaPipeTE.py", line 381, in __init__
self.repeatmasker_run()
File "dnaPipeTE.py", line 400, in repeatmasker_run
with open(self.output_folder+"/Trinity.fasta.out", 'r') as trinity_handle:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/Patrick_28Nov2023a/Trinity.fasta.out'
In the output folder, there is no file called 'Trinity.fasta.out', although there is one called 'Trinity.fasta'.
There seem to be two issues here: (1) makeblastdb fails, and (2) no file called 'Trinity.fasta.out' is ever created.
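The second issue is probably a consequence of the first: if RepeatMasker stops at makeblastdb, it never writes its annotation table, so the later open() call in dnaPipeTE.py has nothing to read. A small check like the one below can confirm that before digging into the Python traceback. This is only a sketch; check_rm_output is a hypothetical helper name, not something dnaPipeTE provides.

```shell
# Hypothetical helper (not part of dnaPipeTE): succeed only if RepeatMasker
# left a non-empty Trinity.fasta.out in the given output folder.
check_rm_output() {
    out_dir="$1"
    if [ -s "${out_dir}/Trinity.fasta.out" ]; then
        echo "found ${out_dir}/Trinity.fasta.out"
    else
        echo "missing or empty: ${out_dir}/Trinity.fasta.out (RepeatMasker probably stopped early)" >&2
        return 1
    fi
}

# Example call, using the output folder from the command above:
# check_rm_output /mnt/Patrick_28Nov2023a
```

If the check fails while 'Trinity.fasta' is present, the place to look is the RepeatMasker log rather than the FileNotFoundError.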
I am not aware of a Hymenoptera-specific library (which would be relevant for my species), so I have kept it as the default library 'RepeatMaskerLib.h5'. I hope that that is acceptable.
Many thanks if you can help solve this issue!
As a quick note, I've also tried downloading RepeatMasker.lib from an online source and pointing to it using -RM_lib:
python3 dnaPipeTE.py -input /mnt/data/SAMPLE_R1.fastq.gz -output /mnt/Patrick_27Nov2023 -sample_size 1000 -sample_number 2 -RM_t 0.25 -cpu 8 -RM_lib /mnt/data/RepeatMasker.lib
...but this leads to the same error:
RepeatMasker version 4.1.3
Search Engine: NCBI/RMBLAST [ 2.11.0+ ]
Using Custom Repeat Library: /mnt/data/RepeatMasker.lib
Building general libraries in: /opt/RepeatMasker/Libraries//general
RepeatMasker::createLib(): Error invoking /opt/rmblast/bin/makeblastdb on file /opt/RepeatMasker/Libraries//general.working/is.lib.
Traceback (most recent call last):
File "dnaPipeTE.py", line 698, in <module>
RepeatMasker(config['DEFAULT']['RepeatMasker'], args.RepeatMasker_library, args.RM_species, args.cpu, args.output_folder, args.RM_threshold)
File "dnaPipeTE.py", line 381, in __init__
self.repeatmasker_run()
File "dnaPipeTE.py", line 400, in repeatmasker_run
with open(self.output_folder+"/Trinity.fasta.out", 'r') as trinity_handle:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/Patrick_27Nov2023/Trinity.fasta.out'
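Because both runs stop at the same makeblastdb call regardless of the library, it may help to call makeblastdb directly on a tiny FASTA file and read its own error message, which RepeatMasker does not show. A sketch, assuming the binary sits at the path printed in the log; the test file and its sequence are made up for illustration.

```shell
# Write a tiny made-up FASTA file to a scratch directory.
workdir="$(mktemp -d)"
printf '>test_seq\nACGTACGTACGTACGTACGT\n' > "${workdir}/test.fa"

# Path taken from the RepeatMasker log above.
MAKEBLASTDB=/opt/rmblast/bin/makeblastdb

if [ -x "$MAKEBLASTDB" ]; then
    # -in and -dbtype are standard makeblastdb options; errors print to the terminal.
    if "$MAKEBLASTDB" -in "${workdir}/test.fa" -dbtype nucl; then
        echo "makeblastdb ran successfully"
    else
        echo "makeblastdb failed: see the error above"
    fi
else
    echo "makeblastdb not found at $MAKEBLASTDB"
fi
```

If makeblastdb fails even on this small input, the problem lies with the environment rather than with the repeat library.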
Problem solved!
Turns out I was encountering the same issue as this thread:
Dfam-consortium/RepeatMasker#148
The problem was simply insufficient memory allocation.
Running the following line in the container shell, before the dnaPipeTE.py command, solved the issue:
export BLASTDB_LMDB_MAP_SIZE=100000000
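For anyone reproducing this: the variable has to be exported in the same shell that launches dnaPipeTE, so that RepeatMasker and makeblastdb inherit it. BLASTDB_LMDB_MAP_SIZE is read by makeblastdb and appears to cap the size of the memory map it reserves for the database index, which may otherwise be too large for a constrained environment such as a container. A minimal sketch, reusing the command from the first post; the guard around the pipeline call is an addition so the snippet does nothing harmful outside the dnaPipeTE directory.

```shell
# Cap the LMDB map size that makeblastdb requests (value from the fix above).
export BLASTDB_LMDB_MAP_SIZE=100000000

# Confirm the variable is visible to child processes such as RepeatMasker.
sh -c 'echo "BLASTDB_LMDB_MAP_SIZE=$BLASTDB_LMDB_MAP_SIZE"'

# Re-run the pipeline from the same shell (arguments taken from the first post).
if [ -f dnaPipeTE.py ]; then
    python3 dnaPipeTE.py \
        -input /mnt/data/SAMPLE_R1.fastq.gz \
        -output /mnt/Patrick_28Nov2023a \
        -sample_size 1000 \
        -sample_number 2 \
        -RM_t 0.25 \
        -cpu 8
fi
```

The variable could probably also be passed when starting the container, using docker's -e flag (-e BLASTDB_LMDB_MAP_SIZE=100000000), although that variant was not tried in this thread.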
All the best,
Patrick