Perl problems whilst predicting conserved protein domains (within Protocol 0)
Hello, I am getting errors whilst running generate_priority_list_from_RM2.sh in the 'P3 predict domains with Pfam' section, using the test data provided.
The following message is returned, and col4.txt contains only zeros:
Find and extract open reading frames (ORFs)
./pfam.results does not exist. Running Pfam, this can take some time
Can't locate Moose.pm in @INC (you may need to install the Moose module) (@INC contains: /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4 /opt/software/uoa/spack-sw/linux-rhel8-x86_64/gcc-12.1.0/perl-5.34.1-czc5op7kaioectl44zlbuxfqoigrcmw6/lib/perl5 /opt/software/uoa/spack-sw/linux-rhel8-x86_64/gcc-12.1.0/perl-5.34.1-czc5op7kaioectl44zlbuxfqoigrcmw6/lib/site_perl/5.34.1/x86_64-linux-thread-multi /opt/software/uoa/spack-sw/linux-rhel8-x86_64/gcc-12.1.0/perl-5.34.1-czc5op7kaioectl44zlbuxfqoigrcmw6/lib/site_perl/5.34.1 /opt/software/uoa/spack-sw/linux-rhel8-x86_64/gcc-12.1.0/perl-5.34.1-czc5op7kaioectl44zlbuxfqoigrcmw6/lib/5.34.1/x86_64-linux-thread-multi /opt/software/uoa/spack-sw/linux-rhel8-x86_64/gcc-12.1.0/perl-5.34.1-czc5op7kaioectl44zlbuxfqoigrcmw6/lib/5.34.1) at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/Bio/Pfam/HMM/HMMResultsIO.pm line 50.
BEGIN failed--compilation aborted at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/Bio/Pfam/HMM/HMMResultsIO.pm line 50.
Compilation failed in require at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/Bio/Pfam/Scan/PfamScan.pm line 33.
BEGIN failed--compilation aborted at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/Bio/Pfam/Scan/PfamScan.pm line 33.
Compilation failed in require at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/pfam_scan.pl line 8.
BEGIN failed--compilation aborted at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/pfam_scan.pl line 8.
To call the script I am loading:
module load cdhit blast-plus bedtools2 checkm coordinatecleaner devtools seurat muscle
module load r mafft hmmer perl-moose
and activating my conda environment that contains emboss, pfam_scan, gepard, ucsc-fasplit and perl-moose
Moose is definitely installed...
Any suggestions?
Assuming you have loaded the conda environment, check that pfam_scan.pl runs OK with a sample protein. You can use an example TE found here.
Download the file and run:
pfam_scan.pl -fasta TE_protein_example.fa -dir <path_to_pfam_db>
You will need to replace the path to the Pfam database, for which you need a local copy; the repository is here.
That generated the same error as above.
I also tried installing pfam_scan in its own conda environment, but that generated the same error.
Managed to solve it with a workaround.
I installed pfam_scan.py from https://github.com/aziele/pfam_scan
git clone https://github.com/aziele/pfam_scan
cd pfam_scan
./pfam_scan.py --help
and then edited the pfam_scan section of generate_priority_list_from_RM2.sh:
# check if pfam has been run, otherwise run it
FILE=./pfam.results
if [ ! -f "$FILE" ]; then
echo "$FILE does not exist. Running Pfam, this can take some time"
/uoa/home/sharedscratch/apps/pfam_scan/pfam_scan.py -out pfam.results cdhit.orf $pfamdb
fi
echo "\n Finished with Pfam searches\n"
tail -n +2 pfam.results | awk 'BEGIN { FS = "," } ; {if ($6~/^PF/) {print $1}}' | sed 's/_/\//2;s/_/ /2' | awk '{print $1}' | sort > pf.domains.count
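To make the last line easier to follow, here is a small sketch that runs the same pipeline on made-up CSV rows. The sample IDs, column names and values are hypothetical; the only assumptions taken from the command above are that the file has one header line, that column 1 holds the ORF ID, and that column 6 holds the Pfam accession. The function name extract_ids is also hypothetical.

```shell
# Hypothetical CSV rows standing in for pfam.results.
sample='seq_id,c2,c3,c4,c5,hmm_acc
rnd-1_family-10#LTR_Gypsy_5,1,100,1,100,PF00078.30
rnd-1_family-10#LTR_Gypsy_7,1,90,1,90,PF00665.29
rnd-1_family-22#DNA_hAT_2,1,80,1,80,PF05699.17'

# Same pipeline as in the edited script, reading from stdin instead of pfam.results:
#   tail  drops the header line
#   awk   keeps column 1 where column 6 starts with "PF"
#   sed   turns the 2nd "_" into "/", then the next remaining 2nd "_" into a space
#   awk   keeps the part before that space, dropping the ORF number
extract_ids() {
  tail -n +2 | awk 'BEGIN { FS = "," } ; {if ($6~/^PF/) {print $1}}' | sed 's/_/\//2;s/_/ /2' | awk '{print $1}' | sort
}

printf '%s\n' "$sample" | extract_ids
```

With these sample rows, each ORF ID is reduced to its family name, so a family with two Pfam hits appears on two lines of the output.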
Thank you for your help! On to the next section!