annaprotasio/TE_ManAnnot

Perl problems whilst predicting conserved protein domains (within Protocol 0)


Hello, I am getting errors whilst running generate_priority_list_from_RM2.sh in the 'P3 predict domains with Pfam' section, using the test data provided.

The following message is returned, and col4.txt contains only zeros:

Find and extract open reading frames (ORFs)
./pfam.results does not exist. Running Pfam, this can take some time
Can't locate Moose.pm in @INC (you may need to install the Moose module) (@INC contains: /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4 /opt/software/uoa/spack-sw/linux-rhel8-x86_64/gcc-12.1.0/perl-5.34.1-czc5op7kaioectl44zlbuxfqoigrcmw6/lib/perl5 /opt/software/uoa/spack-sw/linux-rhel8-x86_64/gcc-12.1.0/perl-5.34.1-czc5op7kaioectl44zlbuxfqoigrcmw6/lib/site_perl/5.34.1/x86_64-linux-thread-multi /opt/software/uoa/spack-sw/linux-rhel8-x86_64/gcc-12.1.0/perl-5.34.1-czc5op7kaioectl44zlbuxfqoigrcmw6/lib/site_perl/5.34.1 /opt/software/uoa/spack-sw/linux-rhel8-x86_64/gcc-12.1.0/perl-5.34.1-czc5op7kaioectl44zlbuxfqoigrcmw6/lib/5.34.1/x86_64-linux-thread-multi /opt/software/uoa/spack-sw/linux-rhel8-x86_64/gcc-12.1.0/perl-5.34.1-czc5op7kaioectl44zlbuxfqoigrcmw6/lib/5.34.1) at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/Bio/Pfam/HMM/HMMResultsIO.pm line 50.
BEGIN failed--compilation aborted at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/Bio/Pfam/HMM/HMMResultsIO.pm line 50.
Compilation failed in require at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/Bio/Pfam/Scan/PfamScan.pm line 33.
BEGIN failed--compilation aborted at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/Bio/Pfam/Scan/PfamScan.pm line 33.
Compilation failed in require at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/pfam_scan.pl line 8.
BEGIN failed--compilation aborted at /uoa/home/sharedscratch/.conda/envs/te_annot/share/pfam_scan-1.6-4/pfam_scan.pl line 8.

To call the script I am loading:

module load cdhit blast-plus bedtools2 checkm coordinatecleaner devtools seurat muscle
module load r mafft hmmer perl-moose

and activating my conda environment, which contains emboss, pfam_scan, gepard, ucsc-fasplit and perl-moose.
Moose is definitely installed...

Any suggestions?

Assuming you have loaded the conda environment, check that pfam_scan.pl runs ok with a sample protein. You can use an example TE found here.

Download the file and run:

pfam_scan.pl -fasta TE_protein_example.fa -dir <path_to_pfam_db>

You will need to replace <path_to_pfam_db> with the path to a local copy of the Pfam database; the repository is here

That generated the same error as above.
I also tried installing pfam_scan in its own conda environment, but that generated the same error.
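
For anyone hitting the same message: the @INC list in the error shows only the pfam_scan share directory and the spack-installed perl-5.34.1 paths. One possible explanation (an assumption, not confirmed in this thread) is that the perl found first on PATH is the module-loaded one rather than the conda one, so it never sees the conda copy of Moose. A minimal sketch for checking this follows; the function name check_perl_module is hypothetical.

```shell
# Report which perl is first on PATH and whether that perl can load a module.
# check_perl_module is a hypothetical helper name, not part of TE_ManAnnot.
check_perl_module() {
    module_name="$1"
    perl_bin="$(command -v perl)" || { echo "no perl on PATH"; return 2; }
    echo "perl in use: $perl_bin"
    # -M<module> asks perl to load the module; -e 1 runs a no-op program.
    if perl -M"$module_name" -e 1 2>/dev/null; then
        echo "$module_name: found"
    else
        echo "$module_name: NOT found by this perl"
        return 1
    fi
}
```

Running `check_perl_module Moose` after the module load commands and after activating the conda environment shows which perl wins and whether it can see Moose. If the reported path is outside the conda environment, loading fewer modules or activating conda last may be worth trying.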

Managed to solve it with a workaround.

I installed pfam_scan.py from https://github.com/aziele/pfam_scan

git clone https://github.com/aziele/pfam_scan
cd pfam_scan
./pfam_scan.py --help

and then edited generate_priority_list_from_RM2.sh at the pfam_scan section:

# check if pfam has been run, otherwise run it
FILE=./pfam.results
if [ ! -f "$FILE" ]; then
    echo "$FILE does not exist. Running Pfam, this can take some time"
    /uoa/home/sharedscratch/apps/pfam_scan/pfam_scan.py -out pfam.results cdhit.orf $pfamdb
fi

# printf handles \n the same way in every shell, unlike echo
printf '\n Finished with Pfam searches\n\n'

tail -n +2 pfam.results | awk 'BEGIN { FS = "," } ; {if ($6~/^PF/) {print $1}}' |  sed 's/_/\//2;s/_/ /2' | awk '{print $1}' | sort > pf.domains.count
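
To illustrate what the filter above does with comma-separated output, here is a small self-contained sketch. The sample rows, the column names and the ORF ID layout (a family name with `/` replaced by `_`, plus a trailing `_N` ORF number) are all made up for the example and are assumptions about the input, not actual pfam_scan.py output.

```shell
# Hypothetical CSV: one header line, then rows whose 6th field is a Pfam accession.
sample_csv() {
    cat <<'EOF'
seq_id,ali_start,ali_end,env_start,env_end,hmm_acc,hmm_name
rnd-1_family-2#LTR_Gypsy_1,5,120,3,125,PF00078.30,RVT_1
rnd-1_family-7#DNA_hAT_2,10,90,8,95,PF05699.17,Dimer_Tnp_hAT
rnd-1_family-2#LTR_Gypsy_3,40,200,38,205,PF00665.29,rve
rnd-1_family-9#LINE_L1_1,1,50,1,50,NA,none
EOF
}

# Same pipeline as in the edited script, but reading the CSV from stdin.
pfam_hit_ids() {
    tail -n +2 |                                            # drop the header line
    awk 'BEGIN { FS = "," } ; {if ($6~/^PF/) {print $1}}' | # keep IDs of rows with a PF accession
    sed 's/_/\//2;s/_/ /2' |                                # 2nd "_" -> "/", then next 2nd "_" -> space
    awk '{print $1}' |                                      # drop the ORF number after the space
    sort
}

sample_csv | pfam_hit_ids
# prints:
# rnd-1_family-2#LTR/Gypsy
# rnd-1_family-2#LTR/Gypsy
# rnd-1_family-7#DNA/hAT
```

Each output line is one ORF with a Pfam hit, labelled by its family, so counting repeated lines (for example with `uniq -c`) gives the number of domain-bearing ORFs per family. The row with `NA` in the sixth field is dropped by the `^PF` test.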

Thank you for your help! On to the next section!