jtamames/SqueezeMeta

Stopping in STEP5

Closed this issue · 5 comments

Hi jtamames

Could you help with this issue? I don't understand what is happening. The database is installed correctly, and I am running the pipeline (SqueezeMeta v1.6.3, September 2023) on a cluster with 128 cores and 256 GB RAM.

[52 seconds]: STEP5 -> HMMER/PFAM: 05.run_hmmer.pl
Running HMMER3 (Eddy 2009, Genome Inform 23, 205-11) for Pfam
Error running command: /home/guillermo.reyes/miniconda3/envs/SqueezeMeta/SqueezeMeta/bin/hmmer/hmmsearch --domtblout /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/intermediate/05.Mar_ingreso_M1.pfam.hmm -E 1e-10 --cpu 128 /home/guillermo.reyes/SqueezeDataBase/db/Pfam-A.hmm /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa > /dev/null 2>&1 at /home/guillermo.reyes/miniconda3/envs/SqueezeMeta/SqueezeMeta/scripts/05.run_hmmer.pl line 31.
Stopping in STEP5 -> 05.run_hmmer.pl. Program finished abnormally


I checked the installed database with test_install.pl

Scalar value @args[-1] better written as $args[-1] at /home/guillermo.reyes/miniconda3/envs/SqueezeMeta/bin/test_install.pl line 208.

Checking the OS
linux OK

Checking that tree is installed
tree --help OK

Checking that ruby is installed
ruby -h OK

Checking that java is installed
java -h OK

Checking that all the required perl libraries are available in this environment
perl -e 'use Term::ANSIColor' OK
perl -e 'use DBI' OK
perl -e 'use DBD::SQLite::Constants' OK
perl -e 'use Time::Seconds' OK
perl -e 'use Tie::IxHash' OK
perl -e 'use Linux::MemInfo' OK
perl -e 'use Getopt::Long' OK
perl -e 'use File::Basename' OK
perl -e 'use DBD::SQLite' OK
perl -e 'use Data::Dumper' OK
perl -e 'use Cwd' OK
perl -e 'use XML::LibXML' OK
perl -e 'use XML::Parser' OK
perl -e 'use Term::ANSIColor' OK

Checking that all the required python libraries are available in this environment
python3 -h OK
python3 -c 'import numpy' OK
python3 -c 'import scipy' OK
python3 -c 'import matplotlib' OK
python3 -c 'import dendropy' OK
python3 -c 'import pysam' OK
python3 -c 'import Bio.Seq' OK
python3 -c 'import pandas' OK
python3 -c 'import sklearn' OK
python3 -c 'import nose' OK
python3 -c 'import cython' OK
python3 -c 'import future' OK

Checking that all the required R libraries are available in this environment
R -h OK
R -e 'library(doMC)' OK
R -e 'library(ggplot2)' OK
R -e 'library(data.table)' OK
R -e 'library(reshape2)' OK
R -e 'library(pathview)' OK
R -e 'library(DASTool)' OK
R -e 'library(SQMtools)' OK

Checking binaries
spades.py OK
metabat2 OK
jgi_summarize_bam_contig_depths OK
samtools OK
bwa OK
minimap2 OK
diamond OK
hmmsearch OK
cd-hit-est OK
kmer-db OK
aragorn OK
mothur OK

Checking that SqueezeMeta is properly configured... checking database in /home/guillermo.reyes/SqueezeDataBase/db
nr.db OK
CheckM manifest OK
LCA_tax DB OK

All checks successful

Your installation seems to be fine and we don't usually have problems with step 5.

You mentioned that you are running this on a cluster. Is there any chance your process ran out of time and was killed by the workload manager? The output you pasted says that step 05 started at 52 seconds, though, so maybe this was already a restart?

The hmmsearch binary seems to load correctly according to test_install.pl, but I would still like to test it with your data. What is the output of running the following command?

/home/guillermo.reyes/miniconda3/envs/SqueezeMeta/SqueezeMeta/bin/hmmer/hmmsearch --domtblout /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/intermediate/05.Mar_ingreso_M1.pfam.hmm -E 1e-10 --cpu 128 /home/guillermo.reyes/SqueezeDataBase/db/Pfam-A.hmm /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa
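
Note that the pipeline runs this command with `> /dev/null 2>&1`, so whatever hmmsearch prints is discarded. Running it by hand without the redirect is enough to see the message. If you would rather capture it from a script, a minimal Python sketch could look like this. The helper name `run_and_report` is hypothetical and is not part of SqueezeMeta.

```python
import subprocess


def run_and_report(cmd, n_lines=5):
    """Run cmd (a list of arguments) and return (exit_code, last_output_lines).

    Hypothetical helper for illustration only. stderr is merged into stdout
    so that an error message is kept regardless of which stream it goes to.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured stdout
        text=True,
    )
    # Keep only the last few lines, where a fatal error would normally appear
    tail = proc.stdout.strip().splitlines()[-n_lines:]
    return proc.returncode, tail
```

For example, `code, tail = run_and_report(["/path/to/hmmsearch", "--domtblout", "out.tbl", "Pfam-A.hmm", "proteins.faa"])` (placeholder paths) would return a non-zero `code` on failure, with the error text in `tail`.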

Hi fpusan

Here is the result

(SqueezeMeta) guillermo.reyes@dgx-node-0-0:~/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData$ /home/guillermo.reyes/miniconda3/envs/SqueezeMeta/SqueezeMeta/bin/hmmer/hmmsearch --domtblout /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/intermediate/05.Mar_ingreso_M1.pfam.hmm -E 1e-10 --cpu 128 /home/guillermo.reyes/SqueezeDataBase/db/Pfam-A.hmm /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa

hmmsearch :: search profile(s) against a sequence database

HMMER 3.1b2 (February 2015); http://hmmer.org/

Copyright (C) 2015 Howard Hughes Medical Institute.

Freely distributed under the GNU General Public License (GPLv3).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

query HMM file: /home/guillermo.reyes/SqueezeDataBase/db/Pfam-A.hmm

target sequence database: /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa

per-dom hits tabular output: /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/intermediate/05.Mar_ingreso_M1.pfam.hmm

sequence reporting threshold: E-value <= 1e-10

number of worker threads: 128

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query: 1-cysPrx_C [M=40]
Accession: PF10417.12
Description: C-terminal domain of 1-Cys peroxiredoxin
Parse failed (sequence file /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa):
Premature EOF in parsing FASTA name/description line

It would seem that the amino acid file is truncated. I don't think I've seen this happen before. Can you share the file /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa with us?
Also, can you share the /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/syslog file here?

Hello
It looks like there is something wrong with the predicted proteins file 03.Mar_ingreso_M1.faa. Could you tell me the result of:
tail -n 20 /home/guillermo.reyes/CEDIA_Microbiomas/Shotgun_Metagenomics_1er_muestreo/Analisis/01.RawData/Mar_ingreso_M1/results/03.Mar_ingreso_M1.faa
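
The "Premature EOF in parsing FASTA name/description line" message suggests that the file ends partway through a record, so besides looking at the last lines it may help to scan the whole file. Below is a minimal Python sketch that counts records and flags headers with no sequence after them. The function name `check_fasta` is hypothetical, and this is only a rough diagnostic, not how SqueezeMeta or HMMER validates its input.

```python
def check_fasta(path):
    """Return (n_records, empty_ids, ends_with_newline) for a FASTA file.

    Hypothetical diagnostic helper: empty_ids lists headers that are not
    followed by any sequence line, which is what a truncated file often
    shows for its last record.
    """
    n_records = 0
    empty_ids = []
    current_id = None   # identifier of the record being read
    current_len = 0     # residues seen so far for that record
    last_line = ""
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            last_line = line
            stripped = line.strip()
            if stripped.startswith(">"):
                # A new header closes the previous record
                if current_id is not None and current_len == 0:
                    empty_ids.append(current_id)
                n_records += 1
                current_id = stripped[1:].split()[0] if len(stripped) > 1 else ""
                current_len = 0
            elif stripped:
                current_len += len(stripped)
    # Check the final record as well
    if current_id is not None and current_len == 0:
        empty_ids.append(current_id)
    return n_records, empty_ids, last_line.endswith("\n")
```

Calling `check_fasta` on 03.Mar_ingreso_M1.faa and finding a non-empty `empty_ids` list, or a file that does not end with a newline, would be consistent with the file having been cut off while it was being written.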

Closing due to lack of activity. Feel free to reopen.