leylabmpi/Struo2

Humann3 specifies wrong index for Bowtie2

BostjanMurovec opened this issue · 12 comments

Good Day!

When I run Humann3 according to the instructions for the demo Struo2 database (http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release95/humann3/ReadMe.md), suitably adapted to my actual disk contents, Humann3 fails with an error reporting that Bowtie2 crashed during the Humann3 run.

Inspection of Humann3's stderr reveals that Humann3 passes the wrong index file name to Bowtie2:

=====================================================================
Error message returned from bowtie2 :
Could not open index file /home/bostjan/struo2_databases/GTDB_release95/humann3/uniref90/all_genes_annot.rev.rev.rev.1.bt2l
Could not open index file /home/bostjan/struo2_databases/GTDB_release95/humann3/uniref90/all_genes_annot.rev.rev.rev.2.bt2l
Segmentation fault (core dumped)
(ERR): bowtie2-align exited with value 139

The trouble stems from the wrong parameter being passed to Bowtie2:
-x /home/bostjan/struo2_databases/GTDB_release95/humann3/uniref90/all_genes_annot.rev
instead of
-x /home/bostjan/struo2_databases/GTDB_release95/humann3/uniref90/all_genes_annot

I tried renaming the Bowtie2 index files to include an additional .rev in their names, but then Humann3
appends yet another ".rev" to the name. Hence, the issue appears to be that Humann3
wrongly parses the contents of the directory holding the Bowtie2 index.
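
To illustrate the suspected parsing problem: Bowtie2 addresses an index by its basename, so whatever scans the directory has to strip the whole ".rev.N.bt2l" suffix, not only ".N.bt2l". The following is only a sketch of that idea in shell, not Humann3's actual code; the function name bt2_index_basenames is made up for this example.

```shell
# Hypothetical helper (not part of Humann3 or Bowtie2): print the distinct
# Bowtie2 index basenames found in directory $1.
bt2_index_basenames() (
    for f in "$1"/*.bt2 "$1"/*.bt2l; do
        [ -e "$f" ] || continue     # skip glob patterns that matched nothing
        name=${f##*/}               # drop the directory part
        name=${name%.bt2l}          # drop the large-index extension ...
        name=${name%.bt2}           # ... or the small-index extension
        name=${name%.[0-9]}         # drop the index number (.1 to .4)
        name=${name%.rev}           # drop the reverse-index marker, if present
        printf '%s\n' "$name"
    done | sort -u
)
```

For a directory holding all_genes_annot.1.bt2l through all_genes_annot.4.bt2l plus all_genes_annot.rev.1.bt2l and all_genes_annot.rev.2.bt2l, this prints the single name all_genes_annot, which (prefixed with the directory) is what Bowtie2 expects after -x.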

Since this issue is not mentioned anywhere, I am curious whether somebody else also experienced it.

Thank you in advance and best regards,
Bostjan Murovec

Thanks, Bostjan, for your interest in the Struo2 database!

I cannot reproduce the issue, but I did realize that there were some typos in the ReadMe.md file.

My run (you'd have to change the paths to reproduce):

NUC_DB=/ebio/abt3_projects/databases_no-backup/GTDB/release95/Struo2/humann3/uniref50/genome_reps_filt_annot.fna.gz
PROT_DB=/ebio/abt3_projects/databases_no-backup/GTDB/release95/Struo2/humann3/uniref50/protein_database/uniref50_201901.dmnd
NUC_DB_DIR=$(dirname "$NUC_DB")
PROT_DB_DIR=$(dirname "$PROT_DB")

humann3 --bypass-nucleotide-index \
  --search-mode uniref50 \
  --nucleotide-database "$NUC_DB_DIR" \
  --protein-database "$PROT_DB_DIR" \
  --input-format fastq \
  --output-basename humann3_output \
  --input reads.fq \
  --output humann3_output_dir \
  --o-log humann3_output

Maybe the *.bt2l files were corrupted when you downloaded them (e.g., an incomplete download)?
I will add md5 sums for all data files on the FTP server (I should have done that already).
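
Once checksums are available, a download could be checked roughly like this. It is a minimal sketch assuming GNU md5sum; the helper name verify_md5 and the checksum file name md5sums.txt are hypothetical, since no such file is named in this thread.

```shell
# Hypothetical helper: verify the files in directory $1 against the checksum
# file $2 (GNU md5sum format, path relative to $1). Returns non-zero if any
# listed file is missing or its checksum differs.
verify_md5() (
    cd "$1" || exit 1
    md5sum --check --quiet "$2"
)

# Example (the path and the name md5sums.txt are assumptions):
#   verify_md5 ~/struo2_databases/GTDB_release95/humann3/uniref90 md5sums.txt
```

With --quiet, md5sum prints nothing for files that pass, so any output points at a file that needs to be downloaded again.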

Dear nick-youngblut,
thank you for your prompt response.
Indeed, checking the files would be prudent, although from the logs and stderr I conclude that Bowtie2 is instructed to access the wrong file name. At this point I suspect a bug in Humann3.

I have also spotted and corrected the mentioned typos, so this is not the source of the trouble.

Anyway, thank you for the valuable information that the issue is not reproducible. I will do some further investigations on my system.

Best regards,
Bostjan Murovec

I am using a conda env with the following:

humann   3.0.0.alpha.3    py37h83b1523_0    biobakery
bowtie2                   2.4.2            py37h8270d21_1    bioconda

...so it could be an issue with the versions that you are using.

That is exactly my version too. The mystery persists.

Dear nick-youngblut,
would you be so kind as to list the exact versions of all packages in your Humann3 Conda environment?
I still cannot resolve the issue.
Thank you in advance and best regards,
Bostjan Murovec

Here's my full humann3 conda env:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
bcbio-gff                 0.6.6              pyh864c0ab_1    bioconda
biom-format               2.1.10           py37ha21ca33_0    conda-forge
biopython                 1.78             py37h8f50634_1    conda-forge
blast                     2.10.1          pl526he19e7b1_3    bioconda
boost-cpp                 1.70.0               h7b93d67_3    conda-forge
bowtie2                   2.4.2            py37h8270d21_1    bioconda
brotlipy                  0.7.0           py37hb5d75c8_1001    conda-forge
bx-python                 0.8.9            py37h73d7ac5_2    bioconda
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.17.1               h36c2ea0_0    conda-forge
ca-certificates           2021.1.19            h06a4308_0
cached-property           1.5.2                      py_0
capnproto                 0.6.1                hfc679d8_1    conda-forge
certifi                   2020.12.5        py37h89c1867_1    conda-forge
cffi                      1.14.5           py37hc58025e_0    conda-forge
chardet                   4.0.0            py37h89c1867_1    conda-forge
click                     7.1.2              pyh9f0ad1d_0    conda-forge
cmseq                     1.0.2              pyh7b7c402_0    bioconda
cryptography              3.4.4            py37hf1a17b8_0    conda-forge
curl                      7.71.1               he644dc0_8    conda-forge
cycler                    0.10.0                     py_2    conda-forge
dendropy                  4.5.2              pyh3252c3a_0    bioconda
diamond                   2.0.7                h56fc30b_0    bioconda
entrez-direct             13.9            pl526h375a9b1_1    bioconda
expat                     2.2.10               h9c3ff4c_0    conda-forge
fasttree                  2.1.10               h516909a_4    bioconda
freetype                  2.10.4               h0708190_1    conda-forge
future                    0.18.2           py37h89c1867_3    conda-forge
glpk                      4.65              h9202a9a_1003    conda-forge
gmp                       6.2.1                h58526e2_0    conda-forge
gsl                       2.6                  he838d99_2    conda-forge
h5py                      3.1.0           nompi_py37h1e651dc_100    conda-forge
hdf5                      1.10.6          nompi_h6a2412b_1114    conda-forge
htslib                    1.11                 hd3b49d5_1    bioconda
humann                    3.0.0.alpha.3    py37h83b1523_0    biobakery
icu                       67.1                 he1b5a44_0    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
iqtree                    2.0.3                h176a8bc_1    bioconda
jpeg                      9d                   h516909a_0    conda-forge
kiwisolver                1.3.1            py37h2527ec5_1    conda-forge
krb5                      1.17.2               h926e7f8_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.35.1               hea4e1c9_2    conda-forge
libblas                   3.9.0                8_openblas    conda-forge
libcblas                  3.9.0                8_openblas    conda-forge
libcurl                   7.71.1               hcdd3856_8    conda-forge
libdeflate                1.6                  h516909a_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 9.3.0               h2828fa1_18    conda-forge
libgfortran-ng            9.3.0               hff62375_18    conda-forge
libgfortran5              9.3.0               hff62375_18    conda-forge
libgomp                   9.3.0               h2828fa1_18    conda-forge
liblapack                 3.9.0                8_openblas    conda-forge
libnghttp2                1.43.0               h812cca2_0    conda-forge
libopenblas               0.3.12          pthreads_h4812303_1    conda-forge
libpng                    1.6.37               hed695b0_2    conda-forge
libssh2                   1.9.0                hab1572f_5    conda-forge
libstdcxx-ng              9.3.0               h6de172a_18    conda-forge
libtiff                   4.2.0                hdc55705_0    conda-forge
libwebp-base              1.2.0                h7f98852_0    conda-forge
lz4-c                     1.9.3                h9c3ff4c_0    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
mafft                     7.475                h516909a_0    bioconda
mash                      2.2.2                ha61e061_2    bioconda
matplotlib-base           3.3.4            py37h0c9df89_0    conda-forge
metaphlan                 3.0.7              pyh7b7c402_0    bioconda
muscle                    3.8.1551             hc9558a2_5    bioconda
ncurses                   6.2                  h58526e2_4    conda-forge
numpy                     1.20.1           py37haa41c4c_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openssl                   1.1.1j               h7f98852_0    conda-forge
pandas                    1.2.2            py37hdc94413_0    conda-forge
patsy                     0.5.1                      py_0    conda-forge
pcre                      8.44                 he1b5a44_0    conda-forge
perl                      5.26.2            h36c2ea0_1008    conda-forge
perl-app-cpanminus        1.7044                  pl526_1    bioconda
perl-archive-tar          2.32                    pl526_0    bioconda
perl-base                 2.23                    pl526_1    bioconda
perl-business-isbn        3.004                   pl526_0    bioconda
perl-business-isbn-data   20140910.003            pl526_0    bioconda
perl-carp                 1.38                    pl526_3    bioconda
perl-common-sense         3.74                    pl526_2    bioconda
perl-compress-raw-bzip2   2.087           pl526he1b5a44_0    bioconda
perl-compress-raw-zlib    2.087           pl526hc9558a2_0    bioconda
perl-constant             1.33                    pl526_1    bioconda
perl-data-dumper          2.173                   pl526_0    bioconda
perl-digest-hmac          1.03                    pl526_3    bioconda
perl-digest-md5           2.55                    pl526_0    bioconda
perl-encode               2.88                    pl526_1    bioconda
perl-encode-locale        1.05                    pl526_6    bioconda
perl-exporter             5.72                    pl526_1    bioconda
perl-exporter-tiny        1.002001                pl526_0    bioconda
perl-extutils-makemaker   7.36                    pl526_1    bioconda
perl-file-listing         6.04                    pl526_1    bioconda
perl-file-path            2.16                    pl526_0    bioconda
perl-file-temp            0.2304                  pl526_2    bioconda
perl-html-parser          3.72            pl526h6bb024c_5    bioconda
perl-html-tagset          3.20                    pl526_3    bioconda
perl-html-tree            5.07                    pl526_1    bioconda
perl-http-cookies         6.04                    pl526_0    bioconda
perl-http-daemon          6.01                    pl526_1    bioconda
perl-http-date            6.02                    pl526_3    bioconda
perl-http-message         6.18                    pl526_0    bioconda
perl-http-negotiate       6.01                    pl526_3    bioconda
perl-io-compress          2.087           pl526he1b5a44_0    bioconda
perl-io-html              1.001                   pl526_2    bioconda
perl-io-socket-ssl        2.066                   pl526_0    bioconda
perl-io-zlib              1.10                    pl526_2    bioconda
perl-json                 4.02                    pl526_0    bioconda
perl-json-xs              2.34            pl526h6bb024c_3    bioconda
perl-libwww-perl          6.39                    pl526_0    bioconda
perl-list-moreutils       0.428                   pl526_1    bioconda
perl-list-moreutils-xs    0.428                   pl526_0    bioconda
perl-lwp-mediatypes       6.04                    pl526_0    bioconda
perl-lwp-protocol-https   6.07                    pl526_4    bioconda
perl-mime-base64          3.15                    pl526_1    bioconda
perl-mozilla-ca           20180117                pl526_1    bioconda
perl-net-http             6.19                    pl526_0    bioconda
perl-net-ssleay           1.88            pl526h90d6eec_0    bioconda
perl-ntlm                 1.09                    pl526_4    bioconda
perl-parent               0.236                   pl526_1    bioconda
perl-pathtools            3.75            pl526h14c3975_1    bioconda
perl-scalar-list-utils    1.52            pl526h516909a_0    bioconda
perl-socket               2.027                   pl526_1    bioconda
perl-storable             3.15            pl526h14c3975_0    bioconda
perl-test-requiresinternet 0.05                    pl526_0    bioconda
perl-time-local           1.28                    pl526_1    bioconda
perl-try-tiny             0.30                    pl526_1    bioconda
perl-types-serialiser     1.0                     pl526_2    bioconda
perl-uri                  1.76                    pl526_0    bioconda
perl-www-robotrules       6.02                    pl526_3    bioconda
perl-xml-namespacesupport 1.12                    pl526_0    bioconda
perl-xml-parser           2.44_01         pl526ha1d75be_1002    conda-forge
perl-xml-sax              1.02                    pl526_0    bioconda
perl-xml-sax-base         1.09                    pl526_0    bioconda
perl-xml-sax-expat        0.51                    pl526_3    bioconda
perl-xml-simple           2.25                    pl526_1    bioconda
perl-xsloader             0.24                    pl526_0    bioconda
phylophlan                3.0.2                      py_0    bioconda
pigz                      2.5                  h27826a3_0    conda-forge
pillow                    8.1.0            py37h4600e1f_2    conda-forge
pip                       21.0.1             pyhd8ed1ab_0    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pyopenssl                 20.0.1             pyhd8ed1ab_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pysam                     0.16.0.1         py37hc334e0b_1    bioconda
pysocks                   1.7.1            py37h89c1867_3    conda-forge
python                    3.7.9           hffdb5ce_100_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python-lzo                1.12            py37he0a3664_1003    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytz                      2021.1             pyhd8ed1ab_0    conda-forge
raxml                     8.2.12               h516909a_2    bioconda
readline                  8.1                  h27cfd23_0
requests                  2.25.1             pyhd3deb0d_0    conda-forge
samtools                  1.11                 h6270b1f_0    bioconda
scipy                     1.6.0            py37h14a347d_0    conda-forge
seaborn                   0.11.1               ha770c72_0    conda-forge
seaborn-base              0.11.1             pyhd8ed1ab_1    conda-forge
seqkit                    0.15.0                        0    bioconda
setuptools                52.0.0           py37h06a4308_0
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sqlite                    3.34.0               h74cdb3f_0    conda-forge
statsmodels               0.12.2           py37h902c9e0_0    conda-forge
tbb                       2020.3               hfd86e86_0
tk                        8.6.10               hed695b0_1    conda-forge
tornado                   6.1              py37h5e8e339_1    conda-forge
trimal                    1.4.1                hc9558a2_4    bioconda
urllib3                   1.26.3             pyhd8ed1ab_0    conda-forge
wheel                     0.36.2             pyhd3deb0d_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge
zstd                      1.4.8                ha95c52a_1    conda-forge

It was created with the yaml:

channels:
- conda-forge
- bioconda
- biobakery
dependencies:
- pigz
- bioconda::seqkit
- bioconda::diamond>=2.0.3
- bioconda::metaphlan>=3.0.1
- biobakery::humann
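
To compare another environment against the listing above package by package, one option is to save both `conda list` outputs to files and compare them. A rough sketch, assuming the usual name/version/build/channel column layout; the function name conda_list_diff and the file names are made up for this example.

```shell
# Hypothetical helper: compare two saved `conda list` outputs ($1 and $2) and
# print every package whose version or build differs, or that is present in
# only one of the two files.
conda_list_diff() {
    awk '
        /^#/ || NF < 2 { next }                    # skip header comments and blank lines
        FNR == NR { ref[$1] = $2 " " $3; next }    # first file: remember version and build
        {
            seen[$1] = 1
            cur = $2 " " $3
            if (!($1 in ref))         print $1 ": only in second (" cur ")"
            else if (ref[$1] != cur)  print $1 ": " ref[$1] " -> " cur
        }
        END {
            for (p in ref) if (!(p in seen)) print p ": only in first (" ref[p] ")"
        }
    ' "$1" "$2" | sort
}

# Example (file names are assumptions):
#   conda list > my_env.txt
#   conda_list_diff reference_env.txt my_env.txt
```

No output means the two environments match in every package version and build.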

THANK YOU!!!
You are extremely prompt and helpful.
Best regards,
Bostjan Murovec

Hello, could you tell me whether the error was solved? I am having the same issue, the ".rev.rev" thing. However, I get a slightly different error message:

Error message returned from bowtie2 :
Could not open index file /lustre/marialaura/databases/humann/struo_gtdb/nucleotide/all_genes_annot.rev.rev.1.bt2l
Could not open index file /lustre/marialaura/databases/humann/struo_gtdb/nucleotide/all_genes_annot.rev.rev.2.bt2l
(ERR): bowtie2-align died with signal 11 (SEGV)

I have searched for the meaning of the error "signal 11 (SEGV)". Here is what I found and tried, although I do not understand it well; I am a beginner and still have a lot to learn.

According to a report I found on the Bowtie2 site, "signal 11 (SEGV)" seems to be related to the number of threads (single-threaded vs. multi-threaded runs). This was supposed to be fixed in version 2.4, though a few people still get the error (including me, on 2.4.2). I tried running with a single thread, but it still failed with the same error.

If you have found a solution for the issue, please tell me. Thank you.

Does the all_genes_annot.rev.rev.1.bt2l indeed exist at /lustre/marialaura/databases/humann/struo_gtdb/nucleotide/?

If it does exist, what is the file size? It should be 12G
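
A quick way to answer both questions at once is to list every expected index file with its size. This is only a sketch: the helper name check_bt2l_index is made up, and it assumes the six-file layout of a Bowtie2 large index (.1 to .4 plus .rev.1 and .rev.2, all ending in .bt2l). Note that a valid index has a single ".rev" level only, so a ".rev.rev" file is never expected on disk.

```shell
# Hypothetical helper: report the size in bytes (or MISSING) for each of the
# six files of a large Bowtie2 index with basename $2 in directory $1.
# Returns non-zero if at least one file is missing.
check_bt2l_index() (
    status=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        f="$1/$2.$suffix.bt2l"
        if [ -f "$f" ]; then
            printf '%s\t%s bytes\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
        else
            printf '%s\tMISSING\n' "$f"
            status=1
        fi
    done
    exit "$status"
)

# Example (the path is taken from the error message above):
#   check_bt2l_index /lustre/marialaura/databases/humann/struo_gtdb/nucleotide all_genes_annot
```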

That is what I have:

[marialaura]$ tree -h
|-- [ 12K]  nucleotide
|   |-- [ 12G]  all_genes_annot.1.bt2l
|   |-- [ 13G]  all_genes_annot.2.bt2l
|   |-- [500M]  all_genes_annot.3.bt2l
|   |-- [6.6G]  all_genes_annot.4.bt2l
|   |-- [ 12G]  all_genes_annot.rev.1.bt2l
|   |-- [ 13G]  all_genes_annot.rev.2.bt2l
|   `-- [7.9G]  genome_reps_filt_annot.fna.gz
`-- [ 12K]  protein
    `-- [ 11G]  uniref90_201901.dmnd
2 directories, 8 files

So I do not have all_genes_annot.rev.rev.1.bt2l ("double rev"), I just have all_genes_annot.rev.1.bt2l ("single rev"). But as @BostjanMurovec already said, even if I tried to rename it to "double rev", it would return an error Could not open index file all_genes_annot.rev.rev.rev.1.bt2l ("triple rev").

If I delete all_genes_annot.rev.1.bt2l, the program asks for all_genes_annot.rev.1.bt2l (single rev), which then no longer exists, of course.

In my last attempt, I used the command:

humann \
--bypass-nucleotide-index \
--search-mode uniref90 \
--remove-temp-output \
--nucleotide-database /marialaura/databases/humann/struo_gtdb/nucleotide/ \
--protein-database /marialaura/databases/humann/struo_gtdb/protein/ \
--taxonomic-profile /marialaura/kraken_results/mpa_syle/S57.mpa.txt \
--threads 1 \
--input /marialaura/samples_joined/S57_R1_R2.fastq.gz \
--output /marialaura/humann_results/humann_23-06_test

I used Bowtie2 2.4.2 and Humann v3.0.0.alpha.4.

@marialgk this is likely a humann3 bug. Can you try updating to a non-alpha release of humann3?

Closing due to lack of activity. Feel free to reopen, if needed.