GaetanBenoitDev/metaMDBG

metaMDBG asm fails at bgzip step

Closed this issue · 17 comments

I am attempting to run the following command using PacBio data: metaMDBG asm assembly mysample.fastq -t 4

I created the conda env according to the yml file. My conda env has samtools version 1.6 (using htslib 1.6).

logs.txt shows the following:

Checking dependencies: bgzip (wfmash package)

cat assembly/tmp//testDependencies.fasta | bgzip --threads 4 -c > assembly/tmp//testDependencies_bgzip.fasta.gz
Checking dependencies: samtools

samtools faidx assembly/tmp//testDependencies_bgzip.fasta.gz


It appears that the samtools faidx command fails because the bgzip step is creating an empty file. But when I run the exact command on the command line it works just fine?

I am running CentOS 7, my conda env yml is attached.
Any suggestions would be appreciated.
Thank you

local_conda.txt

Can you try installing metamdbg using the conda package (instead of the yml file) in a new environment:
conda install -c conda-forge -c bioconda metamdbg

I hope that solves the problem.

I have the same problem when installing metamdbg from the conda package. I think the root of the problem is that the temporary file created by metamdbg, assemblydir/testDependencies.fasta, has the wrong format. It has an extra blank line after the read sequence, which causes samtools to fail on the fasta.gz file. If I remove the extra blank line, compress the fasta file, and then run samtools faidx, it works. However, when the metamdbg pipeline starts, the malformed file is created again, so the pipeline cannot finish.
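The manual workaround described above can be sketched roughly as follows. This is only an illustration under assumptions: the helper name `strip_blank_lines`, the demo file names, and the demo read are hypothetical, and the bgzip/samtools steps are skipped when those tools are not on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop empty or whitespace-only lines from a FASTA stream.
strip_blank_lines() {
    awk 'NF > 0'
}

# Small demonstration file with a trailing blank line (made-up content).
demo_dir=$(mktemp -d)
printf '>read1\nACGTACGT\n\n' > "$demo_dir/test.fasta"
strip_blank_lines < "$demo_dir/test.fasta" > "$demo_dir/test.clean.fasta"

# Only attempt compression and indexing when both tools are available.
if command -v bgzip >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1; then
    bgzip -c "$demo_dir/test.clean.fasta" > "$demo_dir/test.clean.fasta.gz"
    samtools faidx "$demo_dir/test.clean.fasta.gz"
fi
```

This only cleans a copy by hand; as noted, metamdbg regenerates its own test file on every run, so it does not replace a fix in the tool itself.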

I have fixed the format issue. You should be able to recompile using the commands in the section "Building from source (using conda)". Thanks a lot for finding the issue.

Great, I will give it a try. Thank you.

I am still having the same issue.
I started by deleting the old repo and downloaded it again.

Steps I used:

  1. Created a new conda env based on conda_env.yml: conda env create -f conda_env.yml

  2. cd metaMDBG -> mkdir build -> cmake .. -> make -j 3

This gave a zlib.h error, which I fixed with: conda env config vars set CPATH=${CONDA_PREFIX}/include:${CPATH}

I then deactivated the env and reactivated it.

  3. Ran make -j3, which completed.

  4. Then I ran home/me/bin/metaMDBG/build/bin/metaMDBG asm assembly my.fastq

I get the same error in the log file:
Checking dependencies: bgzip (wfmash package)

cat assembly/tmp//testDependencies.fasta | bgzip --threads 1 -c > assembly/tmp//testDependencies_bgzip.fasta.gz
Checking dependencies: samtools

samtools faidx assembly/tmp//testDependencies_bgzip.fasta.gz

I appreciate the help. Did I miss a step? Thank you

Can you activate the metamdbg conda env, then show me the samtools version with "samtools --version".
Thank you

Oops, I see from your first post that you have the same samtools version as me.
I have no idea where the issue comes from; maybe this samtools version is buggy on some systems. You could try removing the samtools version pin in conda_env.yml (- samtools=1.6 -> - samtools), then reinstalling. This will use the latest samtools version. If it works, I will update the packages.

I implemented your suggestion (thank you) but I still get the same error. I have attached the cmake log, my current conda env yml and the make log files.

cmake-metaMDBG.log
current_env-YML.txt
make.log

Can you try to enter the samtools command yourself: samtools faidx assembly/tmp//testDependencies_bgzip.fasta.gz, and tell me if you have any error messages. If it works samtools should produce a (.fai) file: assembly/tmp//testDependencies_bgzip.fasta.gz.fai. Thank you

Also do you execute metaMDBG yourself, or do you submit the command as a job on a cluster?

samtools faidx assembly/tmp/testDependencies_bgzip.fasta.gz
Could not build fai index assembly/tmp/testDependencies_bgzip.fasta.gz.fai

I am just running this locally, I am not using this on a cluster.

I tried to compile locally again using the steps you provide. I still get the same error. I am going to pass this onto my local sys admin to see if they can get it to work.

git clone https://github.com/GaetanBenoitDev/metaMDBG.git
cd metaMDBG
mkdir build
cd build
cmake ..
make -j 3

Nothing wrong on your end, I have it working now. Thank you for your help, I am looking forward to the results.

Just so you know what we found:

The issue we ran up against is that the code calls :
string command2 = "{ /usr/bin/time -v " + command + "; } 2> " + outputDir + "/time.txt";
on line 876 of commons.hpp

Not all Linux systems have /usr/bin/time installed.
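A rough way to see why this shows up as an empty output file rather than an obvious error: in a small reproduction, the output redirect is still created while the error message goes to time.txt. This is only a sketch under assumptions: `/nonexistent/bin/time` is a hypothetical missing path, plain `cat` stands in for bgzip, and the file names are made up.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
printf '>read1\nACGT\n' > "$demo_dir/in.fasta"

# Hypothetical path standing in for a missing /usr/bin/time.
missing_time=/nonexistent/bin/time

# Same shape as the wrapped command: only stderr of the group is captured.
# The second `cat` is a stand-in for the bgzip step.
{ "$missing_time" -v cat "$demo_dir/in.fasta" | cat > "$demo_dir/out.fasta"; } 2> "$demo_dir/time.txt"

# The output file exists but received no data; the real error is in time.txt.
out_size=$(wc -c < "$demo_dir/out.fasta" | tr -d ' ')
echo "output bytes: $out_size"
cat "$demo_dir/time.txt"
```

Running the pipeline by hand without the wrapper never touches the missing binary, which is consistent with the command working on the command line.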

type -a time
time is a shell keyword
time is /usr/bin/time

so some systems have the shell keyword time but not /usr/bin/time.
Thanks for your help.
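One possible way to guard against this, sketched in shell rather than in the tool's C++: check for an executable at the expected path and run the command untimed when it is absent. The function names `has_external_time` and `run_maybe_timed` are hypothetical and are not part of metaMDBG; the `-v` flag assumes GNU time, as in the original command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if an executable exists at the given path.
has_external_time() {
    local time_bin="${1:-/usr/bin/time}"
    [ -x "$time_bin" ]
}

# Hypothetical wrapper: time the command if the external binary is present,
# otherwise run the command directly so the pipeline still completes.
run_maybe_timed() {
    local time_bin="$1" log_file="$2" cmd="$3"
    if has_external_time "$time_bin"; then
        # Assumes GNU time, which accepts -v; stderr goes to the log file.
        { "$time_bin" -v sh -c "$cmd"; } 2> "$log_file"
    else
        sh -c "$cmd"
    fi
}
```

Usage might look like `run_maybe_timed /usr/bin/time time.txt 'cat in.fasta | bgzip -c > out.fasta.gz'`. Passing the command through `sh -c` means the whole pipeline is timed, which differs from the string concatenation quoted above, where only the first command of a pipeline sits behind /usr/bin/time.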

Thanks a lot for finding the issue!
I will check whether /usr/bin/time can be installed with conda. Otherwise I'll disable the performance check.
I'll fix that soon.

I think that I am having a similar issue.
When I try to submit the command as a job on a cluster, I get the following error:

Checking dependencies: bgzip (wfmash package) ok
Checking dependencies: samtools
ERROR (logs: ~workspace//tmp//logs.txt)

At first, I thought samtools was missing, but it turns out that I am missing /usr/bin/time.

Unfortunately, I don't have access to install 'time' with conda on the cluster.

Would there be a way to disable the performance check?
Thanks for your help.

Yes, sorry about that. It will be fixed in the next version; I should be able to update the software in a few weeks.

Fixed in v1.0