marbl/parsnp

Failing at raxml step

Hi, I'm currently using parsnp v1.5.3 on a slurm cluster (with raxml v8.2.12) and I keep getting an error when trying to run parsnp on ~13,000 bacterial genomes. Any idea on how to fix this?

CRITICAL - The following command failed:
>>$ raxmlHPC-PTHREADS -m GTRCAT -p 12345 -T 24 -s /data/projects/ABCDE/parsnp/parsnp.snps.mblocks -w /tmp/tmpwz1c2hjn -n OUTPUT
Please verify input data and restart Parsnp.
If the problem persists please contact the Parsnp development team.

  STDOUT:
  Warning, you specified a working directory via "-w"

Keep in mind that RAxML only accepts absolute path names, not relative ones!

RAxML can't, parse the alignment file as phylip file
it will now try to parse it as FASTA file

TOO FEW SPECIES

Hi @BioMinnie! Thanks for using Parsnp and opening an issue. Could you provide the command used to run Parsnp?

Hi, this is being run on a slurm queuing system, so the command is:

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --mem-per-cpu=24G

module load parsnp/1.5.3
module load numpy/1.17.3-python-3.7.4

parsnp -g reference.gbk -d QCfilt/ -p 24 -c -o /data/projects/ABCDE/parsnp

@BioMinnie sorry for the delayed response. The command looks fine, so unfortunately no easy fix there. I'll look into this error by running parsnp on a comparably large dataset, but in the meantime I have some questions that might help us towards a solution:

  • Can you inspect the /data/projects/ABCDE/parsnp/parsnp.snps.mblocks file? RAxML doesn't seem to accept it as a valid input file ("TOO FEW SPECIES"), so its contents may point to the cause.
  • If the .mblocks file looks reasonable, then RAxML may be the issue, and you may want to try running parsnp with FastTree instead.
  • 24G may not be enough memory. If you'd like, you can try running parsnp on a subset of your input data and/or use the -P flag to limit the partition size.
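
For anyone wanting a quick way to do the inspection suggested in the first bullet, here is a minimal Python sketch. It assumes the .mblocks file is FASTA-formatted (RAxML's output above says it falls back to FASTA parsing) and simply counts the header lines. The function names and the `min_taxa=4` threshold are illustrative assumptions, not part of Parsnp or RAxML.

```python
import os
import sys


def count_fasta_records(path):
    """Return the number of FASTA header lines (starting with '>') in `path`."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


def describe_alignment(path, min_taxa=4):
    """Summarise whether the file looks usable as tree-building input.

    `min_taxa` is an assumed threshold for illustration only; adjust it to
    whatever your tree builder actually requires.
    """
    if os.path.getsize(path) == 0:
        return "empty file"
    n = count_fasta_records(path)
    if n < min_taxa:
        return f"only {n} sequence(s); fewer than {min_taxa}"
    return f"{n} sequences"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_mblocks.py /path/to/parsnp.snps.mblocks
    print(describe_alignment(sys.argv[1]))
```

An "empty file" result suggests the problem is upstream of RAxML (no SNP blocks were written), while a very small sequence count suggests that most inputs were dropped before the tree step.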

Hope this helps and happy holidays!

-Bryce

Hello @BioMinnie, @bkille, did you ever find a solution to this problem? I am facing the same issue running the same command on a remote Ubuntu 16.04 LTS server. I have far fewer genomes (1 reference and 2 derived), though. I cleared my tmp folder, but this did not solve the problem. I checked the parsnp.snps.mblocks file and it is empty. I am receiving the same error at the RAxML step.

My command is: ./parsnp -g <ref/genome.gbk> -d <derived/genomes/folder/*.fasta> -c

I am struggling to find a solution to the problem and I am unsure how to proceed/troubleshoot further.

Thanks for any help!
-Wolfgang

@WolfgangCZ

Just to clarify, does the command you're running include the angle brackets (< and >)? If so, you should remove them. In command-line help messages, angle brackets are only there to indicate required arguments for a parameter.

./parsnp -g ref/genome.gbk -d derived/genomes/folder/*.fasta -c

If the issue persists, I'd be happy to test your input files to see if I can replicate it. I would also recommend using the conda version of parsnp if the error persists, as that seems to leave less room for installation issues.

@bkille

Thanks so much for getting back to me. Sorry, to be clear, there are no angle brackets in my command. They were just there to indicate the arguments entered. The command you wrote is what I ran.

Unfortunately, I suspect the issue lies in the specific, yet generally overlooked, formatting of gbk files. I tried using a FASTA file for the same genomes and everything seemed to work fine. Additionally, I received a warning that the genomes were almost twice as large as the reference, but I don't believe this is true, which leads me to believe the problem is my gbk file.

Thank you for the recommendation! I tried using conda but was having issues, so I opted to install Ubuntu desktop (I was having trouble connecting to my display and visualizing with gingr). It would be nice not to lose my annotations, so if it still fails after I try to reproduce the results on this new system, it would be amazing if you would be willing to check out the files. Again, I really appreciate this.

-w

Yeah, the .gbk parsing has caused issues in the past. A future goal of mine is to offload the responsibility of manual file parsing to Biopython. I wouldn't be surprised if that was the issue.
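
In the meantime, one rough way to sanity-check a .gbk reference is to count the bases in each record's ORIGIN block and compare the total with the matching FASTA file. The sketch below uses only the Python standard library. It is not how Parsnp parses GenBank files, and the function name is hypothetical. It relies only on the standard flat-file layout: a LOCUS line, an ORIGIN line, numbered sequence lines, and a closing "//".

```python
import sys


def genbank_sequence_lengths(path):
    """Map each record's LOCUS name to the number of bases in its ORIGIN block.

    Records without an ORIGIN block are reported with length 0. If two records
    share a LOCUS name, the later one overwrites the earlier one.
    """
    lengths = {}
    name = None
    in_origin = False
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("LOCUS"):
                fields = line.split()
                # Fall back to a positional name if the LOCUS line has no name.
                name = fields[1] if len(fields) > 1 else f"record_{len(lengths) + 1}"
                in_origin = False
                count = 0
            elif line.startswith("ORIGIN"):
                in_origin = True
            elif line.startswith("//"):
                if name is not None:
                    lengths[name] = count
                name = None
                in_origin = False
            elif in_origin:
                # Sequence lines look like "        1 acgtacgtac gtacgtacgt";
                # count letters only, skipping the position number and spaces.
                count += sum(1 for ch in line if ch.isalpha())
    return lengths


if __name__ == "__main__" and len(sys.argv) > 1:
    for record, length in genbank_sequence_lengths(sys.argv[1]).items():
        print(record, length)
```

If a record reports a length of 0, or the total is far below the FASTA length, the file may be missing its sequence section. That could be consistent with the "genomes almost twice as large as the reference" warning, although this check alone does not confirm it.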

What operating system are you using?

Also, I'm more than happy to help, thanks for working through this with me! If you want to send me the files, I'm at brycekille_at_gmail.com

-Bryce