`raxml-ng --parse` does not produce `.rba` file for a `.catg` file as `--msa` input in version 1.1.0
According to the "Preparing the alignment" section of the tutorial:
In addition to MSA sanity check, this command will perform two useful operations:
- Compress alignment patterns and store MSA in the binary format (RAxML Binary Alignment, RBA):
NOTE: Binary MSA file created: T2.raxml.rba
Since pattern compression could take quite some time for large MSAs, loading RBA file is (much) faster compared to FASTA or PHYLIP.
Thus, when running the following command, I was expecting a `results/raxml_ng_parse/control.raxml.rba` file to be created:
raxml-ng --parse --msa results/raxml_ng_input/control.ml_gt_and_likelihoods.catg --model GTGTR+FO --prefix results/raxml_ng_parse/control --log DEBUG
However, this file is not created and no `NOTE: Binary MSA file created: ...` message appears in the logs, even with `--log DEBUG`. What am I doing wrong? Or is `.rba` creation simply not supported for CATG files? Or does it not make sense? If so, I'd suggest adding this information to the Wiki, both in the above location and in the section on output files:
https://github.com/amkozlov/raxml-ng/wiki/Output:-files-and-settings#output-files
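Until this is documented, a pipeline that relies on the `.rba` file can guard against its absence. The following is a minimal sketch, assuming only the `<prefix>.raxml.rba` naming seen in the tutorial excerpt and the command above; the helper names `rba_path` and `choose_msa_input` are hypothetical and not part of RAxML-NG.

```python
from pathlib import Path


def rba_path(prefix):
    """Path where `raxml-ng --parse --prefix <prefix>` would put the RBA file.

    Assumption: the '<prefix>.raxml.rba' naming shown in the tutorial excerpt.
    """
    return Path(f"{prefix}.raxml.rba")


def choose_msa_input(prefix, original_msa):
    """Return the RBA file if it exists, otherwise the original alignment.

    Hypothetical helper for a workflow script: for inputs where no RBA file
    is written (as observed here for CATG), fall back to the original file.
    """
    candidate = rba_path(prefix)
    return candidate if candidate.is_file() else Path(original_msa)
```

With the paths from this issue, `choose_msa_input("results/raxml_ng_parse/control", "results/raxml_ng_input/control.ml_gt_and_likelihoods.catg")` would return the `.catg` path, since no `.rba` file was written.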
For more details, here's the `--log DEBUG` output of the above command:
RAxML-NG v. 1.1 released on 29.11.2021 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml
System: AMD EPYC 7443P 24-Core Processor, 24 cores, 995 GB RAM
RAxML-NG was called at 31-Mar-2023 14:42:18 as follows:
raxml-ng --parse --msa results/raxml_ng_input/control.ml_gt_and_likelihoods.catg --model GTGTR+FO --prefix results/raxml_ng_parse/control --log DEBUG
Analysis options:
run mode: Alignment parsing and compression
start tree(s):
random seed: 1680266538
tip-inner: OFF
pattern compression: ON
per-rate scalers: OFF
site repeats: ON
branch lengths: proportional (ML estimate, algorithm: NR-FAST)
SIMD kernels: AVX2
parallelization: coarse-grained (auto), PTHREADS (auto)
RBA partial loading: OFF
|noname| |GTGTR+FO| ||
[00:00:00] Reading alignment from file: results/raxml_ng_input/control.ml_gt_and_likelihoods.catg
Failed to load as IPHYLIP: Unable to parse PHYLIP file: results/raxml_ng_input/control.ml_gt_and_likelihoods.catg
(LIBPLL-233): Sequence 2 (AAAAAMNAAAAAAAMNAMMNMANMNMAANANA) data out of alignment
Failed to load as PHYLIP: Unable to parse PHYLIP file: results/raxml_ng_input/control.ml_gt_and_likelihoods.catg
(LIBPLL-232): Sequence 1 (sample_x) longer than expected
Failed to load as FASTA: Error parsing FASTA file: results/raxml_ng_input/control.ml_gt_and_likelihoods.catg
(LIBPLL-203): Illegal header line in query fasta file
CATG: taxa: 32, sites: 1335471
CATG: taxon 0: sample_x
... [all the other taxons]
CATG: site 0 consesus seq: AAAAAMNAAAAAAAMNANNNMANMNMAANANA
CATG: number of states: 01-Jan-1970 01:00:10
CATG: site 1 consesus seq: AAAAAANMAAAMAMMNANNNAANMNAAANANA
... [all the other sites]
CATG: site 1335470 consesus seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNKG
[00:00:59] Loaded alignment with 32 taxa and 1335471 sites
[00:00:59] Extracting partitions...
[00:00:59] Checking the alignment...
Alignment comprises 1 partitions and 1335471 sites
Partition 0: noname
Model: GTGTR+FO
Alignment sites: 1335471
Gaps: 61.92 %
Invariant sites: 0.00 %
* Per-taxon CLV size (elements) : 13354710
* Estimated memory requirements : 6318 MB
* Recommended number of threads / MPI processes: 108
* Maximum number of threads / MPI processes: 344
* Minimum number of threads / MPI processes: 31
Please note that numbers given above are rough estimates only.
Actual memory consumption and parallel performance on your system may differ!
Alignment can be successfully read by RAxML-NG.
Execution log saved to: /absolut/path/to/results/raxml_ng_parse/control.raxml.log
Analysis started: 31-Mar-2023 14:42:18 / finished: 31-Mar-2023 14:43:18
Elapsed time: 60.382 seconds
Consumed energy: 1.169 Wh
Hi David,
You're right: an RBA file will not be created for probabilistic alignments (e.g. CATG), mainly because (discrete) pattern compression does not work in this case.
I added a corresponding note to the tutorial.
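For readers unfamiliar with the term, discrete pattern compression can be illustrated with a toy example. This is a minimal sketch of the general idea (collapsing identical alignment columns into unique patterns with weights), not RAxML-NG's actual implementation; the function name `compress_patterns` is made up for illustration.

```python
from collections import Counter


def compress_patterns(sequences):
    """Collapse identical alignment columns into unique patterns plus weights."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) yields one tuple per column (one character per taxon).
    columns = ["".join(col) for col in zip(*sequences)]
    counts = Counter(columns)
    # Counter keeps first-occurrence order (Python 3.7+).
    patterns = list(counts)
    weights = [counts[p] for p in patterns]
    return patterns, weights


# Toy alignment: 3 taxa, 4 sites; the first and last columns are identical.
msa = ["AAGA", "AAGA", "ACGA"]
patterns, weights = compress_patterns(msa)
print(patterns, weights)  # ['AAA', 'AAC', 'GGG'] [2, 1, 1]
```

In a probabilistic alignment each column entry is a vector of per-state likelihoods rather than a single character, so exact duplicate columns would presumably be rare and this kind of lookup gains little.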
Thanks for the quick response!
Quick follow-up question from the note you added: Does VCF input work for regular `raxml-ng` version 1.1? I had seen it in the CellPhy project, but from looking at the main repo code here in `raxml-ng`, I thought VCF support was so far only implemented on the respective CellPhy branch (which is not yet merged, so not part of the 1.1 release).
That's correct. As of now, VCF support is only available in the cellphy branch.
Thanks for the clarification!