pierrebarbera/epa-ng

model info error

RachelDanie opened this issue · 2 comments

Hello, I am having trouble running epa-ng with the following error:
INFO Selected: Output dir: ./epa_tree/
INFO Selected: Query file: query.fasta
INFO Selected: Tree file: T3.raxml.bestTree
INFO Selected: Reference MSA: reference.fasta
INFO Selected: Automatic switching of use of per rate scalers
INFO Selected: Preserving the root of the input tree
INFO Selected: Specified model file: RAxML_info.info
what(): Model string in provided file seems wrong.
XXXX.sh: line 20: 26465 Aborted (core dumped) epa-ng --tree T3.raxml.bestTree --ref-msa reference.fasta --query query.fasta --outdir $OUT --model RAxML_info.info

I am attempting to align 806 amplicon sequences to 1121 nifH reference sequences. I started by running raxml-ng to build a reference tree on muscle-aligned ref seqs with the following command:
raxml-ng --msa T2.raxml.rba --model GTR+G --prefix T3 --threads 8 --seed 8273

I then used papara to align the query seqs, and raxml-ng --split to separate the aligned seqs.

In my first go running epa-ng, I provided the example model parameters suggested in the full stack tutorial to define the model:
GTR{0.7/1.8/1.2/0.6/3.0/1.0}+FU{0.25/0.23/0.30/0.22}+G4{0.47}

But I got the following error:
ERR When using epa-ng like this, a model has to be explicitly specified!
You may specify it generically (GTR+G), however parameters will not be optimized.
Instead we reccommend to use RAxML to re-evaluate the parameters and then pass the resulting
RAxML_info file to the epa-ng --model argument. epa-ng will then auto-parse the parameters.
( raxmlHPC -f e -s -t -n info -m GTRGAMMAX )

So I ran the example command above. It first gave an error that led me to change the -m option to GTRGAMMA (the only other possible input is GTRGAMMAI), and after that it executed fine.
But passing the resulting RAxML_info file to epa-ng via --model threw the "Model string in provided file seems wrong" error shown at the top of this post.

Is there some other way to get around this? If it helps, below are the contents of the RAxML_info file:
_This is RAxML version 7.3.0 released by Alexandros Stamatakis in June 2011.

With greatly appreciated code contributions by:
Andre Aberer (HITS)
Simon Berger (HITS)
Nick Pattengale (Sandia)
Wayne Pfeiffer (SDSC)
Akifumi S. Tanabe (Univ. Tsukuba)

Alignment has 4167 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this alignment: 93.41%

RAxML Model Optimization up to an accuracy of 0.100000 log likelihood units

Using 1 distinct models/data partitions with joint branch length optimization

All free model parameters will be estimated by RAxML
GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter

GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units

Partition: 0
Alignment Patterns: 4167
Name: No Name Provided
DataType: DNA
Substitution Matrix: GTR
RAxML was called as follows:

raxmlHPC -f e -s ref.clus.phyi -t T3.raxml.bestTree -n info -m GTRGAMMA

Testing which likelihood implementation to use
Standard Implementation full tree traversal time: 2.301094
Subtree Equality Vectors for gap columns full tree traversal time: 0.809563
... using SEV-based implementation

Model parameters (binary file format) written to: /home/rodrigues-lab/msa_red/epa_ng/RAxML_binaryModelParameters.info

Overall Time for Tree Evaluation 419.071737
Final GAMMA likelihood: -186925.416854
Number of free parameters for AIC-TEST(BR-LEN): 2248
Number of free parameters for AIC-TEST(NO-BR-LEN): 9

Model Parameters of Partition 0, Name: No Name Provided, Type of Data: DNA
alpha: 1.029898
Tree-Length: 201.377284
rate A <-> C: 1.154527
rate A <-> G: 2.645042
rate A <-> T: 1.360458
rate C <-> G: 1.626075
rate C <-> T: 3.503977
rate G <-> T: 1.000000

freq pi(A): 0.240682
freq pi(C): 0.260669
freq pi(G): 0.267798
freq pi(T): 0.230851_

I should also add that simply running
epa-ng --tree T3.raxml.bestTree --ref-msa reference.fasta --query query.fasta --outdir $OUT --model GTR+G

did not throw errors, but it also did not place any query sequences in the tree (the resulting .jplace file contained only reference sequences).
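For reference, the parameters in the RAxML_info file above could be assembled by hand into the same model-string format as the tutorial example (GTR{...}+FU{...}+G4{...}). The following is only a minimal sketch: the function name `model_string_from_raxml_info` is hypothetical, the rate order AC/AG/AT/CG/CT/GT and frequency order A/C/G/T are assumed to match what the model string expects, and whether a given epa-ng version accepts the result is not confirmed here.

```python
import re

# Hypothetical helper: not part of epa-ng, RAxML or raxml-ng.
def model_string_from_raxml_info(text: str) -> str:
    """Build a GTR+FU+G4 model string from the text of a RAxML_info file."""
    num = r"([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"

    def grab(pattern: str) -> str:
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"pattern not found: {pattern}")
        return match.group(1)

    # Assumed order of the six GTR rates: AC, AG, AT, CG, CT, GT.
    pairs = ["A <-> C", "A <-> G", "A <-> T", "C <-> G", "C <-> T", "G <-> T"]
    rates = [grab(rf"rate {re.escape(p)}:\s*{num}") for p in pairs]
    # Assumed order of the base frequencies: A, C, G, T.
    freqs = [grab(rf"freq pi\({base}\):\s*{num}") for base in "ACGT"]
    alpha = grab(rf"alpha:\s*{num}")
    return "GTR{%s}+FU{%s}+G4{%s}" % ("/".join(rates), "/".join(freqs), alpha)


# Values copied from the RAxML_info output quoted in this issue.
sample = """
alpha: 1.029898
Tree-Length: 201.377284
rate A <-> C: 1.154527
rate A <-> G: 2.645042
rate A <-> T: 1.360458
rate C <-> G: 1.626075
rate C <-> T: 3.503977
rate G <-> T: 1.000000

freq pi(A): 0.240682
freq pi(C): 0.260669
freq pi(G): 0.267798
freq pi(T): 0.230851
"""

print(model_string_from_raxml_info(sample))
```

The values are kept as strings so they are passed through exactly as RAxML printed them, without any rounding.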

Hi @RachelDanie !

It's pretty surprising that the first way you tried it (with raxml-ng) didn't work... did you supply that model string on the command line?

Can you try to instead do --model <raxml-ng best.model file>? In principle that's just a file with that string, followed by a partition name and range. It should be one of the outputs of raxml-ng. If that doesn't work, then the issue is probably a bigger one...

Let me know how it works
Pierre
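As a quick way to check what the best-model file described above contains before passing it to --model, the line can be split into its parts. This is a rough sketch under assumptions: the layout "model string, partition name = range" is inferred from the description in this reply, the function name `split_best_model_line` is hypothetical, and the partition name `noname` and range `1-705` in the example are made up for illustration.

```python
# Hypothetical helper for inspecting one line of a raxml-ng best-model file.
def split_best_model_line(line: str):
    """Split 'MODEL, name = range' into (model, name, site_range).

    name and site_range are None when the line has no partition part.
    """
    model, comma, partition = line.strip().partition(",")
    if not comma:
        return model.strip(), None, None
    name, equals, site_range = partition.partition("=")
    if not equals:
        return model.strip(), name.strip(), None
    return model.strip(), name.strip(), site_range.strip()


# Model string from the tutorial example; partition name and range are invented.
line = "GTR{0.7/1.8/1.2/0.6/3.0/1.0}+FU{0.25/0.23/0.30/0.22}+G4{0.47}, noname = 1-705"
model, name, site_range = split_best_model_line(line)
print(model)
print(name, site_range)
```

If the first field does not look like a GTR{...}+FU{...}+G4{...} string, that would point to the file itself rather than to epa-ng's parsing.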