computations/root_digger

Error Could not parse msa file

Closed this issue · 5 comments

Hello,

I have installed root-digger v1.7.0-21-gdb04b97

I then tried to root an unrooted tree with the following command:

Input:

rd --msa ar122_alignment.faa --tree ar122_iqtree.nwk --exhaustive

Output:

[0.00] [Warning] Loading options from the checkpoint file
[0.01] Running Root Digger
[0.01] Version: v1.7.0-21-gdb04b97
[0.01] Build Commit: db04b97b863868bd98b7416e732ddceb715ee682
[0.01] Build Date: 2022-04-29 08:45:10
[0.01] Started: 2022-04-29 12:22:16
[0.01] Seed: 3130080073
[0.01] Number of threads per proc: 32
[0.01] Command: /usr/local/bin/rd --msa ar122_alignment.faa --tree ar122_iqtree.nwk --exhaustive
[0.01] Please report any bugs to https://groups.google.com/forum/#!forum/raxml
There was an error during processing:
Could not parse msa file

ar122_alignment.faa is the protein alignment downloaded from http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/human-gut/v1.0/phylogenies/. The Newick tree was downloaded from the same FTP directory.

RootDigger requires a tree in Newick format and an MSA in either PHYLIP or FASTA format. I changed the extension from .faa to .fasta, but the same error persists. I also converted the .faa file to PHYLIP format and got the same error.

The protein alignment .faa file was produced by the GTDB-Tk software.

Any hints on how to adapt my MSA file (ar122_alignment.faa) so that RootDigger accepts it would be very helpful.

Thanks in advance,

Magí.

Hey, thanks for the issue.

It seems to me that the FASTA file you linked is protein data, which RootDigger doesn't support because of the large number of parameters a protein model requires. You would need to find nucleotide (NT) data for the same protein in order to run a RootDigger analysis.
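If it helps, here is a rough way to check which kind of data a FASTA alignment holds before passing it to a tool. This is only a sketch and not part of RootDigger: `read_fasta` and `guess_alphabet` are hypothetical helper names, and the 90% threshold is an arbitrary assumption. It ignores gap characters and measures how many of the remaining symbols are plain nucleotide letters.

```python
from collections import Counter

# Core nucleotide symbols; ambiguity codes other than N are left out on purpose
# because most of them are also amino-acid letters.
CORE_NUCLEOTIDES = set("ACGTUN")
# Gap and missing-data markers are excluded from the count entirely.
GAP_CHARS = set("-.?")


def read_fasta(lines):
    """Parse an iterable of FASTA lines into {sequence name: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def guess_alphabet(records, threshold=0.9):
    """Return 'nucleotide' or 'protein' from the share of A/C/G/T/U/N symbols.

    The threshold is an assumed heuristic value, not a standard.
    """
    counts = Counter()
    for seq in records.values():
        counts.update(ch for ch in seq.upper() if ch not in GAP_CHARS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequence characters found")
    core = sum(n for ch, n in counts.items() if ch in CORE_NUCLEOTIDES)
    return "nucleotide" if core / total >= threshold else "protein"
```

For a file on disk, something like `guess_alphabet(read_fasta(open("ar122_alignment.faa")))` should be enough; a result of `protein` would match the diagnosis above.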

Many thanks @computations for your rapid reply.

Ohhh, I find your software very interesting. I would be grateful if a future release could also accept protein alignments as input.

If I had an informatics background I would offer to collaborate, but I'm a biologist who likes coding and my skills don't go that far.

Nevertheless, thanks for your software release!!!

Magí.

No problem, and thanks for your interest! I don't think there is anything technical stopping RootDigger from working with protein data, but I'm not sure the results would be meaningful. The number of parameters inferred goes from 12 to ~400, and the numerical routines used internally aren't really that good to begin with. So, what I think would end up happening is the program would overfit the noise and numerical errors. However, I never fully investigated this, so I will register your interest. I think that a full investigation of this topic might be warranted, based on your interest.
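The jump in parameter count can be reproduced with a little arithmetic, assuming the figures refer to the off-diagonal entries of an unrestricted (non-reversible) rate matrix. That reading is an assumption on my part rather than something taken from the RootDigger code, and `free_rates` is just an illustrative name.

```python
def free_rates(n_states):
    """Off-diagonal entries of an n x n rate matrix with no symmetry constraint.

    Assumption: every ordered pair of distinct states gets its own rate.
    """
    return n_states * (n_states - 1)


print(free_rates(4))   # nucleotides: 12
print(free_rates(20))  # amino acids: 380
```

Under that assumption, 4 nucleotide states give 12 rates and 20 amino-acid states give 380, which is in line with the "12 to ~400" figure.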

However, to help you specifically, I think IQ-TREE does support a full UNREST model for protein data, and I think they have even added some support for a process like RootDigger's exhaustive mode. See their paper here:

https://www.biorxiv.org/content/10.1101/2020.07.31.230144v2.abstract

I don't know what you plan to do, but maybe this helps!

Thanks again @computations for sharing this preprint with me. I have installed a newer version of IQ-TREE that implements the rootstrap tool, which is what I need. However, it is not a stable version, and the tutorial example does not work. I have written to the IQ-TREE developers to try to get it solved.

Thank you for the recommendation. I will keep an eye out for new RootDigger releases!!!

Kind regards,

Magí.

If there is nothing else then, I am going to close this ticket. Thanks for your feedback!