Missing genome in final output
Hi @davised,
I have (yet) another question.
After running the automlsa2 pipeline with all default parameters, I see that there is a list of genomes that were discarded. However, this list includes a genome that is very interesting to me (i.e. that genome is excluded from the analysis).
The presence_matrix.tsv shows that the gyrB gene is missing. My question is, what are my options to allow this genome to be included in my analysis? Should I lower the identity parameter? Alternatively, could I add the gyrB sequence to the gyrB multifasta file myself and use the --checkpoint parameter with "prealign" as its value?
Cheers,
Pablo
Great question. You can increase the --allow_missing option to a higher number and it will include more genomes in the analysis. If the genome is missing a single gene, then you can set --allow_missing 1 and the genome you are interested in should be included.
Also, there should be a line in the log that says something like:
WARNING Genome Lelliottia_amnigena_LMG_2784.fna is going to be removed due to missing queries. blast_functions.py:276
WARNING Increase --allow_missing to 1 from 0 to keep this genome. blast_functions.py:278
Hopefully that will be helpful in the future if you continue to run into this problem.
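To find these warnings without scrolling through the whole log, something like the following shell sketch may help. It assumes the log file contains the WARNING lines worded exactly as quoted above, each on a single line. The function names `removed_genomes` and `suggested_allow_missing` are hypothetical helpers and are not part of automlsa2.

```shell
# Hypothetical helper: list genomes that the log reports as removed.
# Assumes the WARNING wording matches the lines quoted above.
removed_genomes() {
    # $1: path to an automlsa2 log file
    sed -n 's/.*WARNING Genome \(.*\) is going to be removed due to missing queries\..*/\1/p' "$1"
}

# Hypothetical helper: print the largest --allow_missing value
# that the log suggests, if any such warning is present.
suggested_allow_missing() {
    # $1: path to an automlsa2 log file
    sed -n 's/.*WARNING Increase --allow_missing to \([0-9][0-9]*\) from [0-9][0-9]* to keep this genome\..*/\1/p' "$1" \
        | sort -n | tail -n 1
}
```

Usage would be along the lines of `removed_genomes your_run.log`, where `your_run.log` stands in for whatever log file your run produced. If the log wraps long messages across lines, the patterns would need adjusting.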
Hi @davised,
Indeed I saw that line. I followed your suggestions and everything was working well until the last IQTREE step (see the attached log file).
This is the command I ran:
automlsa2 --dir /Users/pablo/Downloads/Agro_tax_217 --query /Users/pablo/Downloads/automlsa2_markers/mlsa_markers_NCPPB2659.fasta --outgroup /Users/pablo/Downloads/Agro_tax_217/GCF_000697965.2_ASM69796v2_genomic.fasta -p blastn --allow_missing 1 -t 2 mlsa_218_test1
I got the following error:
ERROR: Alignment does not have specified outgroup taxon /Users/pablo/Downloads/Agro_tax_217/GCF_000697965.2_ASM69796v2_genomic.fasta
[02/03/22 18:05:02] CRITICAL iqtree2 seems to have failed. phylogeny.py:86
CRITICAL Check the log files for error messages to see if they can be resolved. phylogeny.py:91
INFO Program exiting with code (73) indicating failure. helper_functions.py:95
INFO Check error messages to resolve the problem. helper_functions.py:97
It seems that I need to indicate which assembly is my outgroup. I tried running the following command:
automlsa2 --dir /Users/pablo/Downloads/Agro_tax_217 --query /Users/pablo/Downloads/automlsa2_markers/mlsa_markers_NCPPB2659.fasta --outgroup /Users/pablo/Downloads/Agro_tax_217/GCF_000697965.2_ASM69796v2_genomic.fasta -p blastn --allow_missing 1 --iqtree -o /Users/pablo/Downloads/Agro_tax_217/GCF_000697965.2_ASM69796v2_genomic.fasta -t 2 mlsa_218_test1
But it showed an error.
automlsa2: error: argument --iqtree: expected one argument
Is my assumption about the error correct? And if so, do you know how I should pass this option to IQ-TREE?
Thanks a lot for your help!
Cheers,
mlsa_218_test1.log
Pablo
IQ-Tree expects something like GCF_000697965.2_ASM69796v2_genomic as the outgroup (the basename of your fasta file input excluding the file suffix).
So don't give it the full path to the fasta file, but just the genome name. automlsa2 expects the filename to be the genome name and removes the file suffix for you.
In the future, I'll run a check for the outgroup option so that you won't be able to enter a path name. And eventually I will put in a check that the outgroup is one of the genome names that automlsa2 knows about from your input.
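As a rough illustration of the naming rule above, the genome name can be derived from a FASTA path in the shell before passing it to --outgroup. This is only a sketch and not automlsa2's own code: `genome_name` and `outgroup_is_known` are hypothetical helpers, and it assumes that only the last file suffix (e.g. `.fasta`) is stripped.

```shell
# Hypothetical helper: turn a FASTA path into the genome name,
# i.e. the basename with its last file suffix removed (assumed rule).
genome_name() {
    base=$(basename "$1")
    printf '%s\n' "${base%.*}"
}

# Hypothetical check: succeed only if the outgroup matches a genome name
# derived from one of the files in the given directory.
outgroup_is_known() {
    # $1: outgroup name, $2: directory of genome FASTA files
    for f in "$2"/*; do
        [ -f "$f" ] || continue
        if [ "$(genome_name "$f")" = "$1" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `genome_name /Users/pablo/Downloads/Agro_tax_217/GCF_000697965.2_ASM69796v2_genomic.fasta` prints `GCF_000697965.2_ASM69796v2_genomic`, which is the form to use for the outgroup. Passing the full path to `outgroup_is_known` fails, mirroring the kind of check described above.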
Also, as an aside, you can use my other tool https://github.com/davised/get_assemblies to help you download genome assemblies with the genome names in the output rather than the assembly accessions, if this is something you would be interested in in the future.
Hi @davised
Thank you for the clarification.
I tried running the following command, using the genome name without the suffix for IQ-Tree:
automlsa2 --dir /Users/pablo/Downloads/Agro_tax_217 --query /Users/pablo/Downloads/automlsa2_markers/mlsa_markers_NCPPB2659.fasta --outgroup /Users/pablo/Downloads/Agro_tax_217/GCF_000697965.2_ASM69796v2_genomic.fasta -p blastn --allow_missing 1 --iqtree "-o GCF_000697965.2_ASM69796v2_genomic" -t 2 mlsa_218_test2
However, I still get the same error.
ERROR:
ERROR: *** IQ-TREE CRASHES WITH SIGNAL SEGMENTATION FAULT
ERROR: *** For bug report please send to developers:
ERROR: *** Log file: mlsa_218_test2.nex.log
ERROR: *** Alignment files (if possible)
[02/04/22 14:19:28] CRITICAL iqtree2 seems to have failed. phylogeny.py:86
CRITICAL Check the log files for error messages to see if they can be resolved. phylogeny.py:91
INFO Program exiting with code (73) indicating failure. helper_functions.py:95
INFO Check error messages to resolve the problem. helper_functions.py:97
Could you give me an example of what my command should look like?
Cheers,
Pablo
Set this flag -> --outgroup GCF_000697965.2_ASM69796v2_genomic
Remove the --iqtree ... portion of your current command.
Hi @davised
I tried your suggestion (with a new smaller dataset of only 10 strains now). But I still get the same error :/
automlsa2 --dir dummy_set --query automlsa2_markers/mlsa_markers_NCPPB2659.fasta --outgroup GCF_000697965.2_ASM69796v2_genomic --allow_missing 1 -p blastn -t 4 mlsa_10_test4
Then I get the same error:
ERROR:
ERROR: *** IQ-TREE CRASHES WITH SIGNAL SEGMENTATION FAULT
ERROR: *** For bug report please send to developers:
ERROR: *** Log file: mlsa_10_test4.nex.log
ERROR: *** Alignment files (if possible)
CRITICAL iqtree2 seems to have failed. phylogeny.py:86
CRITICAL Check the log files for error messages to see if they can be resolved. phylogeny.py:91
INFO Program exiting with code (73) indicating failure. helper_functions.py:95
INFO Check error messages to resolve the problem. helper_functions.py:97
I am attaching the log so you can check it.
mlsa_10_test4.log
Cheers,
Pablo