Error: the symbol '*' is incorrect
Wuhl00 opened this issue · 4 comments
My FASTA file ended up with '*', so when I passed my aligned file to trimAl for trimming I got this report. Does it matter?
Sorry @Wuhl00, is the report you got the title of this issue (Error: the symbol '*' is incorrect)? And what exactly do you mean by "My fasta file ended up with '*'"? Do you mean the content of the file? Its name?
When I use the options -strictplus or -strict, I also get thousands of "Error: the symbol '?' is incorrect" messages.
trimal -in my_msa.phy -out my_msa_out.phy -strictplus
My MSA looks like this:
$ head -n 3 my_msa.phy
230 133652
M6-4 ?????????? ?????????? ?????????? ?????????? ??????????
M6-3 ?????????? ?????????? ?????????? ?????????? ??????????
M5-1 ?????????? ?????????? ?????????? ?????????? ??????????
$ sed -n 200,203p my_msa.phy
C4-4 MDETTIDSIF AGSLK-SLPA VSSKIVRIFT SSTFTGTXXX XXXXXXXXXX
A3-3 MDETTIDSIF AGSLK-SLPA VSSKIVRIFT SSTFTGTXXX XXXXXXXXXX
C8-2 MDETTIDSIF AGSLK-SLPA VSSKIVRIFT XXXXXDTTME RNTLMAQCYP
But there is no error when I use gappyout or automated1, and the program finishes successfully in a few seconds, generating a trimmed MSA.
$ trimal -in my_msa.phy -out my_msa_out2.phy -automated1
$ head -n 3 my_msa_out2.phy
230 103599
M6-4 ????????????????????????????????????????????????????????????
M6-3 ????????????????????????????????????????????????????????????
And these are my versions:
$ trimal --version
trimAl v1.4.rev15 build[2013-12-17]
$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Description: Red Hat Enterprise Linux release 8.4 (Ootpa)
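To list the symbols an alignment like the one above contains before running -strict or -strictplus, a minimal Python sketch can tally every character in the sequence blocks. It assumes a relaxed PHYLIP layout like the file shown: a header line "<n_taxa> <n_columns>", the first n_taxa data lines starting with a whitespace-separated name, and any later non-blank lines (interleaved blocks) holding sequence only. The function names and the ORDINARY alphabet (20 standard amino acids plus '-') are illustrative assumptions, not trimAl's documented accepted alphabet, so not every reported symbol is necessarily rejected by trimAl.

```python
from collections import Counter

# Assumption for illustration: the 20 standard amino acids plus the gap '-'.
# This is NOT trimAl's documented alphabet; anything outside it is reported.
ORDINARY = set("ACDEFGHIKLMNPQRSTVWY-")


def tally_symbols(phylip_text: str) -> Counter:
    """Count every residue symbol in a relaxed, interleaved PHYLIP alignment."""
    lines = phylip_text.splitlines()
    n_taxa = int(lines[0].split()[0])  # header: "<n_taxa> <n_columns>"
    counts = Counter()
    seen = 0  # name-bearing lines consumed so far
    for line in lines[1:]:
        tokens = line.split()
        if not tokens:
            continue  # blank lines separate interleaved blocks
        if seen < n_taxa:
            tokens = tokens[1:]  # drop the sequence name
            seen += 1
        for block in tokens:
            counts.update(block.upper())  # count each character in the block
    return counts


def nonstandard(counts: Counter) -> dict:
    """Keep only the symbols outside the assumed ORDINARY alphabet."""
    return {sym: n for sym, n in counts.items() if sym not in ORDINARY}
```

For the alignment in this thread, something like nonstandard(tally_symbols(open("my_msa.phy").read())) should list '?' among the results, alongside the 'X' characters visible in the sed output.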
Hi @sinamajidian,
Currently there is no wildcard for unknown amino acids, so the programme will fail in some cases if a sequence contains a symbol that is not an amino acid. As you experienced, gappyout works because, unlike strict and strictplus, it does not use a similarity matrix. For automated1 the outcome depends on the characteristics of the input MSA, since it ultimately selects between gappyout and strict.
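The thread does not propose a workaround. One option, assuming it is acceptable for the analysis to treat '?' (missing data) and '*' (stop) as gaps, is to rewrite those symbols as '-' before running -strict or -strictplus. This is a sketch under that assumption, not a recommendation from the trimAl authors: it also changes the gap content that gap-based methods such as gappyout see. The function name and the replacement table are hypothetical, and the same relaxed PHYLIP layout as the file above is assumed.

```python
import re
import sys

# Hypothetical replacement table: map '?' (missing) and '*' (stop) to the
# gap character '-'. Adjust to suit the analysis; this is an assumption.
REPLACEMENTS = str.maketrans({"?": "-", "*": "-"})


def sanitize_phylip(phylip_text: str, table: dict = REPLACEMENTS) -> str:
    """Rewrite sequence symbols in a relaxed, interleaved PHYLIP alignment.

    Sequence names on the first n_taxa data lines are left untouched; only
    the residue blocks are translated with the given table.
    """
    lines = phylip_text.splitlines()
    n_taxa = int(lines[0].split()[0])
    out = [lines[0]]  # the "<n_taxa> <n_columns>" header is unchanged
    seen = 0  # name-bearing lines processed so far
    for line in lines[1:]:
        if line.strip() and seen < n_taxa:
            # Split off the name and the whitespace after it; keep both as-is.
            m = re.match(r"(\s*\S+\s+)(.*)$", line)
            if m:
                line = m.group(1) + m.group(2).translate(table)
            seen += 1
        else:
            # Blank separators and interleaved continuation blocks: no names.
            line = line.translate(table)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example invocation (hypothetical script name):
    #   python sanitize_phylip.py my_msa.phy my_msa_clean.phy
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.write(sanitize_phylip(src.read()))
```

Whether -strict then succeeds on the cleaned file still depends on any other symbols left in it.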
Thank you for your feedback!