[BUG] Hit reported at 100% identity, but the sequences are completely different
Describe the bug
Hi
I am running RGI v5.2.1 locally (Ubuntu 22.04).
Database Version CARD:
card_canonical: 3.2.5
Input
Assembled genome, analysed with the standard settings:
rgi main -i file -o ./rgi_out/file --local
I got a hit with 100% identity:
Best_Hit_ARO Best_Identities ARO Model_type SNPs_in_Best_Hit_ARO Other_SNPs
AMR Gene GRD33-1 100 3006926 protein homolog model n/a n/a carbapenem antibiotic inactivation
These are the proteins:
Predicted_Protein
MSEAKNSWVTASDVARLAGVSRSAVSRTFTPGASVSEKTRQRVQAAATELGYQVNIIARSMITGSSNFIGLVTAGFDNPFRSKLLAPLAHNLAIQGFMPLLMNADDPKQLEPQLRELLSYHVAGVILTSGAPPLSLAEEYLARKIPVTLINRQTELDGADQVCSDNAQGATLAAHHLLAQGVTVAGFIGENAHNFSTRQRHQGFEQALTDHGQPLASIFCERGGYEAGWDAAAALVAQCPDLDGLFCATDMLAMGAMDYLHRHQPQQPVRIIGFDDIPQATYAAYQLTTIRQDTDCLAQTAVNLLVNRIRRFEQPSVQKTIPVELVVRQSA
CARD_Protein_Sequence
MAAMAAVAAVLLGVFAFAHAQDQPALWTQPQQPVRIIGNAWYVGTRGLSAILITSPTGAVLIDGAMRESADDIAKNITSLGVRLEDVKLIVNSHAHNDHAGGIAELQRRTGATVAALPWSAEALRSGRKHQGDPQFDTQTPPPDRVPKVKTIRDGEALHAGGVTITAHKTGGHTPGSTSWTWRSCEENRCVDIVYADSITAVSADGFRFTDNKTYPQAIDDFNKGYAFLRSASCDILVTPHPEASDFWGRIAKRDAGERDALIDRSQCARYADRADAQLQKRLATERAK
Completely different!
Aligned, they look like this:
Predicted_Protein MSEAKNSWVTASDVARLAGVSRSAVSRTFTPGASVSEKTRQRVQAAATELGYQVNIIARS
CARD_Protein_Sequence ------MAAMAAVAAVLLGVFAFAHAQD---QPALWTQPQQPVR----------------
. *: .* * ** * :. .:: :..* *.
Predicted_Protein MITGSSNFIGLVTAGFDNPFRSKLLAPLAHNLAIQGFMPLLMNADDPKQLEPQLRELLSY
CARD_Protein_Sequence -IIGNAWYVG--TRGLS----AILITSPTGAVLIDGAMR--ESADD--------------
* *.:.::* * *:. : *::. : : *:* * .***
Predicted_Protein HVAGVILTSGAPPLSLAEEYLARKIPVTLINRQTELDGADQVCS-DNAQGATLAAHHLLA
CARD_Protein_Sequence -IAKNITSLG---VRLEDVKL-------IVNSHAHNDHAGGIAELQRRTGATVAALPWSA
:* * : * : * : * ::* :: * *. :.. :. ***:** *
Predicted_Protein QGVTVAGFIGENAHNFSTRQ-------RHQGFEQALTDHGQPLASIFCERGGYEAG---W
CARD_Protein_Sequence EALR-SGRKHQGDPQFDTQTPPPDRVPKVKTIRDGEALHAGGVTITAHKTGGHTPGSTSW
:.: :* :. :*.*. . : : :. : *. :: : **: .* *
Predicted_Protein DAAAALVAQCPDLDGLFCATDMLAMGAMDYLHRHQPQQPVRIIGFDD---IPQATYAAYQ
CARD_Protein_Sequence TWRSCEENRCVD---IVYADSITAVSADGFRFTDNKTYPQAIDDFNKGYAFLRSASCDIL
:. .* * :. * .: *:.* .: . : * * .*:. : .:: .
Predicted_Protein LTTIRQDTDCLAQTAV------NLLVNRIR--RFEQPSVQKTIPVELVVRQSA
CARD_Protein_Sequence VTPHPEASDFWGRIAKRDAGERDALIDRSQCARYADRA-DAQLQKRLATERAK
:*. : :* .. * : *::* . *: : : : : *.. .:
In this alignment there is no 100% identity, not even over a stretch of more than 3 amino acids.
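For anyone who wants to check this kind of claim numerically, here is a minimal Python sketch that computes percent identity and the longest run of identical residues from two already-aligned sequences. It is not how RGI computes Best_Identities; the function names `percent_identity` and `longest_identical_run` are hypothetical, and the sketch assumes gaps are written as `-` and that identity is counted only over columns where neither sequence has a gap.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap.

    Assumption: this is one common definition of identity; other tools may
    divide by alignment length or by query length instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def longest_identical_run(aligned_a: str, aligned_b: str, gap: str = "-") -> int:
    """Length of the longest stretch of consecutive identical, non-gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    best = current = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == b and a != gap:
            current += 1
            best = max(best, current)
        else:
            current = 0  # a mismatch or a gap breaks the run
    return best


if __name__ == "__main__":
    # Toy aligned pair, not taken from the RGI output.
    a = "MKT-AILV"
    b = "MRTGAI-V"
    print(round(percent_identity(a, b), 2))
    print(longest_identical_run(a, b))
```

To apply it to the alignment above, the per-block rows of each sequence would first need to be concatenated into one aligned string per sequence.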
Why is this a hit at 100%?
original output
SK65-2.txt