arpcard/rgi

[BUG] Hit reported at 100% identity, but the sequences are completely different


Describe the bug
Hi
I am running RGI v5.2.1 locally (Ubuntu 22.04).
Database Version CARD:
card_canonical: 3.2.5

Input
An assembled genome, run with standard settings: rgi main -i file -o ./rgi_out/file --local

I got a hit with 100% identity:
Best_Hit_ARO Best_Identities ARO Model_type SNPs_in_Best_Hit_ARO Other_SNPs
AMR Gene GRD33-1 100 3006926 protein homolog model n/a n/a carbapenem antibiotic inactivation

These are the proteins:

Predicted_Protein
MSEAKNSWVTASDVARLAGVSRSAVSRTFTPGASVSEKTRQRVQAAATELGYQVNIIARSMITGSSNFIGLVTAGFDNPFRSKLLAPLAHNLAIQGFMPLLMNADDPKQLEPQLRELLSYHVAGVILTSGAPPLSLAEEYLARKIPVTLINRQTELDGADQVCSDNAQGATLAAHHLLAQGVTVAGFIGENAHNFSTRQRHQGFEQALTDHGQPLASIFCERGGYEAGWDAAAALVAQCPDLDGLFCATDMLAMGAMDYLHRHQPQQPVRIIGFDDIPQATYAAYQLTTIRQDTDCLAQTAVNLLVNRIRRFEQPSVQKTIPVELVVRQSA
CARD_Protein_Sequence
MAAMAAVAAVLLGVFAFAHAQDQPALWTQPQQPVRIIGNAWYVGTRGLSAILITSPTGAVLIDGAMRESADDIAKNITSLGVRLEDVKLIVNSHAHNDHAGGIAELQRRTGATVAALPWSAEALRSGRKHQGDPQFDTQTPPPDRVPKVKTIRDGEALHAGGVTITAHKTGGHTPGSTSWTWRSCEENRCVDIVYADSITAVSADGFRFTDNKTYPQAIDDFNKGYAFLRSASCDILVTPHPEASDFWGRIAKRDAGERDALIDRSQCARYADRADAQLQKRLATERAK

Completely different!

Aligned, they look like this:

Predicted_Protein          MSEAKNSWVTASDVARLAGVSRSAVSRTFTPGASVSEKTRQRVQAAATELGYQVNIIARS
CARD_Protein_Sequence      ------MAAMAAVAAVLLGVFAFAHAQD---QPALWTQPQQPVR----------------
                                   . *: .* * **   * :.     .::  :..* *.                

Predicted_Protein          MITGSSNFIGLVTAGFDNPFRSKLLAPLAHNLAIQGFMPLLMNADDPKQLEPQLRELLSY
CARD_Protein_Sequence      -IIGNAWYVG--TRGLS----AILITSPTGAVLIDGAMR--ESADD--------------
                            * *.:.::*  * *:.    : *::. :  : *:* *    .***              

Predicted_Protein          HVAGVILTSGAPPLSLAEEYLARKIPVTLINRQTELDGADQVCS-DNAQGATLAAHHLLA
CARD_Protein_Sequence      -IAKNITSLG---VRLEDVKL-------IVNSHAHNDHAGGIAELQRRTGATVAALPWSA
                            :*  * : *   : * :  *       ::* ::  * *. :.. :.  ***:**    *

Predicted_Protein          QGVTVAGFIGENAHNFSTRQ-------RHQGFEQALTDHGQPLASIFCERGGYEAG---W
CARD_Protein_Sequence      EALR-SGRKHQGDPQFDTQTPPPDRVPKVKTIRDGEALHAGGVTITAHKTGGHTPGSTSW
                           :.:  :*   :.  :*.*.        . : : :. : *.  ::    : **: .*   *

Predicted_Protein          DAAAALVAQCPDLDGLFCATDMLAMGAMDYLHRHQPQQPVRIIGFDD---IPQATYAAYQ
CARD_Protein_Sequence      TWRSCEENRCVD---IVYADSITAVSADGFRFTDNKTYPQAIDDFNKGYAFLRSASCDIL
                              :.   .* *   :. * .: *:.* .: .  :   *  * .*:.   : .:: .   

Predicted_Protein          LTTIRQDTDCLAQTAV------NLLVNRIR--RFEQPSVQKTIPVELVVRQSA
CARD_Protein_Sequence      VTPHPEASDFWGRIAKRDAGERDALIDRSQCARYADRA-DAQLQKRLATERAK
                           :*.  : :*  .. *       : *::* .  *: : : :  :   *.. .: 
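To put a number on the alignment above, identity can be counted column by column. This is a minimal sketch: `percent_identity` is a hypothetical helper, not part of RGI, and the choice to skip gapped columns is an assumption (other tools may count gaps differently). The two rows are copied from the last alignment block.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Last block of the alignment above (Predicted_Protein vs CARD_Protein_Sequence).
pred_row = "LTTIRQDTDCLAQTAV------NLLVNRIR--RFEQPSVQKTIPVELVVRQSA"
card_row = "VTPHPEASDFWGRIAKRDAGERDALIDRSQCARYADRA-DAQLQKRLATERAK"
print(f"{percent_identity(pred_row, card_row):.1f}% identity over ungapped columns")
```

The same function can be run on each block, or on the concatenated rows, to get a figure for the whole alignment.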

In this alignment there is no 100% identity, not even over a stretch of more than 3 AA.
Why is this a hit at 100%?
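One possible lead: a global alignment can miss short exact matches that sit at different positions in the two proteins. The sketch below searches for the longest exact shared stretch without aligning anything. `longest_common_substring` is a hypothetical helper, not RGI code, and the two fragments are copied from the Predicted_Protein and CARD_Protein_Sequence entries above. For these fragments it finds QPQQPVRIIG (10 residues). Whether that stretch has anything to do with the reported 100% is an assumption and is not confirmed here.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous stretch present in both strings."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match in `a`
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous residue pair.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


# Fragments copied from the two full sequences in this issue.
pred_fragment = "LHRHQPQQPVRIIGFDD"
card_fragment = "ALWTQPQQPVRIIGNAWYVG"

shared = longest_common_substring(pred_fragment, card_fragment)
print(shared, len(shared))
```

Running it on the full sequences instead of the fragments would show whether any longer shared stretch exists.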

original output
SK65-2.txt