A rare case for pdb_delinsertion
Today I found another curious case for pdb_delinsertion. I wouldn't call it a bug but rather a case that has not been considered yet.
PDB ID: 1MH1
In this PDB entry, the two residues at the beginning belong to the purification tag and are assigned as insertions. The chain is continuous; there are no backbone breaks.
ATOM      1  N   GLY A   1A     25.689  14.213  13.354  1.00 30.45           N
ATOM      2  CA  GLY A   1A     25.987  15.432  12.550  1.00 26.96           C
ATOM      3  C   GLY A   1A     25.765  16.621  13.483  1.00 25.19           C
ATOM      4  O   GLY A   1A     25.854  16.429  14.700  1.00 26.24           O
ATOM      5  N   SER A   2A     25.502  17.796  12.939  1.00 23.20           N
ATOM      6  CA  SER A   2A     25.239  18.997  13.759  1.00 18.18           C
ATOM      7  C   SER A   2A     23.808  19.423  13.540  1.00 16.47           C
ATOM      8  O   SER A   2A     23.127  19.047  12.551  1.00 16.08           O
ATOM      9  CB  SER A   2A     26.204  20.107  13.368  1.00 20.97           C
ATOM     10  OG  SER A   2A     27.450  19.508  13.038  1.00 25.16           O
ATOM     11  N   PRO A   1      23.223  20.197  14.441  1.00 12.21           N
ATOM     12  CA  PRO A   1      21.881  20.749  14.227  1.00 11.37           C
ATOM     13  C   PRO A   1      21.929  21.556  12.903  1.00 10.73           C
ATOM     14  O   PRO A   1      22.872  22.308  12.668  1.00 12.80           O
ATOM     15  CB  PRO A   1      21.641  21.649  15.437  1.00 10.87           C
ATOM     16  CG  PRO A   1      22.595  21.065  16.473  1.00 11.96           C
ATOM     17  CD  PRO A   1      23.844  20.738  15.680  1.00 12.56           C
ATOM     18  N   GLN A   2      20.920  21.358  12.090  1.00 10.01           N
ATOM     19  CA  GLN A   2      20.848  22.096  10.790  1.00 10.36           C
ATOM     20  C   GLN A   2      20.523  23.577  11.104  1.00  9.48           C
ATOM     21  O   GLN A   2      19.483  23.784  11.751  1.00  9.88           O
ATOM     22  CB  GLN A   2      19.839  21.439   9.860  1.00  9.46           C
ATOM     23  CG  GLN A   2      19.997  22.014   8.451  1.00 10.39           C
ATOM     24  CD  GLN A   2      19.124  21.359   7.433  1.00 11.31           C
ATOM     25  OE1 GLN A   2      18.853  20.153   7.480  1.00 11.96           O
ATOM     26  NE2 GLN A   2      18.609  22.130   6.468  1.00 10.45           N
Applying the latest pdb_delinsertion (v2.0.5) yields:
ATOM      1  N   GLY A   1      25.689  14.213  13.354  1.00 30.45           N
ATOM      2  CA  GLY A   1      25.987  15.432  12.550  1.00 26.96           C
ATOM      3  C   GLY A   1      25.765  16.621  13.483  1.00 25.19           C
ATOM      4  O   GLY A   1      25.854  16.429  14.700  1.00 26.24           O
ATOM      5  N   SER A   2      25.502  17.796  12.939  1.00 23.20           N
ATOM      6  CA  SER A   2      25.239  18.997  13.759  1.00 18.18           C
ATOM      7  C   SER A   2      23.808  19.423  13.540  1.00 16.47           C
ATOM      8  O   SER A   2      23.127  19.047  12.551  1.00 16.08           O
ATOM      9  CB  SER A   2      26.204  20.107  13.368  1.00 20.97           C
ATOM     10  OG  SER A   2      27.450  19.508  13.038  1.00 25.16           O
ATOM     11  N   PRO A   2      23.223  20.197  14.441  1.00 12.21           N
ATOM     12  CA  PRO A   2      21.881  20.749  14.227  1.00 11.37           C
ATOM     13  C   PRO A   2      21.929  21.556  12.903  1.00 10.73           C
ATOM     14  O   PRO A   2      22.872  22.308  12.668  1.00 12.80           O
ATOM     15  CB  PRO A   2      21.641  21.649  15.437  1.00 10.87           C
ATOM     16  CG  PRO A   2      22.595  21.065  16.473  1.00 11.96           C
ATOM     17  CD  PRO A   2      23.844  20.738  15.680  1.00 12.56           C
ATOM     18  N   GLN A   4      20.920  21.358  12.090  1.00 10.01           N
ATOM     19  CA  GLN A   4      20.848  22.096  10.790  1.00 10.36           C
ATOM     20  C   GLN A   4      20.523  23.577  11.104  1.00  9.48           C
ATOM     21  O   GLN A   4      19.483  23.784  11.751  1.00  9.88           O
ATOM     22  CB  GLN A   4      19.839  21.439   9.860  1.00  9.46           C
ATOM     23  CG  GLN A   4      19.997  22.014   8.451  1.00 10.39           C
ATOM     24  CD  GLN A   4      19.124  21.359   7.433  1.00 11.31           C
ATOM     25  OE1 GLN A   4      18.853  20.153   7.480  1.00 11.96           O
ATOM     26  NE2 GLN A   4      18.609  22.130   6.468  1.00 10.45           N
We cannot predict every single combination of mal-formatted PDB files out there in the wild... The PDB format stipulates that "ATOM records for proteins are listed from amino to carboxyl terminus" and that "The insertion code is used when two residues have the same numbering. The combination of residue numbering and insertion code defines the unique residue".
I think I understand the issue here, will work on it in the next few days.
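For reference, the numbering pattern discussed here can be detected before any renumbering is attempted. The following is only a minimal sketch, not part of pdb-tools: the function names (`residue_ids`, `find_numbering_problems`, `atom_line`) are hypothetical, and it assumes standard fixed-column ATOM/HETATM records (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). It reports duplicated residue identifiers, which conflict with the uniqueness rule quoted above, and residue numbers that go backwards within a chain, which is what happens in 1MH1. A backwards step is treated here as a warning sign, not as a formal violation of the format.

```python
def residue_ids(lines):
    """Yield (chain, resseq, icode) once per residue, in file order."""
    previous = None
    for line in lines:
        # Skip non-coordinate records and lines too short to hold an icode.
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 27:
            continue
        key = (line[21], int(line[22:26]), line[26].strip())
        if key != previous:
            yield key
            previous = key


def find_numbering_problems(lines):
    """Return human-readable warnings about residue numbering."""
    problems = []
    seen = set()
    last_number = {}  # chain -> residue number of the previous residue
    for chain, resseq, icode in residue_ids(lines):
        if (chain, resseq, icode) in seen:
            problems.append(
                f"chain {chain}: duplicate residue id {resseq}{icode}"
            )
        seen.add((chain, resseq, icode))
        if chain in last_number and resseq < last_number[chain]:
            problems.append(
                f"chain {chain}: residue number decreases "
                f"from {last_number[chain]} to {resseq}"
            )
        last_number[chain] = resseq
    return problems


def atom_line(resname, resseq, icode=" ", chain="A", serial=1):
    """Build a minimal fixed-column CA record (illustration only)."""
    return f"ATOM  {serial:>5}  CA  {resname} {chain}{resseq:>4}{icode}"


if __name__ == "__main__":
    # Residue order as in 1MH1: 1A, 2A, 1, 2
    example = [
        atom_line("GLY", 1, "A"),
        atom_line("SER", 2, "A"),
        atom_line("PRO", 1),
        atom_line("GLN", 2),
    ]
    for problem in find_numbering_problems(example):
        print(problem)
```

A check like this could run as a separate validation step, so that unusual files are reported to the user instead of being handled by special cases inside the renumbering logic.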
By no means did I mean that this was a bug in pdb_delinsertion. I was also very surprised when I found this situation, because this residue numbering is not expected. Now, reading your comment and thinking about it in more detail (yesterday I just reported it without any deeper consideration), @JoaoRodrigues, maybe we should NOT handle this case in pdb_delinsertion. Because the residue numbers are discontinuous, adding this case to pdb_delinsertion may cause problems for correct cases that we have not considered. Therefore, as you rightly stated, if this example, 1MH1, violates the PDB rules, it should not be handled by pdb-tools.
What are your thoughts?