haddocking/pdb-tools

A rare case for pdb_delinsertion


Today I found another curious case for pdb_delinsertion. I wouldn't call it a bug, but rather a case that has not been considered yet.

PDB ID: 1MH1

In this PDB file, the first two residues belong to the purification tag and are assigned as insertions. The chain is continuous; there are no backbone breaks.

ATOM      1  N   GLY A   1A     25.689  14.213  13.354  1.00 30.45           N
ATOM      2  CA  GLY A   1A     25.987  15.432  12.550  1.00 26.96           C
ATOM      3  C   GLY A   1A     25.765  16.621  13.483  1.00 25.19           C
ATOM      4  O   GLY A   1A     25.854  16.429  14.700  1.00 26.24           O
ATOM      5  N   SER A   2A     25.502  17.796  12.939  1.00 23.20           N
ATOM      6  CA  SER A   2A     25.239  18.997  13.759  1.00 18.18           C
ATOM      7  C   SER A   2A     23.808  19.423  13.540  1.00 16.47           C
ATOM      8  O   SER A   2A     23.127  19.047  12.551  1.00 16.08           O
ATOM      9  CB  SER A   2A     26.204  20.107  13.368  1.00 20.97           C
ATOM     10  OG  SER A   2A     27.450  19.508  13.038  1.00 25.16           O
ATOM     11  N   PRO A   1      23.223  20.197  14.441  1.00 12.21           N
ATOM     12  CA  PRO A   1      21.881  20.749  14.227  1.00 11.37           C
ATOM     13  C   PRO A   1      21.929  21.556  12.903  1.00 10.73           C
ATOM     14  O   PRO A   1      22.872  22.308  12.668  1.00 12.80           O
ATOM     15  CB  PRO A   1      21.641  21.649  15.437  1.00 10.87           C
ATOM     16  CG  PRO A   1      22.595  21.065  16.473  1.00 11.96           C
ATOM     17  CD  PRO A   1      23.844  20.738  15.680  1.00 12.56           C
ATOM     18  N   GLN A   2      20.920  21.358  12.090  1.00 10.01           N
ATOM     19  CA  GLN A   2      20.848  22.096  10.790  1.00 10.36           C
ATOM     20  C   GLN A   2      20.523  23.577  11.104  1.00  9.48           C
ATOM     21  O   GLN A   2      19.483  23.784  11.751  1.00  9.88           O
ATOM     22  CB  GLN A   2      19.839  21.439   9.860  1.00  9.46           C
ATOM     23  CG  GLN A   2      19.997  22.014   8.451  1.00 10.39           C
ATOM     24  CD  GLN A   2      19.124  21.359   7.433  1.00 11.31           C
ATOM     25  OE1 GLN A   2      18.853  20.153   7.480  1.00 11.96           O
ATOM     26  NE2 GLN A   2      18.609  22.130   6.468  1.00 10.45           N

Applying the latest version (v.2.0.5) of pdb_delinsertion yields the following, where SER and PRO both end up as residue 2 and GLN jumps to 4:

ATOM      1  N   GLY A   1      25.689  14.213  13.354  1.00 30.45           N
ATOM      2  CA  GLY A   1      25.987  15.432  12.550  1.00 26.96           C
ATOM      3  C   GLY A   1      25.765  16.621  13.483  1.00 25.19           C
ATOM      4  O   GLY A   1      25.854  16.429  14.700  1.00 26.24           O
ATOM      5  N   SER A   2      25.502  17.796  12.939  1.00 23.20           N
ATOM      6  CA  SER A   2      25.239  18.997  13.759  1.00 18.18           C
ATOM      7  C   SER A   2      23.808  19.423  13.540  1.00 16.47           C
ATOM      8  O   SER A   2      23.127  19.047  12.551  1.00 16.08           O
ATOM      9  CB  SER A   2      26.204  20.107  13.368  1.00 20.97           C
ATOM     10  OG  SER A   2      27.450  19.508  13.038  1.00 25.16           O
ATOM     11  N   PRO A   2      23.223  20.197  14.441  1.00 12.21           N
ATOM     12  CA  PRO A   2      21.881  20.749  14.227  1.00 11.37           C
ATOM     13  C   PRO A   2      21.929  21.556  12.903  1.00 10.73           C
ATOM     14  O   PRO A   2      22.872  22.308  12.668  1.00 12.80           O
ATOM     15  CB  PRO A   2      21.641  21.649  15.437  1.00 10.87           C
ATOM     16  CG  PRO A   2      22.595  21.065  16.473  1.00 11.96           C
ATOM     17  CD  PRO A   2      23.844  20.738  15.680  1.00 12.56           C
ATOM     18  N   GLN A   4      20.920  21.358  12.090  1.00 10.01           N
ATOM     19  CA  GLN A   4      20.848  22.096  10.790  1.00 10.36           C
ATOM     20  C   GLN A   4      20.523  23.577  11.104  1.00  9.48           C
ATOM     21  O   GLN A   4      19.483  23.784  11.751  1.00  9.88           O
ATOM     22  CB  GLN A   4      19.839  21.439   9.860  1.00  9.46           C
ATOM     23  CG  GLN A   4      19.997  22.014   8.451  1.00 10.39           C
ATOM     24  CD  GLN A   4      19.124  21.359   7.433  1.00 11.31           C
ATOM     25  OE1 GLN A   4      18.853  20.153   7.480  1.00 11.96           O
ATOM     26  NE2 GLN A   4      18.609  22.130   6.468  1.00 10.45           N
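
For anyone who wants to spot this kind of output programmatically, here is a minimal Python sketch that flags residue numbers shared by more than one residue name. It is not part of pdb-tools; the function name `find_number_collisions` and the shortened `OUTPUT` sample (only the N atoms from the listing above) are made up for illustration. It assumes standard fixed-column ATOM/HETATM records, and it cannot detect a collision between two residues of the same type.

```python
def find_number_collisions(lines):
    """Map (chain, resSeq+iCode) to residue names when more than one name shares it.

    Hypothetical helper, not part of pdb-tools. Relies on the fixed PDB columns:
    resName 18-20, chainID 22, resSeq 23-26, iCode 27 (1-based).
    """
    seen = {}
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = (line[21], line[22:27].strip())  # chain, residue number plus insertion code
        names = seen.setdefault(key, [])
        if line[17:20] not in names:
            names.append(line[17:20])
    # Keep only the residue numbers that carry two or more different residue names.
    return {key: names for key, names in seen.items() if len(names) > 1}


# N atoms of the four residues as written by pdb_delinsertion in the listing above.
OUTPUT = [
    "ATOM      1  N   GLY A   1      25.689  14.213  13.354  1.00 30.45           N",
    "ATOM      5  N   SER A   2      25.502  17.796  12.939  1.00 23.20           N",
    "ATOM     11  N   PRO A   2      23.223  20.197  14.441  1.00 12.21           N",
    "ATOM     18  N   GLN A   4      20.920  21.358  12.090  1.00 10.01           N",
]

print(find_number_collisions(OUTPUT))
```

On this sample the only flagged entry is chain A, residue 2, shared by SER and PRO.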

We cannot predict every single combination of malformed PDB files out there in the wild... The PDB format stipulates that "ATOM records for proteins are listed from amino to carboxyl terminus" and that "The insertion code is used when two residues have the same numbering. The combination of residue numbering and insertion code defines the unique residue".
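
One way to recognize files like 1MH1 up front is to look for insertion-coded residues that do not directly follow a residue with the same number. The sketch below is only a heuristic reading of the format text quoted above, not an official validity check and not something pdb-tools provides; the name `misplaced_insertions` and the shortened `INPUT` sample (N atoms of the original 1MH1 records) are assumptions for illustration.

```python
def misplaced_insertions(lines):
    """Return (chain, resSeq, iCode) for insertion-coded residues that do not
    directly follow a residue with the same chain and residue number.

    Hypothetical heuristic, not part of pdb-tools. Assumes fixed-column
    ATOM/HETATM records (chainID col 22, resSeq cols 23-26, iCode col 27).
    """
    flagged = []
    previous = None  # identifier of the residue read most recently
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        res_id = (line[21], int(line[22:26]), line[26])
        if res_id == previous:
            continue  # still inside the same residue
        has_icode = res_id[2] != " "
        follows_same_number = previous is not None and previous[:2] == res_id[:2]
        if has_icode and not follows_same_number:
            flagged.append(res_id)
        previous = res_id
    return flagged


# N atoms of the first four residues of 1MH1 as deposited.
INPUT = [
    "ATOM      1  N   GLY A   1A     25.689  14.213  13.354  1.00 30.45           N",
    "ATOM      5  N   SER A   2A     25.502  17.796  12.939  1.00 23.20           N",
    "ATOM     11  N   PRO A   1      23.223  20.197  14.441  1.00 12.21           N",
    "ATOM     18  N   GLN A   2      20.920  21.358  12.090  1.00 10.01           N",
]

print(misplaced_insertions(INPUT))
```

Here both tag residues, 1A and 2A, are flagged because each appears before the plain residue that carries its number. A tool could use such a check to warn the user instead of trying to renumber.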

I think I understand the issue here, will work on it in the next few days.

By no means did I mean that this was a bug in pdb_delinsertion. I was also very surprised when I found this situation, because that residue numbering scheme is not expected. Now, reading your comment and thinking about it in more detail (yesterday I just reported it without any deeper consideration), @JoaoRodrigues, maybe we should NOT handle this case in pdb_delinsertion. Because the residue numbers are discontinuous, adding this case to pdb_delinsertion may cause problems for other, correct cases that we have not considered. Therefore, as you rightly stated, if 1MH1 violates the PDB rules, it should not be handled by pdb-tools.

What are your thoughts?