PDBFixer API cannot fix mmCIF file
locitran opened this issue · 5 comments
Hi all,
I am trying to use PDBFixer to fix an mmCIF file, but it fails when reading the file.
from pdbfixer import PDBFixer
from openmm.app import PDBFile

def pdbfixer(in_path, out_path):
    with open(in_path) as in_f:
        fixer = PDBFixer(pdbfile=in_f)
    fixer.findMissingResidues()
    chains = list(fixer.topology.chains())
    keys = fixer.missingResidues.keys()
    for key in keys:
        chain = chains[key[0]]
        if key[1] == 0 or key[1] == len(list(chain.residues())):
            del fixer.missingResidues[key]
    fixer.findNonstandardResidues()
    fixer.replaceNonstandardResidues()
    fixer.removeHeterogens(keepWater=False)
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()
    with open(out_path, 'w') as out_f:
        PDBFile.writeFile(fixer.topology, fixer.positions, out_f, keepIds=True)

in_file = './4p42-assembly1.cif'
out_file = 'fix4p42.pdb'
pdbfixer(in_file, out_file)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 28
     26 in_file = './4p42-assembly1.cif'
     27 out_file = 'fix4p42.pdb'
---> 28 pdbfixer(in_file, out_file)
Cell In[3], line 9, in pdbfixer(in_path, out_path)
      7 def pdbfixer(in_path, out_path):
      8     with open(in_path) as in_f:
----> 9         fixer = PDBFixer(pdbfile=in_f)
     10     fixer.findMissingResidues()
     11     chains = list(fixer.topology.chains())
File /mnt/Tsunami_HHD/newloci/anaconda3/lib/python3.10/site-packages/pdbfixer/pdbfixer.py:251, in PDBFixer.__init__(self, filename, pdbfile, pdbxfile, url, pdbid)
    248     file.close()
    249 elif pdbfile:
    250     # A file-like object has been specified.
--> 251     self._initializeFromPDB(pdbfile)
    252 elif pdbxfile:
    253     # A file-like object has been specified.
    254     self._initializeFromPDBx(pdbxfile)
File /mnt/Tsunami_HHD/newloci/anaconda3/lib/python3.10/site-packages/pdbfixer/pdbfixer.py:284, in PDBFixer._initializeFromPDB(self, file)
    281 def _initializeFromPDB(self, file):
...
743 self.residue_name_with_spaces += possible_fourth_character
744 self.residue_name = self.residue_name_with_spaces.strip()
ValueError: Misaligned residue name: ATOM 1 N N . ASP A 1 3 ? -52.691 -92.622 29.836 1.00 58.49 ? ?
fixer = PDBFixer(pdbfile=in_f)
That needs to be pdbxfile=in_f. You're telling it to parse the PDBx/mmCIF file as a PDB file.
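For anyone who hits the same ValueError, here is a minimal sketch of picking the keyword argument from the file extension. The helper names fixer_kwarg_for and load_fixer are hypothetical (they are not part of PDBFixer), and the extension lists are an assumption. Only the pdbfile and pdbxfile keywords are taken from the constructor signature shown in the traceback.

```python
from pathlib import Path

# Assumed extension lists; adjust them to the files you actually use.
PDBX_SUFFIXES = {".cif", ".mmcif"}
PDB_SUFFIXES = {".pdb", ".ent"}


def fixer_kwarg_for(path):
    """Return the PDBFixer keyword ('pdbfile' or 'pdbxfile') for a path."""
    suffix = Path(path).suffix.lower()
    if suffix in PDBX_SUFFIXES:
        return "pdbxfile"
    if suffix in PDB_SUFFIXES:
        return "pdbfile"
    raise ValueError(f"Cannot tell PDB from PDBx/mmCIF for {path!r}")


def load_fixer(path):
    """Open the file and hand it to PDBFixer under the matching keyword."""
    # Imported here so the helper above works without pdbfixer installed.
    from pdbfixer import PDBFixer

    with open(path) as handle:
        return PDBFixer(**{fixer_kwarg_for(path): handle})
```

With this, load_fixer('./4p42-assembly1.cif') would call PDBFixer(pdbxfile=...) instead of PDBFixer(pdbfile=...).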
Thank you, Peter. It's working now :-)
May I ask about another problem, with modeling the N- and C-termini using PDBFixer?
As you can see, there are very long tails at the N- and C-termini. I see that your code runs a short energy minimization, which should be fine for missing residues added inside the structure. However, the result for terminal residues, or for long continuous stretches of missing residues, is clearly not always reasonable. What do you think?
Best regards,
It's common for proteins to have flexible tails. Because they don't have a fixed rigid conformation, they can't be resolved with crystallography and they're missing from crystal structures. PDBFixer is adding them stretched outward just because it's convenient, but don't take that literally. The whole point is that they're flexible and don't have a fixed conformation. As soon as you start simulating they'll begin moving around.
Sometimes people omit the tails from their simulations. You'll need to rely on your own biological knowledge to determine whether the tails are functionally important for your protein, or if they can be safely omitted.
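If you do decide to leave the tails out, one option is to drop the terminal entries from fixer.missingResidues before calling addMissingAtoms(), which is what the loop in the original post is trying to do. A minimal sketch follows. The helper name drop_terminal_gaps is hypothetical, and it assumes the keys are (chain index, residue index within the chain) tuples, as that loop treats them. It builds a new dict rather than deleting in place, because deleting from a dict while iterating over its keys() view raises a RuntimeError in Python 3.

```python
def drop_terminal_gaps(missing_residues, chain_lengths):
    """Return a copy of missing_residues without N-/C-terminal entries.

    missing_residues: dict keyed by (chain_index, residue_index)
    chain_lengths: number of residues present in each chain
    """
    kept = {}
    for (chain_index, residue_index), names in missing_residues.items():
        at_n_terminus = residue_index == 0
        at_c_terminus = residue_index == chain_lengths[chain_index]
        if not (at_n_terminus or at_c_terminus):
            kept[(chain_index, residue_index)] = names
    return kept


# Sketch of use with a PDBFixer instance named fixer (not executed here):
#   fixer.findMissingResidues()
#   lengths = [len(list(c.residues())) for c in fixer.topology.chains()]
#   fixer.missingResidues = drop_terminal_gaps(fixer.missingResidues, lengths)
#   fixer.findMissingAtoms()
#   fixer.addMissingAtoms()
```

Internal gaps are still rebuilt; only the entries at the start or end of a chain are skipped.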
Thank you, Peter. I get the idea.
Ok, great. I'm closing this issue, since the question has been answered.