PDBeurope/arpeggio

Open Babel Warning in PerceiveBondOrders

Bae-SungHan opened this issue · 7 comments

Hello
I have calculated protein-ligand interactions for several protein files with arpeggio and analyzed the results.
The interaction calculations completed successfully, but the calculated interactions rarely included atom-plane interactions.
These results differ considerably from those shown on the PDBe website for the same protein-ligand pairs.

In most cases, the following warning message occurred while executing arpeggio.
*** Open Babel Warning in PerceiveBondOrders
Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders

I think this is why my run failed to calculate atom-plane interactions.

My previous results came from running pdbe-arpeggio directly on .mmcif files downloaded from the RCSB PDB website, without any additional preprocessing.
So I suspect I need a separate preprocessing step to solve the above problem, but I have no idea what it should be.

I would like to ask what preprocessing is needed to solve my problem.
Below are the arpeggio command line I executed in the terminal and the log it displayed.
The module versions are pdbe-arpeggio==1.4.2 / biopython==1.81 / gemmi==0.6.3 / openbabel==3.1.1
Thank you

pdbe-arpeggio /workdir_efs/bshzz1006/arpeggio/sample_data/cif_file/7ukv.cif -s /A/1101/ -wh -o /workdir_efs/bshzz1006/arpeggio/sample_data/result/7UKV
INFO//10:19:49.953//Program begin.
INFO//10:19:49.954//Selection perceived: ['/A/1101/']
DEBUG//10:19:49.999//Loaded PDB structure (BioPython)
==============================
*** Open Babel Warning in PerceiveBondOrders
Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders

DEBUG//10:19:50.047//Loaded MMCIF structure (OpenBabel)
DEBUG//10:19:50.061//Mapped OB to BioPython atoms and vice-versa.
DEBUG//10:19:50.078//Added hydrogens.
DEBUG//10:19:50.321//Wrote hydrogenated structure file. Hydrogenation was by Arpeggio using OpenBabel defaults.
DEBUG//10:19:50.491//Determined atom explicit and implicit valences, bond orders, atomic numbers, formal charge and number of bound hydrogens.
DEBUG//10:19:50.508//Initialised SIFts.
DEBUG//10:19:50.513//Determined polypeptide residues, chain breaks, termini
DEBUG//10:19:50.513//Percieved and stored rings.
DEBUG//10:19:50.515//Perceived and stored amide groups.
DEBUG//10:19:50.519//Added hydrogens to BioPython atoms.
DEBUG//10:19:50.522//Added VdW radii.
DEBUG//10:19:50.526//Added covalent radii.
DEBUG//10:19:50.530//Completed NeighborSearch.
DEBUG//10:19:50.530//Assigned rings to residues.
DEBUG//10:19:50.535//Made selection.
DEBUG//10:19:50.611//Expanded to binding site.
DEBUG//10:19:50.611//Flagged selection rings.
DEBUG//10:19:50.611//Completed new NeighbourSearch.
INFO//10:19:50.755//Program End. Maximum memory usage was 92.66 MB.

I also experience this issue. It is leading to major differences between running a PDB entry through pdbe-arpeggio and the data in pdbe-graph. For example, plane-plane contacts for 1b42 are not calculated, but are described in the pdbe-graph version. Can you clarify whether you perform an internal processing step to enable the calculation of these atom-plane and plane-plane interactions?

The protein-ligand interactions available from the pdbe-graph API are not calculated directly on the asymmetric unit of a PDB entry. A number of preprocessing steps are performed before calculating interactions using pdbe-arpeggio.

  1. Consider only Model 1 if there are multiple models for a PDB entry
  2. Remove alternate conformers
  3. Generate the biological assembly using model-server
  4. Protonate the structure using ChimeraX
  5. Calculate interactions using pdbe-arpeggio
  6. Filter interactions to include only "INTER", "INTRA_SELECTION", "SELECTION_WATER"

At PDBe we have created a pipeline, which is a wrapper around pdbe-arpeggio, to calculate the interactions of ligands for all PDB entries released weekly. We will soon be releasing this pipeline on GitHub. I hope this clarifies your queries.
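
For readers who want to try step 6 of the list above before the pipeline is released, here is a minimal Python sketch. It is not the PDBe pipeline code. It assumes the pdbe-arpeggio JSON output is a list of contact objects, each carrying an "interacting_entities" field; the function names are hypothetical.

```python
import json

# Categories kept in step 6 of the preprocessing list.
KEPT_ENTITIES = {"INTER", "INTRA_SELECTION", "SELECTION_WATER"}


def filter_contacts(contacts, kept=KEPT_ENTITIES):
    """Return only the contacts whose category is in `kept`.

    Assumption: each contact is a dict with an "interacting_entities" key.
    Contacts without that key are dropped.
    """
    return [c for c in contacts if c.get("interacting_entities") in kept]


def filter_contact_file(in_path, out_path):
    """Read a contact list from JSON, filter it, write it back out.

    Returns the number of contacts kept.
    """
    with open(in_path) as handle:
        contacts = json.load(handle)
    kept = filter_contacts(contacts)
    with open(out_path, "w") as handle:
        json.dump(kept, handle, indent=2)
    return len(kept)
```

Calling `filter_contact_file("7ukv_bio_h.json", "7ukv_filtered.json")` would write the reduced contact list; both file names are only examples.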

Hi @roshkjr

Thanks for your feedback, this is very helpful. Can you confirm whether your pdbe-arpeggio wrapper mitigates the issue that @Bae-SungHan originally described and that I am also experiencing, in which INTER contacts of type plane-plane/atom-plane are not returned due to an OpenBabel issue with kekulizing aromatic bonds?

If your pipeline includes a method that makes the OpenBabel kekulization succeed, it would be useful to share it in the pdbe-arpeggio readme or implement it in pdbe-arpeggio itself, as contacts of type atom-plane/plane-plane are not returned in my experience.

Thanks again,

Matt

Hi @m-crown,

No, the pdbe-arpeggio wrapper does not mitigate the kekulization issue, but I don't think the missing atom-plane/plane-plane contacts are due to this issue. The warning

*** Open Babel Warning in PerceiveBondOrders
Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders

started appearing when we updated OpenBabel (OB) from 2.4.1 to 3.0. I tried to resolve this issue but never managed to get around it. However, before updating to OB 3, I compared the results generated by OB 2.4.1 and OB 3 for the entire PDB archive, and there were no significant differences.

For the particular example @Bae-SungHan mentioned, when I ran pdbe-arpeggio on the protonated biological assembly, atom-plane and plane-plane contacts were calculated.

pdbe-arpeggio 7ukv_bio_h.cif -s /A/1101/

If you have been seeing missing atom-plane or plane-plane contacts for other PDB entries, can you please try running pdbe-arpeggio again using the protonated biological assembly of these entries? You can get the protonated biological assembly of PDB entries at https://www.ebi.ac.uk/pdbe/model-server/v1/<pdb_id>/full?encoding=cif&data_source=pdb-h

Please let me know if this resolves the issue or not. Thanks
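
When checking several entries, the download can be scripted. The sketch below only fills in the `<pdb_id>` placeholder of the URL quoted above and saves the response. The function names and the `_bio_h.cif` default file name are hypothetical, and lowercasing the ID is an assumption based on the lowercase IDs used in this thread.

```python
import urllib.request

# URL pattern quoted in the comment above; {pdb_id} replaces <pdb_id>.
URL_TEMPLATE = (
    "https://www.ebi.ac.uk/pdbe/model-server/v1/{pdb_id}/full"
    "?encoding=cif&data_source=pdb-h"
)


def protonated_assembly_url(pdb_id):
    """Build the model-server URL for one PDB ID.

    Assumption: IDs are passed in lowercase, as in the examples in this thread.
    """
    return URL_TEMPLATE.format(pdb_id=pdb_id.strip().lower())


def download_protonated_assembly(pdb_id, out_path=None):
    """Download the protonated assembly and return the path it was saved to."""
    if out_path is None:
        # Hypothetical default name, mirroring 7ukv_bio_h.cif above.
        out_path = f"{pdb_id.strip().lower()}_bio_h.cif"
    urllib.request.urlretrieve(protonated_assembly_url(pdb_id), out_path)
    return out_path
```

The saved file can then be passed to pdbe-arpeggio with the usual `-s` selection.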

Thanks again for the detailed troubleshooting steps. I am working with a particular PDB entry, 1b42, as an example of this issue and am trying to figure out where the differences arise.

PDBe-graph and the API return contacts for /A/400/ that include plane-plane interactions. Specifically, I can see this contact both directly in the neo4j graph and in the API result:

{"ligand_atoms":["C4","C5","C8","N7","N9"],"end":{"chain_id":"A","author_residue_number":115,"chem_comp_id":"PHE","atom_names":["CD1","CD2","CE1","CE2","CG","CZ"],"author_insertion_code":" "},"interaction_type":"plane-plane","interaction_details":["FE","EF"],"distance":4.97}

The specific URL for API query is here.

I then follow the process you described to run arpeggio on the protonated biological assembly:

wget "https://www.ebi.ac.uk/pdbe/model-server/v1/1b42/full?encoding=cif&data_source=pdb-h" -O 1b42_bio_h.cif
pdbe-arpeggio 1b42_bio_h.cif -s /A/400/

A grep of the entire json file for plane-plane interactions returns nothing (same for atom-plane, which are also included in the PDBe API result).

I am using version pdbe-arpeggio v1.4.4
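
As an alternative to grepping the file, a tally of contact types makes the comparison between two runs easier to read. This is a minimal sketch that assumes the output JSON is a list of contact objects with a "type" field holding values such as "plane-plane" or "atom-plane"; the function names are hypothetical.

```python
import json
from collections import Counter


def count_contact_types(contacts):
    """Tally contacts by their "type" field.

    Assumption: each contact is a dict; a missing "type" is counted as "unknown".
    """
    return Counter(c.get("type", "unknown") for c in contacts)


def count_contact_types_in_file(path):
    """Load a contact list from a JSON file and tally its contact types."""
    with open(path) as handle:
        return count_contact_types(json.load(handle))
```

Because `Counter` returns 0 for a missing key, `count_contact_types_in_file("1b42_bio_h.json")["plane-plane"]` would be 0 in the failing case described above.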

Hi @m-crown
When I ran the same commands:

wget "https://www.ebi.ac.uk/pdbe/model-server/v1/1b42/full?encoding=cif&data_source=pdb-h" -O 1b42_bio_h.cif
pdbe-arpeggio 1b42_bio_h.cif -s /A/400/

I can see plane-plane and atom-plane interactions in the generated 1b42_bio_h.json file. I am also using pdbe-arpeggio v1.4.4. Can you please check again?

The combination of nuking my conda env and using the protonated structure has fixed this for me. No clue which package was causing the issue as all were up to date that I could see. Thank you for troubleshooting, and look forward to seeing the pipeline when it is available!
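
Since the culprit package was never identified, recording the installed versions before and after rebuilding an environment may help the next person narrow it down. The sketch below uses only the standard library. The package names are taken from the versions listed in the original post, and the function name is hypothetical.

```python
from importlib import metadata

# Distribution names as listed in the original post.
PACKAGES = ("pdbe-arpeggio", "biopython", "gemmi", "openbabel")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Printing `installed_versions()` in the broken and the rebuilt environment gives two dictionaries that can be compared directly.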

I would suggest the original issue could potentially be closed now if you have verified that the issue is not present for the original structures too?