PDBeurope/arpeggio

Unable to run with openbabel 3.x


Hello,

I've installed arpeggio on Windows 10 (Python37, OpenBabel-3.1.1-x64, openbabel-3.1.1-cp37-cp37m-win_amd64.whl), and when I run it against a PDB file I get this error.

Any ideas how I can fix this?

arpeggio 1kx2.pdb
INFO//13:55:55.225//Program begin.
WARNING//13:55:55.225//No selection was perceived. Defaults into full structure!!
DEBUG//13:55:55.250//Loaded PDB structure (BioPython)
c:\python37\lib\site-packages\openbabel\__init__.py:27: UserWarning: "import openbabel" is deprecated, instead use "from openbabel import openbabel"
warnings.warn('"import openbabel" is deprecated, instead use "from openbabel import openbabel"')
DEBUG//13:56:04.177//Loaded PDB structure (OpenBabel)
DEBUG//13:56:04.177//Mapped OB to BioPython atoms and vice-versa.
DEBUG//13:56:04.178//Detected that the input structure contains hydrogens. Hydrogen addition will be skipped.
DEBUG//13:56:04.197//Determined atom explicit and implicit valences, bond orders, atomic numbers, formal charge and number of bound hydrogens.
Traceback (most recent call last):
File "C:\Python37\Scripts\arpeggio-script.py", line 11, in <module>
load_entry_point('arpeggio==1.4.1', 'console_scripts', 'arpeggio')()
File "c:\python37\lib\site-packages\arpeggio\scripts\process_protein_cli.py", line 79, in main
run_arpeggio(args)
File "c:\python37\lib\site-packages\arpeggio\scripts\process_protein_cli.py", line 103, in run_arpeggio
i_complex.initialize()
File "c:\python37\lib\site-packages\arpeggio\core\interactions.py", line 302, in initialize
self._initialize_atom_sift()
File "c:\python37\lib\site-packages\arpeggio\core\interactions.py", line 1808, in _initialize_atom_sift
atom.potential_hbonds = atom.potential_hbonds + atom.num_hydrogens
AttributeError: 'Atom' object has no attribute 'num_hydrogens'

Hi @eusebiu ,

There are two things. One is that Open Babel made breaking changes in version 3.x, and the version you installed is 3.1.1. I was not aware of this (when I was refactoring arpeggio, 2.4.1 was the available version), but I have made changes to get arpeggio working with openbabel 3.x (now in the dev branch).
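For anyone who needs their own scripts to cope with both Open Babel layouts, here is a minimal sketch of an import shim. It relies only on the deprecation warning shown in the log above (3.x wants "from openbabel import openbabel"). The helper name load_openbabel is hypothetical and is not part of arpeggio or Open Babel, and this is not necessarily how the dev branch handles it.

```python
import importlib


def load_openbabel():
    """Return (module, layout) for whichever Open Babel binding is importable.

    load_openbabel is a hypothetical helper for illustration only.
    layout is "3.x", "2.x", or None when Open Babel is not installed.
    """
    try:
        # Open Babel 3.x: bindings live in the openbabel.openbabel submodule.
        return importlib.import_module("openbabel.openbabel"), "3.x"
    except ImportError:
        pass
    try:
        # Open Babel 2.x: bindings are the top-level openbabel module.
        return importlib.import_module("openbabel"), "2.x"
    except ImportError:
        # Neither layout is available in this environment.
        return None, None


ob, layout = load_openbabel()
print("Open Babel layout detected:", layout)
```

The 3.x import is tried first because a plain "import openbabel" still succeeds on 3.x but emits the deprecation warning. Note that this only addresses the import path; method-level API changes between 2.x and 3.x still need separate handling.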

The other one is the use of PDB files. I have never taken any steps to support PDB files, as this is an obsolete format and we no longer use it internally. PDB support in arpeggio is provided purely on an out-of-the-box basis using external parsers (biopython and openbabel), and I believe one of them is faulty when it comes to interpreting this exact structure.

So I wonder if you could use mmCIF files. When I download 1kx2 from our archive (https://www.ebi.ac.uk/pdbe/static/entry/1kx2_updated.cif) and run it without further arguments, I get this result:

(rdkit-env) ➜  arp arpeggio 1kx2_updated.cif 
INFO//12:28:47.596//Program begin.
WARNING//12:28:47.596//No selection was perceived. Defaults into full structure!!
DEBUG//12:28:47.632//Loaded PDB structure (BioPython)
==============================
*** Open Babel Warning  in PerceiveBondOrders
  Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders

DEBUG//12:28:47.660//Loaded MMCIF structure (OpenBabel)
DEBUG//12:28:47.664//Mapped OB to BioPython atoms and vice-versa.
DEBUG//12:28:47.664//Detected that the input structure contains hydrogens. Hydrogen addition will be skipped.
DEBUG//12:28:47.717//Determined atom explicit and implicit valences, bond orders, atomic numbers, formal charge and number of bound hydrogens.
DEBUG//12:28:47.728//Initialised SIFts.
DEBUG//12:28:47.729//Determined polypeptide residues, chain breaks, termini
DEBUG//12:28:47.744//Percieved and stored rings.
DEBUG//12:28:47.753//Perceived and stored amide groups.
DEBUG//12:28:47.762//Added hydrogens to BioPython atoms.
DEBUG//12:28:47.764//Added VdW radii.
DEBUG//12:28:47.767//Added covalent radii.
DEBUG//12:28:47.769//Completed NeighborSearch.
DEBUG//12:28:47.770//Assigned rings to residues.
DEBUG//12:28:47.771//Made selection.
DEBUG//12:28:47.868//Expanded to binding site.
DEBUG//12:28:47.869//Flagged selection rings.
DEBUG//12:28:47.870//Completed new NeighbourSearch.
INFO//12:29:09.207//Program End. Maximum memory usage was 63.93 MB.

Just please note that if you use the updated mmCIF files that we at PDBe provide, the residue identifier that is used is not auth_asym_id but pdbe_label_seq_id. Let me know should you have further questions.
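If you want to fetch the updated mmCIF file from a script, a rough sketch follows. The URL pattern is generalised from the single 1kx2 link above, so treat it as an assumption for other entries. The function names updated_cif_url and fetch_updated_cif are hypothetical helpers, not part of arpeggio.

```python
import urllib.request

# Assumed pattern, inferred from the one 1kx2 URL quoted in this thread.
PDBE_UPDATED_CIF = "https://www.ebi.ac.uk/pdbe/static/entry/{pdb_id}_updated.cif"


def updated_cif_url(pdb_id):
    """Build the updated mmCIF URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return PDBE_UPDATED_CIF.format(pdb_id=pdb_id)


def fetch_updated_cif(pdb_id, dest=None):
    """Download the file and return the local path (needs network access)."""
    url = updated_cif_url(pdb_id)
    dest = dest or f"{pdb_id.strip().lower()}_updated.cif"
    urllib.request.urlretrieve(url, dest)
    return dest


print(updated_cif_url("1kx2"))
```

Calling fetch_updated_cif("1kx2") would then give a local 1kx2_updated.cif that can be passed to arpeggio as in the run above.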

Hey @lpravda
I tried installing from dev but I get this error:
pip install git+https://github.com/PDBeurope/arpeggio.git@dev#egg=arpeggio
Collecting arpeggio
Cloning https://github.com/PDBeurope/arpeggio.git (to revision dev) to c:\users\xxx\appdata\local\temp\pip-install-dodaix42\arpeggio
Running command git clone -q https://github.com/PDBeurope/arpeggio.git 'C:\Users\xxx\AppData\Local\Temp\pip-install-dodaix42\arpeggio'
Running command git checkout -b dev --track origin/dev
Branch 'dev' set up to track remote branch 'dev' from 'origin'.
Switched to a new branch 'dev'
ERROR: Could not find a version that satisfies the requirement pdbecif>2.0 (from arpeggio) (from versions: 1.3.4, 1.3.5, 1.3.7, 1.3.78455612, 1.4.0a0, 1.4.0, 1.4.2)
ERROR: No matching distribution found for pdbecif>2.0 (from arpeggio)

Sorry, my bad. Can you try now?

arpeggio 1kx2_updated.cif works
arpeggio 1kx2.pdb does not - same error as above.

Thanks!

Yes, like I said, I believe there is a problem either in the PDB file or in the way the PDB file is processed by biopython/openbabel, and at present there is not much I can do about it.

I'm glad it works with the mmcif, please try to use this format whenever you can with arpeggio, as this is the only format we support.