Arpeggio fails when chem_comp.name == "."
m-crown opened this issue · 1 comments
Arpeggio reads chem_comp names through Gemmi. Gemmi returns a value of "." as False and a value of "?" as None.
In 2c16, for example, the chem_comp name of every chemical component is ".", so the get_component_types function ends up calling .upper() on False and an error is raised (see the traceback at the bottom).
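The failure can be reproduced without Arpeggio or Gemmi. This is a minimal sketch that assumes the category arrives as a dict of columns, with "." already converted to False and "?" to None as described above; the component ids and the helper `name_is_water` are made up for illustration.

```python
# Plain dict standing in for the chem_comp category read through Gemmi.
# Assumption: "." has become False and "?" has become None.
chem_comp = {
    "id": ["HOH", "GOL", "ALA"],
    "name": [False, None, "ALANINE"],
}


def name_is_water(category, i):
    """Hypothetical helper mirroring the failing name-based comparison."""
    return category["name"][i].upper() == "WATER"


try:
    name_is_water(chem_comp, 0)
except AttributeError as err:
    # False has no .upper(), so the comparison never gets evaluated.
    error_message = str(err)
    print(error_message)
```

The same call on index 1 would fail in the same way, because None has no .upper() method either.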
A potential solution would be to detect water using:
if chem_comp['id'][i].upper() == 'HOH':
instead of:
if chem_comp['name'][i].upper() == 'WATER':
as the id column is more likely to be stable than the name column.
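As a rough sketch of what I mean (not a patch against protein_reader.py), the check could look like the following. The function name `is_water` and the sample data are hypothetical, and the isinstance guard is an extra precaution I am assuming is harmless, in case an id ever comes back as False or None.

```python
def is_water(chem_comp, i):
    """Decide whether row i is water from the id column alone."""
    comp_id = chem_comp["id"][i]
    # Guard against non-string values before calling .upper().
    return isinstance(comp_id, str) and comp_id.upper() == "HOH"


# Every name is "." (seen as False), as in the case described above.
chem_comp = {"id": ["HOH", "GOL"], "name": [False, False]}
water_flags = [is_water(chem_comp, i) for i in range(len(chem_comp["id"]))]
print(water_flags)
```

The name column is never read, so its content no longer matters for this check.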
If you think this is a suitable workaround, let me know and I can make a PR.
INFO//00:16:26.032//Program begin.
INFO//00:16:26.047//Selection perceived: ['/A/1336/', '/A_2/1336/', '/A_3/1336/', '/A_4/1336/', '/A/1337/', '/A_2/1337/', '/A_3/1337/', '/A_4/1337/', '/A/1338/', '/A_2/1338/', '/A_3/1338/', '/A
, '/A/1339/', '/A_2/1339/', '/A_3/1339/', '/A_4/1339/', '/B/1336/', '/B_2/1336/', '/B_3/1336/', '/B_4/1336/', '/B/1337/', '/B_2/1337/', '/B_3/1337/', '/B_4/1337/', '/B/1338/', '/B_2/1338/', '/B_3
'/B_4/1338/', '/B/1339/', '/B_2/1339/', '/B_3/1339/', '/B_4/1339/']
DEBUG//00:16:26.879//Loaded PDB structure (BioPython)
==============================
*** Open Babel Warning in PerceiveBondOrders
Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders
DEBUG//00:16:39.395//Loaded MMCIF structure (OpenBabel)
Traceback (most recent call last):
  File "/home/crk_w20032671/ProCogGraph/nextflow/work/conda/arpeggio-env-dbdf2f54a83bf8f7e2494af81181aee8/bin/pdbe-arpeggio", line 8, in <module>
    sys.exit(main())
  File "/home/crk_w20032671/ProCogGraph/nextflow/work/conda/arpeggio-env-dbdf2f54a83bf8f7e2494af81181aee8/lib/python3.9/site-packages/arpeggio/scripts/process_protein_cli.py", line 150, in main
    run_arpeggio(args)
  File "/home/crk_w20032671/ProCogGraph/nextflow/work/conda/arpeggio-env-dbdf2f54a83bf8f7e2494af81181aee8/lib/python3.9/site-packages/arpeggio/scripts/process_protein_cli.py", line 159, in run_arpeggio
    i_complex = InteractionComplex(
  File "/home/crk_w20032671/ProCogGraph/nextflow/work/conda/arpeggio-env-dbdf2f54a83bf8f7e2494af81181aee8/lib/python3.9/site-packages/arpeggio/core/interactions.py", line 70, in __init__
    self.component_types = protein_reader.get_component_types(filename)
  File "/home/crk_w20032671/ProCogGraph/nextflow/work/conda/arpeggio-env-dbdf2f54a83bf8f7e2494af81181aee8/lib/python3.9/site-packages/arpeggio/core/protein_reader.py", line 431, in get_component_types
    if chem_comp['name'][i].upper() == 'WATER':
AttributeError: 'bool' object has no attribute 'upper'
We got hit by this issue as well (in our case chem_comp.name was missing altogether in the file).
The chem_comp.name data item is free text and not mandatory, so it seems unsafe to rely on it.
In addition, the commit recognizes DOD as water, which was not the case previously.
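To illustrate the idea (this is a sketch under my own assumptions, not the code from the commit), an id-based classification can cover both cases: a file with no chem_comp.name column at all, and DOD counted as water. The names `WATER_IDS` and `water_component_ids` are hypothetical.

```python
# Hypothetical set of component ids treated as water.
WATER_IDS = frozenset({"HOH", "DOD"})


def water_component_ids(chem_comp):
    """Return the ids classified as water without reading the name column."""
    return [
        comp_id
        for comp_id in chem_comp.get("id", [])
        if isinstance(comp_id, str) and comp_id.upper() in WATER_IDS
    ]


# A category with no "name" key, as in a file lacking chem_comp.name.
chem_comp = {"id": ["HOH", "DOD", "ATP"]}
print(water_component_ids(chem_comp))
```

Because only the id column is consulted, a missing name column does not raise a KeyError here.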