`_toRegularMolecule` failing with `convert_amber_dummies=True`
msuruzhon opened this issue ยท 15 comments
Hi,
I have observed a strange bug when executing the following code:
import BioSimSpace as BSS
# Generate the two ligands
ff = "openff_unconstrained-2.0.0"
mol0 = BSS.Parameters.parameterise("c1ccccc1", ff).getMolecule()
mol1 = BSS.Parameters.parameterise("c1ccccc1C", ff).getMolecule()
# Generate a merged mol
mapping = BSS.Align.matchAtoms(mol0, mol1, complete_rings_only=True)
mapping.pop(9) # So that we have a dummy in both endstates
merged_mol = BSS.Align.merge(mol0, mol1, mapping=mapping)
# This works
merged_mol._toRegularMolecule(is_lambda1=False, convert_amber_dummies=True)
# This fails?
merged_mol._toRegularMolecule(is_lambda1=True, convert_amber_dummies=True)
with the following traceback:
line 1699, in _toRegularMolecule
mol.atom(dummy.index())
Boost.Python.ArgumentError: Python argument types in
Selector_Atom_.index(Selector_Atom_)
did not match C++ signature:
index(SireMol::Selector<SireMol::Atom> {lvalue}, int i)
It seems that the dummy atom somehow doesn't have an .index()
attribute?
Any insights will be appreciated. Many thanks.
Hmm, in the first case if I print the type of the search result in the loop I see:
Atom( H4x:13 [ -1.81, 2.20, 0.01] ) <class 'sire.legacy.Mol._Mol.Atom'>
Atom( H6x:14 [ 2.57, -1.08, 0.08] ) <class 'sire.legacy.Mol._Mol.Atom'>
Atom( H7x:15 [ 2.47, -0.04, -1.36] ) <class 'sire.legacy.Mol._Mol.Atom'>
Atom( H8x:16 [ 2.71, 0.67, 0.33] ) <class 'sire.legacy.Mol._Mol.Atom'>
In the second case I get:
Atom( H4x:10 [ 2.15, -1.17, 0.23] )
) <class 'sire.legacy.Mol._Mol.Selector_Atom_'>
Somehow we're ending up with a Selector_Atom
rather than a regular Atom
. I'll try to figure out what's gone wrong.
The following diff seems to work:
diff --git a/python/BioSimSpace/_SireWrappers/_molecule.py b/python/BioSimSpace/_SireWrappers/_molecule.py
index 3e2175a0..de587196 100644
--- a/python/BioSimSpace/_SireWrappers/_molecule.py
+++ b/python/BioSimSpace/_SireWrappers/_molecule.py
@@ -1695,14 +1695,15 @@ class Molecule(_SireWrapper):
# Replace the ambertype.
for dummy in search:
+ index = dummy.atom().index()
mol = (
- mol.atom(dummy.index())
- .setProperty(amber_type, amber_types[dummy.index().value()])
+ mol.atom(index)
+ .setProperty(amber_type, amber_types[index.value()])
.molecule()
)
mol = (
- mol.atom(dummy.index())
- .setProperty(element, elements[dummy.index().value()])
+ mol.atom(index)
+ .setProperty(element, elements[index.value()])
.molecule()
)
I'd still like to figure out why a single atom search result is being returned with a different type, though.
Yes, it's true of any single-valued search result. For example:
import BioSimSpace as BSS
s = BSS.IO.readMolecules("amber/ala/*")
m = s[0]
# A multi-valued result.
for a in m._sire_object.search("atomname C"):
print(a, type(a))
Atom( C:5 [ 15.11, 14.79, 15.27] ) <class 'sire.legacy.Mol._Mol.Atom'>
Atom( C:15 [ 16.39, 18.28, 15.27] ) <class 'sire.legacy.Mol._Mol.Atom'>
# A single-valued result.
for a in m._sire_object.search("atomname CA"):
print(a, type(a))
Selector<SireMol::Atom>( size=1
0: Atom( CA:9 [ 16.53, 16.76, 15.27] )
) <class 'sire.legacy.Mol._Mol.Selector_Atom_'>
@chryswoods: Any thoughts on this and whether it's likely to cause any other issues that I've not thought of? The solution here is easy enough. For reference, I still have the same issue when calling .atoms()
on the search result, e.g. m._sire_object.search("atomname CA").atoms()
.
The search function returns the smallest class type for the result. So a multi-atom result will return a Selector_Atom_
while a single-atom result will return an Atom
.
You can control what you get by using the .atom()
and .atoms()
functions instead. .atom()
will always return a single atom, and requires that the search only matches a single atom. .atoms()
will always return a Selector_Atom_
and only requires that there is at least a single atom that matches.
For your case, it would be better to use the .atoms()
function as you are always wanting the Selector_Atom_
result, e.g.
import BioSimSpace as BSS
s = BSS.IO.readMolecules("amber/ala/*")
m = s[0]
# A multi-valued result.
for a in m._sire_object.atoms("atomname CA"):
print(a, type(a))
The surprising behaviour is that iterating over a single-value result returns a Selector_Atom_
, rather than a Atom
. I think this is because you are using the (old) .search()
function rather than using .atom()
, .atoms()
or the []
operator. For example,
>>> import sire as sr
>>> mols = sr.load(f"{sr.tutorial_url}/ala.top", f"{sr.tutorial_url}/ala.crd")
>>> mol = mols[0]
>>> mol.atoms("CA")
Selector<SireMol::Atom>( size=1
0: Atom( CA:9 [ 16.54, 5.03, 15.81] )
)
>>> for atom in mol["CA"]:
... print(atom)
Atom( CA:9 [ 16.54, 5.03, 15.81] )
>>> for atom in mol["C"]:
... print(atom)
Atom( C:5 [ 18.48, 4.55, 14.35] )
Atom( C:15 [ 15.37, 4.19, 16.43] )
I advise moving away from the .search()
function as it returns SearchResult
objects that haven't been processed to collapse object or post-process searches to remove surprises. You can control the mapping of searches by passing in the map with search term, e.g.
>>> mol[ ("CA", {"coordinates":"coordinates2"}) ]
(this hasn't been extended to the .atom()
or .atoms()
functions, and so should be added)
Thanks for the clarification, I've updated to the new search syntax as suggested. I'll check the rest of the code to see if there any other direct searches on Sire objects since it would be good to update those too. (In general I have used the wrapped functionality in BioSimSpace, since (at the time it was written) it was the easiest way to make use of a property map when searching.)
@lohedges It seems that when one calls mol["element Xx"]
and the molecule has no dummies one gets a segmentation fault. I am attaching files to reproduce this. How should one deal with this in this case then?
segfault.zip
Okay this one seems to always work fine when I try it?
try:
search = mol.atoms("element Xx")
except KeyError:
search = []
I discovered this when making the 2023.1.0 release and have fixed it there. It should work in the code for #417.
@lohedges As far as I can see the one there still uses search
, which I am also having problems with (because it can return different containers). Also, it seems that the non-sandpit one is still using ["element Xx"]
? I think we should change it to mol.atoms()
because this has been looking fine for me so far but I'd need to try more examples.
The segfault is definitely a bug in sire. One design goal of sire is that all out-of-range and other access is caught and turned into a meaningful exception. I view segfaults as bugs where an out-of-range hasn't been detected. I've checked and can reproduce this. The lldb output shows;
3:06pm :-> lldb python
(lldb) target create "python"
Current executable set to 'python' (arm64).
(lldb) run script.py
Process 73707 launched: '/Users/chris/mambaforge/envs/openbiosim/bin/python' (arm64)
INFO:numexpr.utils:Note: NumExpr detected 10 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
Process 73707 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x0000000136b60c48 libSireMol.2023.1.2.dylib`SireMol::MoleculeView::atom() const + 24
libSireMol.2023.1.2.dylib`SireMol::MoleculeView::atom:
-> 0x136b60c48 <+24>: ldr x8, [x0]
0x136b60c4c <+28>: ldr x9, [x8, #0xc0]
0x136b60c50 <+32>: add x8, sp, #0x58
0x136b60c54 <+36>: blr x9
Target 0: (python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x0000000136b60c48 libSireMol.2023.1.2.dylib`SireMol::MoleculeView::atom() const + 24
frame #1: 0x0000000136b60a54 libSireMol.2023.1.2.dylib`SireMol::MoleculeView::atom(QString const&, SireBase::PropertyMap const&) const + 216
frame #2: 0x00000001463d8ea8 _Mol.2023.1.2.so`SireMol::Editor<SireMol::MolEditor, SireMol::Molecule>::atom(QString const&, SireBase::PropertyMap const&) + 28
frame #3: 0x00000001463da2e8 _Mol.2023.1.2.so`SireMol::Editor<SireMol::MolEditor, SireMol::Molecule>::operator[](QString const&) + 60
frame #4: 0x00000001463e10cc _Mol.2023.1.2.so`boost::python::detail::caller_arity<2u>::impl<SireBase::PropPtr<SireMol::MoleculeView> (SireMol::Editor<SireMol::MolEditor, SireMol::Molecule>::*)(QString const&), boost::python::default_call_policies, boost::mpl::vector3<SireBase::PropPtr<SireMol::MoleculeView>, SireMol::Editor<SireMol::MolEditor, SireMol::Molecule>&, QString const&> >::operator()(_object*, _object*) + 188
Somehow the returned view (MoleculeView
) on which the .atom()
function is called is causing a segfault. My guess is this is because the view is empty (no atoms matched the search term for dummies). Normally calling .atom()
on an empty molecule would raise a missing_atom
exception, e.g. (using the old API)
>>> import sire.legacy.Mol
>>> sire.legacy.Mol.Molecule().atom()
KeyError: 'SireMol::missing_atom: This view does not contain any atoms. (call sire.error.get_last_error_details() for more info)'
I'll see if I can find the cause. If it is quick enough, I will sneak this in for 2023.1.2
Thanks. I must have fixed it in the sandpit (where I was seeing the segfault, but not updated the regular code). Will do that now.
The issue I was having was that I was using the new search syntax, i.e. mol[...]
on a Editor
, not the original molecule, which was causing the issue that I was seeing. I'll switch them both to using mol.atoms
for now, then update when the search issue is resolved. (I needed to add a lot of try/except blocks for the new search throughout the code. Previously you'd just get no search results (if the query was valid) rather than an exception.)
As far as I can see the one there still uses search, which I am also having problems with (because it can return different containers).
Note that I call .atom()
on the result to get back an atom. I've switched to your suggested method on the OBS devel and am testing.
Fixed it - there was a bit of code that didn't check if the list of searches was empty. I've pushed a quick fix to devel
, and we now get an exception for mol.edit()["element Xx"]
rather than a segfault ;-)
It looks like the Windows jobs are working now too. It will take a while to build the dev packages so I can re-test. I will do the 2023.1.2
release tomorrow ;-)
This fix did make it in - I am release 2023.1.2
now. It should be uploaded to conda by this afternoon.