MCS mapping behavior
Opened this issue · 8 comments
The script
BioSimSpace_examples/scripts/test_mapping.py
generates a mapping for methane <--> methanol.
For efficient implementation in most MD packages the mapping code should be modified such that one H in methane maps to the oxygen atom of methanol.
Disabling the heavy atoms pre-match raises an exception on this example
Traceback (most recent call last):
File "../test_mapping.py", line 12, in
mapping = BSS.Align.matchAtoms(m0, m1, verbose=True, match_light=False)
File "/home/julien/sire.app/lib/python3.5/site-packages/BioSimSpace-2018.1.1+360.g41af51d.dirty-py3.5.egg/BioSimSpace/Align/init.py", line 188, in matchAtoms
File "/home/julien/sire.app/lib/python3.5/site-packages/BioSimSpace-2018.1.1+360.g41af51d.dirty-py3.5.egg/BioSimSpace/Align/init.py", line 377, in _score_rmsd
UserWarning: Exception 'SireError::program_bug' thrown by the thread 'master'.
Attempt to find the alignment matrix failed to produce a valid rotation matrix. Determinant should be 1. It is equal to 0.
Thrown from FILE: /home/julien/software/devel/Sire/corelib/src/libs/SireMaths/align.cpp, LINE: 508, FUNCTION: SireMaths::Matrix SireMaths::kabasch(const QVectorSireMaths::Vector&, const QVectorSireMaths::Vector&)
Replacing in findMCS.cpp all instances of
put(user_match_0, nvert0, -2);
by
put(user_match_0, nvert0, -1);
does not seem to change the output of
mapping = BSS.Align.matchAtoms(m0, m1, verbose=True)
Correction) findMCS.cpp was incorrectly compiled.
The above changes give the desired behaviour of matching one CH4 H to the CH3OH O
This is now fixed in the feature-mapping
branch of Sire, and in the latest commit in feature-align
of BioSimSpace.
The output of the merging could be improved by using the atom names of molecule0 in merged0 and of molecule1 in merged1.
This shows below the visualisation of aligned.pdb with vmd for the perturbation methane --> toluene. The carbon atom bonded to the methyl group is labelled.
This shows the visualisation of merged0.pdb coloured by the Element type. This is ok to visualise, and it is clear what the dummy atoms are.
This shows the visualisation of merged1.pdb
The atom labelled H has the correct element. The original label in mol1 was C but this has not been preserved in the mapping.
Ideally the desired behaviour (keeping atom names of non dummy atoms of mol0 in merged0, or atom names of non dummy atoms of mol1 in merged1) could be obtained with syntax like this:
map0 = {"name" : "name0",
"coordinates" : "coordinates0",
"charge" : "charge0",
"element" : "element0" }
map1 = {"name" : "name1",
"coordinates" : "coordinates1",
"charge" : "charge1",
"element" : "element1" }
merged = BSS.Align.merge(m0, m1, mapping=mapping)
BSS.IO.saveMolecules('merged0.pdb', merged,'PDB', map=map0)
BSS.IO.saveMolecules('merged1.pdb', merged, 'PDB', map=map1)
Unless I'm mistaken name is not currently used in the dictionary passed to BSS.IO.saveMolecule()
Hi Julien,
You can use map whatever properties you want. To see what's available, type:
merged._sire_molecule.propertyKeys()
The problem is that the atom names are not properties, rather Sire.Mol.AtomName
objects, so they cannot be changed using the map. (Each atom in the molecule as a single AtomName
, but multiple properties.) To get the name of an atom in the molecule you can do:
from Sire.Mol import AtomIdx
print(merged._sire_molecule.atom(AtomIdx(0)).name())
Because of this, the names aren't correctly preserved by the merge and are inconsistent at lamda = 0 and lambda = 1. This shouldn't be a problem for Gromacs / SOMD since the atoms are named according to the ambertype
property, i.e. ambertype0
and ambertype1
, but it makes it a pain for visualising a PDB with VMD since the PDB parser just uses the AtomName
.
I'll think about this a littler harder and see if I can come up with a solution. (And check that things are correct for the Gromacs case.)
Just a quick update. Gromacs and SOMD are okay since we only care about the names of the atoms at lambda = 0. The "atomtype" and "ambertype" properties are what's used by the internal programs for the purposes of performing the perturbation. (The names could be anything).
To make any other SireIO
parser able to write a merged molecule with consistent atom names at lambda = 1 I would need to make it aware of the additional merged molecule properties, i.e. extract the "atomname0" or "atomname1" property if the molecule is perturbable, rather using the default AtomName
. The user could pass an optional { "lambda" : "1" }
using the PropertyMap
if they wanted to write the end state. This would be easy enough to do as a one-off for visualisation purposes, e.g., using the PDB2
parser, but might be a pain to implement for everything else.
As a quick fix I've added two additional properties to the merged molecule that records the AtomName at lambda = 0 and lambda = 1. The PDB parser is aware of these properties, so you can add them to your map in order to recover the correct names at lambda = 1. (By default, the names at lambda = 0 are used.)
The following should now work, although you'll need to rebuild the latest feature-mapping
branch of Sire for the changes to take effect. (You'll also need to pull the latest changes from feature-align
of BioSimSpace.)
map0 = {"name" : "name0",
"coordinates" : "coordinates0",
"charge" : "charge0",
"element" : "element0" }
map1 = {"name" : "name1",
"coordinates" : "coordinates1",
"charge" : "charge1",
"element" : "element1" }
merged = BSS.Align.merge(m0, m1, mapping=mapping)
BSS.IO.saveMolecules('merged0.pdb', merged,'PDB', map=map0)
BSS.IO.saveMolecules('merged1.pdb', merged, 'PDB', map=map1)
When you get a chance, could you give this a try and let me know if you run into any issues.
Cheers.