RDKit `Mol` doesn't preserve atom properties when pickled; impacts `ExplicitMoleculeComponent` and subclasses
Although we typically serialize `GufeTokenizable`s to `dict`, `keyed_dict`, or `KeyedChain` forms that are then turned into JSON, it can also be convenient to pickle these objects in certain cases. When using `multiprocessing`, for example, `GufeTokenizable`s passed as arguments to functions executed in other processes are typically serialized by pickling and unpickling.
This presents an issue for `ExplicitMoleculeComponent`s: RDKit `Mol` objects do not by default include atom properties when pickled (rdkit/rdkit#6573 (comment)). This is especially a problem for preserving partial charges, since we use RDKit `Mol` properties to hold them.
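A minimal sketch of the problem, assuming a stock RDKit installation whose default pickle options have not been changed. The property name `PartialCharge` and the methanol SMILES are illustrative choices only, not necessarily what `gufe` uses internally:

```python
import pickle

from rdkit import Chem

# Build a small molecule and attach an atom-level property.
# "PartialCharge" is an illustrative property name for this sketch.
mol = Chem.MolFromSmiles("CO")
mol.GetAtomWithIdx(0).SetDoubleProp("PartialCharge", 0.12)

# Round-trip through pickle, as multiprocessing would do for arguments.
restored = pickle.loads(pickle.dumps(mol))

has_before = bool(mol.GetAtomWithIdx(0).HasProp("PartialCharge"))
has_after = bool(restored.GetAtomWithIdx(0).HasProp("PartialCharge"))

print("property present before pickling:", has_before)
print("property present after pickling: ", has_after)
```

Under the default options described above, the atom property should be present on the original molecule and missing from the unpickled copy.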
It's possible to change this behavior globally for RDKit with:
```python
from rdkit import Chem
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)
```
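To check the effect of this setting, the sketch below enables `AllProps`, round-trips a molecule through pickle, and then restores whatever default was active before. The property name `PartialCharge` is again an illustrative assumption, and restoring the previous value is only done here so the example does not leave global state changed:

```python
import pickle

from rdkit import Chem

# Remember the current process-wide setting so it can be restored afterwards.
previous = Chem.GetDefaultPickleProperties()
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)
try:
    mol = Chem.MolFromSmiles("CO")
    # "PartialCharge" is an illustrative property name for this sketch.
    mol.GetAtomWithIdx(0).SetDoubleProp("PartialCharge", 0.12)

    # With AllProps enabled, atom properties are written into the pickle.
    restored = pickle.loads(pickle.dumps(mol))
    charge = restored.GetAtomWithIdx(0).GetDoubleProp("PartialCharge")
finally:
    # Put the global setting back the way it was found.
    Chem.SetDefaultPickleProperties(previous)

print("charge after round trip:", charge)
```

Because the setting is process-wide, it also affects every other piece of code in the same process that pickles RDKit molecules.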
Should we set this within `gufe` so as to avoid this issue across the board, or will this have undesirable consequences?
Thanks for opening this, @dotsdl! That is indeed quite interesting; I've put it on the tracker for review.
I'll draft a PR for this, including a test illustrating the behavior we are trying to solve.
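A test along those lines might look like the following sketch. It is not the test from the eventual PR: the helper names (`pickle_properties`, `make_charged_mol`, `charges`), the `PartialCharge` property name, and the ethanol SMILES are all hypothetical choices made for illustration. The context manager sets the RDKit pickle options for the duration of a block and restores the previous value afterwards, so each test controls the setting explicitly:

```python
import pickle
from contextlib import contextmanager

from rdkit import Chem


@contextmanager
def pickle_properties(flags):
    """Temporarily set RDKit's default pickle properties (hypothetical helper)."""
    previous = Chem.GetDefaultPickleProperties()
    Chem.SetDefaultPickleProperties(flags)
    try:
        yield
    finally:
        Chem.SetDefaultPickleProperties(previous)


def make_charged_mol():
    """Ethanol with an illustrative 'PartialCharge' property on every atom."""
    mol = Chem.MolFromSmiles("CCO")
    for atom in mol.GetAtoms():
        atom.SetDoubleProp("PartialCharge", 0.1 * atom.GetIdx())
    return mol


def charges(mol):
    """Per-atom 'PartialCharge' values, or None where the property is absent."""
    return [
        atom.GetDoubleProp("PartialCharge") if atom.HasProp("PartialCharge") else None
        for atom in mol.GetAtoms()
    ]


def test_atom_props_dropped_without_props():
    mol = make_charged_mol()
    with pickle_properties(Chem.PropertyPickleOptions.NoProps):
        restored = pickle.loads(pickle.dumps(mol))
    assert charges(restored) == [None, None, None]


def test_atom_props_survive_with_all_props():
    mol = make_charged_mol()
    with pickle_properties(Chem.PropertyPickleOptions.AllProps):
        restored = pickle.loads(pickle.dumps(mol))
    assert charges(restored) == charges(mol)
```

The first test pins down the failure mode and the second the desired behavior, so a regression in either direction would show up.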