OpenFreeEnergy/gufe

RDKit `Mol` doesn't preserve atom properties when pickled; impacts `ExplicitMoleculeComponent` and subclasses

Closed this issue · 2 comments

Although we typically serialize GufeTokenizables to dict, keyed_dict, or KeyedChain forms turned to JSON, it can also be convenient to pickle these objects in certain cases. When using multiprocessing, for example, GufeTokenizables passed as arguments to functions executed on other processes are typically pickled/unpickled as the serialization approach.

This presents an issue for ExplicitMoleculeComponents: RDKit Mol objects do not by default include atom properties when pickled (rdkit/rdkit#6573 (comment)). This behavior causes issues especially for preserving partial charges, since we use RDKit Mol properties for holding on to these.

It's possible to change this behavior globally for RDKit with:

from rdkit import Chem
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)

Should we set this within gufe so as to avoid this issue across the board, or will this have undesirable consequences?

Thanks for opening this @dotsdl ! That is indeed quite interesting, I've put it on the tracker for review.

I'll draft a PR for this, including a test illustrating the behavior we are trying to solution for.