Make Experiment objects picklable
vnmabus opened this issue · 18 comments
Experiment
objects should be picklable, so that one can create them in Python and then run them in parallel with multiprocessing/multinode utilities.
Currently, trying to pickle an Experiment gives me:
Can't pickle <function beat_interval_option at 0x7f7d0dfaf4c0>: it's not the same object as sacred.commandline_options.beat_interval_option
because the cli_option
decorator does not use functools.wraps
(see this for more info).
Hi @vnmabus! Making the Experiment
pickleable has been on my to-do list for a long time, and I would love to see this feature implemented! This is related to #441, where @Qwlouse mentioned that wrapt
is the reason that experiments (and other things in sacred) are not pickleable. Do you know a feasible solution that doesn't require modifying large parts of the experiment code?
BTW, the webpage you linked is broken / doesn't load.
Strange, the webpage loads for me. It is http://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html from one of the lead scikit-learn developers.
Strange, the link you just posted doesn't work for me either. Are you sure your browser didn't cache it?
Strange, the link you just posted doesn't work for me either. Are you sure your browser didn't cache it?
I just tried in my phone, and it works as well as in my laptop.
@thequilo For me the link works. I think that is an issue with our University network. To summarize the webpage: Use a closure with functools.wraps
.
You are right, I was able to access the webpage outside of the University network.
The problem is by far not only the cli_option
decorator, but any decorator defined in sacred (@ex.capture
, @ex.config
). None of them uses functools.wraps
, some of them use the wrapt
package.
I looked into the source of pickle
, functools.wraps
and wrapt
and found the following:
pickle
treats functions as "global" objects, i.e., objects that don't need to be stored in the pickle file but can be imported while unpickling. This doesn't work when the decorator changes the type of the wrapped function (i.e., thecli_option
decorator returns aCLIOption
object) becausepickle
then tries to pickle the object instead of just writing the import path. While pickling the wrapper object it comes across the wrapped function that it tries to pickle with a "global import", but that import now refers to the wrapper object instead of the wrapped function, thus the error message "it's not the same object". I didn't come up with a good workaround for this yet.wrapt
wraps the function in aFunctionWrapper
type. This has the advantage that it works with any type of callable (functions,classmethod
s,staticmethod
s, bound methods, even classes). This type has similar issues as noted above, with the addition that it defines a not-implemented__reduce_ex__
.functools.wraps
sets the__qualname__
of the wrapper function to the same name as the wrapped function so that pickle's global import finds the correct object. This only works if the wrapper itself is a function, it doesn't work with theFunctionWrapper
returned bywrapt
.
So in conclusion: I don't know how to make the decorators pickleable, a simple functools.wraps
is, unfortunately, not the solution.
May I ask why wrapt
is needed at all?
That dependency was added before I started to work with sacred, so I don't know exactly why it was added, but it has a few nice properties and claims to "[go] way beyond existing mechanisms such as functools.wraps() to ensure that decorators preserve introspectability, signatures, type checking abilities etc. The decorators that can be constructed using this module will work in far more scenarios than typical decorators and provide more predictable and consistent behaviour" (source).
I think that main benefit for sacred is that it makes it possible to detect if a wrapped function is bound or not, so something like this is possible:
class A:
@ex.capture
def f(self, _config):
pass
@classmethod
@ex.capture
def f2(cls, _config):
pass
A.f2()
A().f()
Without wrapt
, we would have to guess whether a function/method is bound or not based on the parameter naming.
I'm not sure, though, if this functionality is actually required for sacred
.
I think that if you used @wraps
in all the decorators, you just have to follow the __wrapped__
objects to get the original function/method and inspect it. In fact some inspect functions such as signature
do it by default. Does wrapt
provide anything additional to this?
Not exactly,
import functools
def decorator(fn):
@functools.wraps(fn)
def wrapper():
return fn()
return wrapper
class A:
@decorator
def f(self):
pass
import inspect
inspect.ismethod(A().f.__wrapped__)
gives False
. To detect if f
is bound you have to make it a descriptor (what wrapt
does). The "original" method is not (yet) a method at the point where the decorator is applied. The wrapper beceomes a bound method (inspect.ismethod(A().f)
is True
). But that is not detectable (is it?) inside the wrapper function.
Ok, so for checking parameters you have to inspect the __wrapped__
function, which signature
already does by default, and for ismethod
we need to pass the decorated object.
Thus, both inspect.ismethod(A().f)
and inspect.signature(A().f)
should work. Does Sacred need anything else from wrapt
?
No, it only uses the wrapt.decorator
functionality. And I'm not even sure if the information on whether a captured function is bound is necessary for these decorators. So we may be able to remove the wrapt
dependency
I tried to remove wrapt
from the captured_function
and make it pickleable. Removing wrapt
was quite easy and the code looks (in my opinion) even cleaner. There is just one small problem: The configuration is assigned to an attribute of the function. This attribute is not pickled. So, everything works if you pickle the captured function and load it in the same process. But the configuration is suddenly gone if you load the captured function in another process. I don't know if there is a clean solution for this.
I found a hacky way to make captured functions pickleable and keep the config/run/... attributes. I'm really not sure if something like this should be done or whether it bears any severe problems. Anyway, I figured I'll share it so that others can comment on it:
import functools
class _PickleableFunctionProxy:
"""
This class is the key to making decorated functions pickleable when the
decorator returns an object.
It stores the wrapped function as a class variable and tells pickle
to serialize the stored reference instead of the object that is now at the
original function's location
"""
_stored_functions = {}
@classmethod
def _store_function(cls, fn):
fn_id = fn.__module__ + fn.__qualname__
cls._stored_functions[fn_id] = fn
return fn_id
@classmethod
def _load(cls, fn_id):
return cls(cls._stored_functions[fn_id])
def __init__(self, wrapped):
self.__wrapped__ = wrapped
self._fn_id = self._store_function(wrapped)
def __reduce_ex__(self, protocol):
return _PickleableFunctionProxy._load, (self._fn_id,)
def __call__(self, *args, **kwargs):
return self.__wrapped__(*args, **kwargs)
class CapturedFunction:
def __init__(self, wrapped, prefix=None):
self.logger = None
self.run = None
self.config = {}
self.rnd = None
self.prefix = prefix
functools.update_wrapper(self, wrapped)
self.__wrapped__ = _PickleableFunctionProxy(wrapped)
@property
def uses_randomness(self):
return "_seed" in self.arguments or "_rnd" in self.arguments
def __str__(self) -> str:
return self.__wrapped__.__wrapped__.__str__()
def __call__(self, *args, **kwargs):
# Captured function logic...
...
Are there any news on this? :)
I found a hacky way to make captured functions pickleable and keep the config/run/... attributes. I'm really not sure if something like this should be done or whether it bears any severe problems. Anyway, I figured I'll share it so that others can comment on it:
Why are you building a hacky register instead of just following the __wrapped__
chain?
That question is not so easy to answer (I myself had to think about it a bit) and could be a good starting point for a discussion.
Pickle treats functions as immutable singleton objects, meaning that it assumes that you get the same object when you import the function (by name) multiple times (also across interpreter instances). The problem with captured function is that they have a mutable state (the config
). There are multiple ways to make pickle swallow a captured function with different implications:
- Treat it as an immutable singleton object and mimic pickle's behavior when pickling functions.
- The config is not pickled but assumed to be present on the receiving side
- Only works when the receiving side calls
ex.run
to build and assign the config. This could create unexpected behavior when callingex.run
before pickling/unpickling (e.g., sender and receiver end up with different configs, receiver is uninitialized)
- Treat it as a stateful object. Then, the functions have to be stored at some publically accessible place (the hacky registry from my comment above)
- The config is transferred together with the captured function
- The configuration is unpickled even if
ex.run
is not called on the receiving side - Can have weird side effects, i.e., having two captured functions referring to the same (python) function with different configs
- A mixture: Treat the config scope as a "global variable" but transfer and overwrite the config. I think that this can produce totally broken behavior in certain cases
I'm not sure which one is best. There might even be alternatives. I currently prefer 1.
Would it be possible to use class callables instead of functions, so that pickle
works well?
I would really like this problem to be solved, as it would make my life way easier.