IDSIA/sacred

Make Experiment objects picklable

vnmabus opened this issue · 18 comments

Experiment objects should be picklable, so that one can create them in Python and then run them in parallel with multiprocessing/multinode utilities.

Currently, trying to pickle an Experiment gives me:

Can't pickle <function beat_interval_option at 0x7f7d0dfaf4c0>: it's not the same object as sacred.commandline_options.beat_interval_option

because the cli_option decorator does not use functools.wraps (see this for more info).

Hi @vnmabus! Making the Experiment pickleable has been on my to-do list for a long time, and I would love to see this feature implemented! This is related to #441, where @Qwlouse mentioned that wrapt is the reason that experiments (and other things in sacred) are not pickleable. Do you know a feasible solution that doesn't require modifying large parts of the experiment code?

BTW, the webpage you linked is broken / doesn't load.

Strange, the webpage loads for me. It is http://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html from one of the lead scikit-learn developers.

Strange, the link you just posted doesn't work for me either. Are you sure your browser didn't cache it?

Strange, the link you just posted doesn't work for me either. Are you sure your browser didn't cache it?

I just tried in my phone, and it works as well as in my laptop.

@thequilo For me the link works. I think that is an issue with our University network. To summarize the webpage: Use a closure with functools.wraps.

You are right, I was able to access the webpage outside of the University network.

The problem is by far not only the cli_option decorator, but any decorator defined in sacred (@ex.capture, @ex.config). None of them uses functools.wraps, some of them use the wrapt package.

I looked into the source of pickle, functools.wraps and wrapt and found the following:

  • pickle treats functions as "global" objects, i.e., objects that don't need to be stored in the pickle file but can be imported while unpickling. This doesn't work when the decorator changes the type of the wrapped function (i.e., the cli_option decorator returns a CLIOption object) because pickle then tries to pickle the object instead of just writing the import path. While pickling the wrapper object it comes across the wrapped function that it tries to pickle with a "global import", but that import now refers to the wrapper object instead of the wrapped function, thus the error message "it's not the same object". I didn't come up with a good workaround for this yet.
  • wrapt wraps the function in a FunctionWrapper type. This has the advantage that it works with any type of callable (functions, classmethods, staticmethods, bound methods, even classes). This type has similar issues as noted above, with the addition that it defines a not-implemented __reduce_ex__.
  • functools.wraps sets the __qualname__ of the wrapper function to the same name as the wrapped function so that pickle's global import finds the correct object. This only works if the wrapper itself is a function, it doesn't work with the FunctionWrapper returned by wrapt.

So in conclusion: I don't know how to make the decorators pickleable, a simple functools.wraps is, unfortunately, not the solution.

May I ask why wrapt is needed at all?

That dependency was added before I started to work with sacred, so I don't know exactly why it was added, but it has a few nice properties and claims to "[go] way beyond existing mechanisms such as functools.wraps() to ensure that decorators preserve introspectability, signatures, type checking abilities etc. The decorators that can be constructed using this module will work in far more scenarios than typical decorators and provide more predictable and consistent behaviour" (source).

I think that main benefit for sacred is that it makes it possible to detect if a wrapped function is bound or not, so something like this is possible:

class A:
    @ex.capture
    def f(self, _config):
        pass

    @classmethod
    @ex.capture
    def f2(cls, _config):
        pass
        
A.f2()
A().f()

Without wrapt, we would have to guess whether a function/method is bound or not based on the parameter naming.

I'm not sure, though, if this functionality is actually required for sacred.

I think that if you used @wraps in all the decorators, you just have to follow the __wrapped__ objects to get the original function/method and inspect it. In fact some inspect functions such as signature do it by default. Does wrapt provide anything additional to this?

Not exactly,

import functools
def decorator(fn):
    @functools.wraps(fn)
    def wrapper():
        return fn()
    return wrapper

class A:
    @decorator
    def f(self):
        pass
    
import inspect
inspect.ismethod(A().f.__wrapped__)

gives False. To detect if f is bound you have to make it a descriptor (what wrapt does). The "original" method is not (yet) a method at the point where the decorator is applied. The wrapper beceomes a bound method (inspect.ismethod(A().f) is True). But that is not detectable (is it?) inside the wrapper function.

Ok, so for checking parameters you have to inspect the __wrapped__ function, which signature already does by default, and for ismethod we need to pass the decorated object.

Thus, both inspect.ismethod(A().f) and inspect.signature(A().f) should work. Does Sacred need anything else from wrapt?

No, it only uses the wrapt.decorator functionality. And I'm not even sure if the information on whether a captured function is bound is necessary for these decorators. So we may be able to remove the wrapt dependency

I tried to remove wrapt from the captured_function and make it pickleable. Removing wrapt was quite easy and the code looks (in my opinion) even cleaner. There is just one small problem: The configuration is assigned to an attribute of the function. This attribute is not pickled. So, everything works if you pickle the captured function and load it in the same process. But the configuration is suddenly gone if you load the captured function in another process. I don't know if there is a clean solution for this.

I found a hacky way to make captured functions pickleable and keep the config/run/... attributes. I'm really not sure if something like this should be done or whether it bears any severe problems. Anyway, I figured I'll share it so that others can comment on it:

import functools


class _PickleableFunctionProxy:
    """
    This class is the key to making decorated functions pickleable when the
    decorator returns an object.
    It stores the wrapped function as a class variable and tells pickle
    to serialize the stored reference instead of the object that is now at the
    original function's location
    """
    _stored_functions = {}

    @classmethod
    def _store_function(cls, fn):
        fn_id = fn.__module__ + fn.__qualname__
        cls._stored_functions[fn_id] = fn
        return fn_id

    @classmethod
    def _load(cls, fn_id):
        return cls(cls._stored_functions[fn_id])

    def __init__(self, wrapped):
        self.__wrapped__ = wrapped
        self._fn_id = self._store_function(wrapped)

    def __reduce_ex__(self, protocol):
        return _PickleableFunctionProxy._load, (self._fn_id,)

    def __call__(self, *args, **kwargs):
        return self.__wrapped__(*args, **kwargs)


class CapturedFunction:
    def __init__(self, wrapped, prefix=None):
        self.logger = None
        self.run = None
        self.config = {}
        self.rnd = None
        self.prefix = prefix
        functools.update_wrapper(self, wrapped)
        self.__wrapped__ = _PickleableFunctionProxy(wrapped)

    @property
    def uses_randomness(self):
        return "_seed" in self.arguments or "_rnd" in self.arguments

    def __str__(self) -> str:
        return self.__wrapped__.__wrapped__.__str__()

    def __call__(self, *args, **kwargs):
        # Captured function logic...
        ...

Are there any news on this? :)

I found a hacky way to make captured functions pickleable and keep the config/run/... attributes. I'm really not sure if something like this should be done or whether it bears any severe problems. Anyway, I figured I'll share it so that others can comment on it:

Why are you building a hacky register instead of just following the __wrapped__ chain?

That question is not so easy to answer (I myself had to think about it a bit) and could be a good starting point for a discussion.

Pickle treats functions as immutable singleton objects, meaning that it assumes that you get the same object when you import the function (by name) multiple times (also across interpreter instances). The problem with captured function is that they have a mutable state (the config). There are multiple ways to make pickle swallow a captured function with different implications:

  1. Treat it as an immutable singleton object and mimic pickle's behavior when pickling functions.
  • The config is not pickled but assumed to be present on the receiving side
  • Only works when the receiving side calls ex.run to build and assign the config. This could create unexpected behavior when calling ex.run before pickling/unpickling (e.g., sender and receiver end up with different configs, receiver is uninitialized)
  1. Treat it as a stateful object. Then, the functions have to be stored at some publically accessible place (the hacky registry from my comment above)
  • The config is transferred together with the captured function
  • The configuration is unpickled even if ex.run is not called on the receiving side
  • Can have weird side effects, i.e., having two captured functions referring to the same (python) function with different configs
  1. A mixture: Treat the config scope as a "global variable" but transfer and overwrite the config. I think that this can produce totally broken behavior in certain cases

I'm not sure which one is best. There might even be alternatives. I currently prefer 1.

Would it be possible to use class callables instead of functions, so that pickle works well?

I would really like this problem to be solved, as it would make my life way easier.