uqfoundation/dill

Recursion error on classes using __getattr__

Closed this issue · 5 comments

The below MWE is extracted from qiskit, i.e. this is a MWE for why qiskit.Aer cannot be serialized.

import dill

class AerWrapper:
    """Lazy loading wrapper for Aer provider."""

    def __init__(self):
        self.aer = None

    def __getattr__(self, attr):
        if not self.aer:
            self.aer = "Hi"
        return getattr(self.aer, attr)

Aer = AerWrapper()

def func(): # Hide Aer object in function, to justify recurse=True
    print(Aer)

with open("temp.dill", 'wb') as f:
    dill.dump(func, f, recurse=True)

del Aer

with open("temp.dill", "rb") as f:
    g = dill.load(f)

Executing this in any of the latest dill versions (0.3.5-0.3.6) leads to a RecursionError.

Traceback (most recent call last):
  File "test.py", line 27, in <module>
    g = dill.load(f)
  File ".../python3.8/site-packages/dill/_dill.py", line 272, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File ".../python3.8/site-packages/dill/_dill.py", line 419, in load
    obj = StockUnpickler.load(self)
  File "test.py", line 12, in __getattr__
    if not self.aer:
  File "test.py", line 12, in __getattr__
    if not self.aer:
  File "test.py", line 12, in __getattr__
    if not self.aer:
  [Previous line repeated 993 more times]
RecursionError: maximum recursion depth exceeded

This might be related to Issues #362 and #405, but I'm not sure, so I created a new issue.

I can confirm this behavior for:

Python 3.7.15 (default, Oct 12 2022, 04:11:53) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.__version__
'0.3.7.dev0'

both with ignore=True and ignore=False on load.

As an aside, if a __reduce__ method (or similar) is added to qiskit.Aer, it should be serializable.

Python 3.7.15 (default, Oct 12 2022, 04:11:53) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class AerWrapper:
...     """Lazy loading wrapper for Aer provider."""
...     def __init__(self):
...         self.aer = None
...     def __getattr__(self, attr):
...         if not self.aer:
...             self.aer = "Hi"
...         return getattr(self.aer, attr)
... 
>>> Aer = AerWrapper()
>>> 
>>> with open("temp.dill", 'wb') as f:
...     dill.dump(Aer, f, recurse=True)
... 
>>> with open("temp2.dill", 'wb') as f:
...     dill.dump(Aer, f, recurse=False)
... 
>>> del Aer
>>> 
>>> with open("temp.dill", 'rb') as f:
...     Aer = dill.load(f)
... 
Traceback (most recent call last):
...[snip]...

actually, just trying to serialize Aer yields this error. Regardless of the setting on ignore or recurse, and regardless of whether AerWrapper is deleted or not.

@Sola85 This is something that standard pickle cannot handle either, yielding the same error.

import pickle

class AerWrapper:
    """Lazy loading wrapper for Aer provider."""
    
    def __init__(self):
        self.aer = None
    
    def __getattr__(self, attr):
        if not self.aer:
            self.aer = "Hi"
        return getattr(self.aer, attr)

Aer = AerWrapper()
Aer_copy = pickle.loads(pickle.dumps(Aer))

The problem is that, because you redefined __getattr__, the unpickler cannot find the __setstate__ function that it needs to reconstruct the object. Thankfully, this is easy to fix. Add a __getstate__ and __setstate__ function to reconstruct the object.

import pickle

class AerWrapper:
    """Lazy loading wrapper for Aer provider."""
    __slots__ = ('aer',) # back this object with a tuple instead of a dict
                         # for efficiency. This can only be done if all attributes
                         # of `AerWrapper` are known ahead of time, which
                         # seems like the case in this code snippet
    
    def __init__(self):
        self.aer = None
    
    def __getattr__(self, attr):
        if self.aer is not None: # If `self.aer` is falsy, what you had before
                                 # would set `self.aer` again, which may be
                                 # undesirable
            self.aer = "Hi"
        return getattr(self.aer, attr)
    
    def __getstate__(self):
        # We need to wrap the `aer` object in a tuple here, because pickle
        # treats a state of `None` as no state, and thus, doesn't call
        # `__setstate__`. Play around with this and it will become clear what
        # I mean. You can look at this StackOverflow post for more information
        # on how to use pickle states:
        # https://stackoverflow.com/questions/1939058/simple-example-of-use-of-setstate-and-getstate
        return (self.aer,)
    
    def __setstate__(self, state):
        (self.aer,) = state

Aer = AerWrapper()
Aer_copy = pickle.loads(pickle.dumps(Aer))

If you don't intend to serialize the aer object (which it seems like that you are trying to do if you are creating the object in the __getattr__ function instead of in __init__) and just create it lazily when it is needed, then this is sufficient:

class AerWrapper:
    """Lazy loading wrapper for Aer provider."""
    __slots__ = ('aer',)
    
    def __init__(self):
        self.aer = None
    
    def __getattr__(self, attr):
        if self.aer is not None:
            self.aer = "Hi"
        return getattr(self.aer, attr)
    
    def __getstate__(self):
        return ()
    
    def __setstate__(self, state):
        self.aer = None

Also, in Python, private variables are written with double underscore prefixes, like __aer. If you intend for aer not be used outside of the wrapper class, I would suggest doing that. Python will mangle the names of private variables with the name of the class to reduce name collisions, which is an added bonus.

In summary, pickling doesn't create objects in the normal way that would happen if you just did AerWrapper(). Instead of calling, __init__, the unpickler calls __setstate__. If you overwrite __getattr__, you need to make sure that __setstate__ is not clobbered if you want to be able to pickle the object. Hope that helps.

As @anivegesana and I essentially gave the same advice (i.e. add one of the serialization methods to your class), I'm going to close this. Please reopen if the discussion should be continued.

Ah. Didn't notice that. 😅