uqfoundation/dill

Problem serializing instance of a class that uses a module

Closed this issue · 4 comments

I have a problem serializing an instance of a class that uses a module, something like:

import some_module
import dill

class SomeClass:
    def __init__(self):
        super().__init__()

    def foo(self):
        some_module.bar()

def main():
    obj = SomeClass()
    with open("path.pkl", "wb") as f:
        dill.dump(obj, f)

if __name__ == "__main__":
    main()

some_module contains things that can't be serialized, and that's fine: I don't want to serialize it. But if I serialize with dill.settings["recurse"] = False, I get the error name 'some_module' is not defined when deserializing. If I serialize with dill.settings["recurse"] = True, I get an error about the things in some_module that can't be serialized.

I know that, as a workaround, I can move the import some_module into the foo function, but I'm building a framework and I don't want to have to ask my users to do that. Also, pickle does not seem to have this issue.
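For reference, a minimal sketch of why the function-local import changes what has to be serialized. It uses only the standard library, with math standing in for some_module; the helper global_names is a hypothetical name introduced here for illustration. It lists the names a function's bytecode loads from module globals, which is only a rough proxy for what a by-value serializer has to follow, not a description of dill's own analysis.

```python
import dis
import math


def global_import_version():
    # Looks up the module-level name `math` in the function's globals.
    return math.sin(0)


def local_import_version():
    # Binds `math` as a local variable, so no global lookup is needed.
    import math
    return math.sin(0)


def global_names(func):
    """Names the function's bytecode loads from module globals (or builtins)."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname == "LOAD_GLOBAL"
    }


print(global_names(global_import_version))  # {'math'}
print(global_names(local_import_version))   # set()
```

With the import at module level, the function depends on a global that refers to the whole module object; with the import inside the function, that dependency disappears.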

Can you give an example that I can confirm, especially if pickle works, as you say? Your example works for me, so I'd like to see where you are seeing a failure.

Python 3.8.17 (default, Jun 11 2023, 01:54:00) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import math
>>> import dill
>>> class SomeClass:
...   def __init__(self):
...     super().__init__()
...   def foo(self):
...     return math.sin(0)
... 
>>> def main():
...   obj = SomeClass()
...   with open('path.pkl', 'wb') as f:
...     dill.dump(obj, f)
... 
>>> main()
>>> with open('path.pkl', 'rb') as f:
...   print(dill.load(f).foo())
... 
0.0
>>> dill.__version__
'0.3.8.dev0'
>>> 

If you put this in some_module.py:

import ctypes

# load an arbitrary dll
lib = ctypes.CDLL('C:/Windows/System32/msvcp100.dll')

def bar():
    pass

this fails:

import some_module
import dill

class SomeClass:
    def __init__(self):
        super().__init__()

    def foo(self):
        some_module.bar()

def main():
    obj = SomeClass()
    dill.settings["recurse"] = True
    with open("path.pkl", "wb") as f:
        dill.dump(obj, f)

if __name__ == "__main__":
    main()

with error Can't pickle <class '_ctypes.PyCFuncPtrType'>: it's not found as _ctypes.PyCFuncPtrType.

This succeeds:

import some_module
import pickle

class SomeClass:
    def __init__(self):
        super().__init__()

    def foo(self):
        some_module.bar()

def main():
    obj = SomeClass()
    with open("path.pkl", "wb") as f:
        pickle.dump(obj, f)

if __name__ == "__main__":
    main()

And I can't remove the dill.settings["recurse"] = True because SomeClass sometimes also uses globals.

If you use this file (I'm testing on macOS, not Windows):

# file: some_module.py
import ctypes
import ctypes.util

lib = ctypes.CDLL(ctypes.util.find_library('libc'))

def bar():
  return 0.0

then do what you have above, I can reproduce the error with dill, and no error with pickle.
However, pickle doesn't actually work...

Python 3.8.17 (default, Jun 11 2023, 01:54:00) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('path.pkl', 'rb') as f:
...   obj = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: Can't get attribute 'SomeClass' on <module '__main__' (<_frozen_importlib_external.SourceFileLoader object at 0x104541c70>)>

because the class is pickled by reference, and the reference is not present. If you put this into a file and installed the file as a module that is globally available on import, then pickle would work.

The difference is that dill serializes the class itself, and to do that it needs to serialize the method and the underlying function, which uses globals. dill provides a few options for serializing the class (including byref, which reproduces the behavior of pickle). If something in globals is not serializable, then serialization is going to fail. I think the best solution is probably to suggest that users put the import inside the function, so that the function doesn't rely on references to globals.
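The by-reference behavior of pickle can be checked with the standard library alone. This is a minimal sketch: pickled_strings is a hypothetical helper introduced here, and protocol 4 is chosen explicitly so that the module and class names appear as separate string arguments in the payload.

```python
import pickle
import pickletools


class SomeClass:
    def foo(self):
        return 0.0


def pickled_strings(payload):
    """Return every string argument stored in a pickle payload."""
    return [
        arg
        for _opcode, arg, _pos in pickletools.genops(payload)
        if isinstance(arg, str)
    ]


payload = pickle.dumps(SomeClass(), protocol=4)
names = pickled_strings(payload)

# The payload records where to find the class (module name and class name) ...
print(names)
# ... but nothing about the method, so foo's globals are never touched.
print("foo" in names)  # False
```

Since only the names are stored, loading succeeds only where the same module and class can be found again, which matches the AttributeError above.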

This is a known issue, so I'm going to close this as a duplicate... or you can confirm that pickle performs differently than described above.

Feel free to reopen given my notes above