Cannot use callable that was pickled within pytest
dionhaefner opened this issue · 14 comments
I am running tests that serialize callables with dill and try to load them in a subprocess to make sure everything worked correctly. I am getting a cryptic error when trying to load the callable from the subprocess, presumably because dill is failing to load the test module.
Example:
# save as dill_test.py
import sys
import tempfile
from textwrap import dedent

def foo():
    pass

def test_dill():
    import subprocess
    import dill

    with tempfile.TemporaryDirectory() as tmpdir:
        picklefile = f"{tmpdir}/foo.pickle"
        with open(picklefile, "wb") as f:
            f.write(dill.dumps(foo))

        test_script = dedent(f"""
            import dill
            with open("{picklefile}", "rb") as f:
                func = dill.load(f)
            func()
        """)
        subprocess.run([sys.executable, "-c", test_script], check=True)

if __name__ == "__main__":
    test_dill()
    print("ok")
Calling through pytest gives this error:
$ pytest dill_test.py
E subprocess.CalledProcessError: Command '['/Users/dion/.virtualenvs/py312/bin/python', '-c', '\nimport dill\nwith open("/var/folders/fk/g5ssrkz179z1mjmvqn1j3q1m0000gn/T/tmphuyt802o/foo.pickle", "rb") as f:\n func = dill.load(f)\nfunc()\n']' returned non-zero exit status 1.
/opt/homebrew/Cellar/python@3.12/3.12.0/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py:571: CalledProcessError
-------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------
Traceback (most recent call last):
File "<string>", line 4, in <module>
File "/Users/dion/.virtualenvs/py312/lib/python3.12/site-packages/dill/_dill.py", line 287, in load
return Unpickler(file, ignore=ignore, **kwds).load()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dion/.virtualenvs/py312/lib/python3.12/site-packages/dill/_dill.py", line 442, in load
obj = StockUnpickler.load(self)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dion/.virtualenvs/py312/lib/python3.12/site-packages/dill/_dill.py", line 432, in find_class
return StockUnpickler.find_class(self, module, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'dill_test'
========================================================================= short test summary info =========================================================================
FAILED tests/dill_test.py::test_dill - subprocess.CalledProcessError: Command '['/Users/dion/.virtualenvs/py312/bin/python', '-c', '\nimport dill\nwith open("/var/folders/fk/g5ssrkz179z1mjmvqn1j3q1m0000gn/...
Calling it directly works:
$ python dill_test.py
ok
Funnily enough, it works when I do this before pickling:
foo.__globals__.pop(foo.__name__)
I want to make sure I'm understanding this correctly: running your script normally works, but if you run it under the control of pytest (and subprocess), it throws the error above. Is that correct? If so, I'd be interested to see a run with dill.detect.trace(True).
That's what I thought, but now I realized this is actually a pathing issue.
$ python tests/dill_test.py
ok
$ cd tests
$ pytest dill_test.py
ok
$ pytest tests/dill_test.py
NOT OK
So in the latter case, dill.load tries to import dill_test.py but fails because it's not on sys.path. It is fixed by changing the load script to this:
# requires `import os` at the top of dill_test.py
test_script = dedent(f"""
    import dill
    import sys
    sys.path.append("{os.path.dirname(__file__)}")
    with open("{picklefile}", "rb") as f:
        func = dill.load(f)
    func()
""")
Is there a way to pickle a function so it can be executed even if the original module isn't available when unpickling?
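The sys.path fix above can also be applied from outside the load script, by passing the module's directory to the child process through PYTHONPATH. The following is a minimal sketch of that idea, not the setup used in this issue. It assumes the stock pickle module in place of dill (which, like dill in the failing case, stores the function as a reference to its module), and the module name pathdemo_mod and the helpers load_in_subprocess and demo are made up for illustration.

```python
import importlib
import os
import pickle
import subprocess
import sys
import tempfile

# Script run in the child interpreter: unpickle the function and call it.
LOAD_SCRIPT = (
    "import pickle, sys\n"
    "with open(sys.argv[1], 'rb') as f:\n"
    "    func = pickle.load(f)\n"
    "func()\n"
)


def load_in_subprocess(picklefile, cwd, module_dir=None):
    """Unpickle and call the function in a fresh interpreter; return its exit code."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)
    if module_dir is not None:
        env["PYTHONPATH"] = module_dir  # makes the module importable in the child
    proc = subprocess.run(
        [sys.executable, "-c", LOAD_SCRIPT, picklefile],
        env=env, cwd=cwd, capture_output=True,
    )
    return proc.returncode


def demo():
    """Return (exit code without PYTHONPATH, exit code with PYTHONPATH)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        module_dir = os.path.join(tmpdir, "tests")
        work_dir = os.path.join(tmpdir, "work")  # child cwd, does not contain the module
        os.makedirs(module_dir)
        os.makedirs(work_dir)
        with open(os.path.join(module_dir, "pathdemo_mod.py"), "w") as f:
            f.write("def foo():\n    pass\n")

        # Import the hypothetical module in the parent only.
        sys.path.insert(0, module_dir)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module("pathdemo_mod")
        finally:
            sys.path.remove(module_dir)

        picklefile = os.path.join(tmpdir, "foo.pickle")
        with open(picklefile, "wb") as f:
            pickle.dump(mod.foo, f)  # stored as a reference: module name + function name
        sys.modules.pop("pathdemo_mod", None)

        return (
            load_in_subprocess(picklefile, work_dir),
            load_in_subprocess(picklefile, work_dir, module_dir),
        )
```

Calling demo() should give a non-zero exit code for the first load and zero for the second, mirroring the difference between running pytest tests/dill_test.py and running from inside tests.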
Generally, dill assumes that module dependencies are installed... and while it does provide different approaches for tracing dependencies in the global scope... what you might be able to do in any case is to dump the module along with the function. Then you'd load the module and then the function. Something like this is only needed for "uninstalled" modules. This is ok for saving state, but not really that good for parallel computing.
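As a rough illustration of "dump the module along with the function, then load the module and then the function", here is a sketch that uses only the standard library rather than any dill API. It ships the module's source text next to the pickled reference and recreates the module before unpickling. The names shipped_mod, install_module, dump_bundle and load_bundle are hypothetical, and carrying the source as a string is an assumption made to keep the example self-contained.

```python
import pickle
import sys
import types

MODULE_NAME = "shipped_mod"  # hypothetical "uninstalled" module
MODULE_SOURCE = "def foo():\n    return 'ok'\n"


def install_module(name, source):
    """Create a module from source text and register it in sys.modules."""
    module = types.ModuleType(name)
    exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    sys.modules[name] = module
    return module


def dump_bundle(func, source):
    """Pickle the function (by reference) together with its module's source."""
    return pickle.dumps(
        {"module": func.__module__, "source": source, "func": pickle.dumps(func)}
    )


def load_bundle(data):
    """Recreate the module first, then resolve the function reference."""
    bundle = pickle.loads(data)
    install_module(bundle["module"], bundle["source"])
    return pickle.loads(bundle["func"])


# Round trip: dump while the module exists, forget it, then load the bundle.
original = install_module(MODULE_NAME, MODULE_SOURCE)
data = dump_bundle(original.foo, MODULE_SOURCE)
del sys.modules[MODULE_NAME]
restored = load_bundle(data)
```

The order matters: unpickling the function before the module has been recreated would fail with the same ModuleNotFoundError shown earlier in this thread.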
Generally, dill assumes that module dependencies are installed.
But why is this module a dependency in the first place? The function doesn't access any globals.
The global dict is required to create a function object.
Python 3.8.18 (default, Aug 25 2023, 04:23:37)
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import types
>>> print(types.FunctionType.__doc__)
Create a function object.

  code
    a code object
  globals
    the globals dictionary
  name
    a string that overrides the name from the code object
  argdefs
    a tuple that specifies the default argument values
  closure
    a tuple that supplies the bindings for free variables
>>>
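The docstring above can be checked directly. This small sketch, using only the standard library, shows that a code object alone is not enough to build a function, and that any dict is accepted as the globals so long as the code never looks up a name that is missing from it. The name foo_rebuilt is just an illustrative label.

```python
import types


def foo():
    return 42


# The code object alone is not enough: globals is a required argument.
try:
    types.FunctionType(foo.__code__)
except TypeError:
    missing_globals_rejected = True
else:
    missing_globals_rejected = False

# An empty dict works here because foo's body does not read any global names.
rebuilt = types.FunctionType(foo.__code__, {}, "foo_rebuilt")
```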
However, dill has different settings that modify how the global dict is handled. So, you can try dill.settings['recurse'] = True, which will only pickle items in the global dict that are pointed to by the function, and otherwise stores a dummy global dict.
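To make "only the items in the global dict that are pointed to by the function" concrete, here is a simplified standard-library sketch of the idea. It is not dill's implementation: the helper referenced_globals is hypothetical, and it over-approximates, because co_names also lists attribute names and the lookup is not followed recursively into other functions.

```python
import types


def referenced_globals(func):
    """Return only the global entries whose names appear in the function's code."""
    names = set(func.__code__.co_names)
    return {k: v for k, v in func.__globals__.items() if k in names}


# Build a function in its own namespace so its globals are easy to inspect.
namespace = {"scale": 3, "unused": "never touched"}
exec("def triple(x):\n    return x * scale\n", namespace)
triple = namespace["triple"]

# Keep only what triple actually refers to, then rebuild it on the trimmed dict.
trimmed = referenced_globals(triple)
slim_triple = types.FunctionType(triple.__code__, trimmed, triple.__name__)
```

The rebuilt function carries a globals dict with a single entry and still runs, which is roughly the effect the recurse setting is described as having.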
Thanks, I think I understand the problem now. recurse=True doesn't work, but I guess that's due to some modifications done to the callable by pytest.
you can often see what's going on with dill.detect.trace(True)
Okay here goes nothing.
This is the case that works:
$ python tests/dill_test.py
┬ F1: <function foo at 0x102580040>
├┬ F2: <function _create_function at 0x102fb32e0>
│└ # F2 [34 B]
├┬ Co: <code object foo at 0x102755b00, file "/private/tmp/tests/dill_test.py", line 6>
│├┬ F2: <function _create_code at 0x102fb3370>
││└ # F2 [19 B]
│└ # Co [102 B]
├┬ D2: <dict object at 0x0102fc49c0>
│└ # D2 [25 B]
├┬ D2: <dict object at 0x0102956a00>
│└ # D2 [2 B]
├┬ D2: <dict object at 0x0102fc4b80>
│├┬ D2: <dict object at 0x0102938ac0>
││└ # D2 [2 B]
│└ # D2 [23 B]
└ # F1 [198 B]
This is the one that doesn't:
$ pytest tests/dill_test.py
┬ F2: <function foo at 0x104473be0>
└ # F2 [20 B]
So if pytest is involved, dill doesn't even try to pickle any of the function's attributes...?
Essentially, yes. "F2" is passing the function off to pickle. The key is that there's an internal function called _locate_function, and if that returns False... probably in this case because _import_module does not find the module... then it punts to pickle, which gives up.
Isn't it the other way around? According to https://github.com/uqfoundation/dill/blob/master/dill/_dill.py#L1881C12-L1881C12, dill uses the stock pickler when _locate_function returns True. But this is not what I want, since I want to dump the function object itself, not a reference to it.
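For readers unfamiliar with the distinction, this short sketch shows what "a reference to it" means for the stock pickler. It uses json.dumps only as a convenient importable function: the payload contains the module name and the function name, not the function's code, so loading it depends on the module being importable.

```python
import json
import pickle

# The stock pickler stores a module-level function as module name + qualified name.
payload = pickle.dumps(json.dumps)

# Loading re-imports the module and looks the name up again.
restored = pickle.loads(payload)
```

A payload this small is consistent with the short F2 entry in the failing trace above, and it explains the ModuleNotFoundError in the subprocess: there is nothing in the pickle to run unless the module can be imported.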
Yes, you are correct. I missed the not in the if statement.
Could you imagine having a flag similar to byref for modules that forces dill to pickle the function object instead of a reference to it? I think this would get us a lot closer to what we want to achieve.
Yes, there is a PR that is mostly done that handles a bunch of module serialization variants. Work on it seems to have stalled a bit, though.