uqfoundation/dill

Wrong __main__ module set in function reconstructed in child process

Closed this issue · 4 comments

Hello dear maintainers
I'm trying to use dill to serialize functions that I pass to child processes created with multiprocessing. It works well when the child process is created by forking the current process, but gives odd results when the start method spawns a whole new Python process.

Here is a minimal example that fails (Python 3.8.10, Ubuntu 20):

import dill
import multiprocessing as mp
#import multiprocess as mp     # multiprocess does not help, even though it is made to work with dill

def test_presence():
	print('present !')
	
def job():
	import sys
	print('job')
	print('__main__ is', sys.modules[__name__].__name__, sys.modules[__name__].__dict__.keys())
	print('but got', __name__, globals().keys())
	print()
	test_presence()
	
def extract(dump):
	dill.loads(dump)()
	
if __name__ == '__main__':
	mp.set_start_method('spawn')
	#mp.set_start_method('fork')    # no problem with that one, but not available on all platforms
	
	process = mp.Process(target=extract, args=(dill.dumps(job),))
	process.start()

It gives the following results:

job
__main__ is __mp_main__ dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__file__', '__cached__', '__builtins__', 'dill', 'mp', 'test_presence', 'job', 'extract'])
but got __main__ dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__annotations__', '__builtins__', 'spawn_main'])

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ydejonghe/robot-cueillette/tests/test_dill_multiprocessing.py", line 17, in extract
    dill.loads(dump)()
  File "tests/test_dill_multiprocessing.py", line 14, in job
    test_presence()
NameError: name 'test_presence' is not defined

You can see that the function job is dilled referring to module __main__, but when it is reconstructed in the child process, dill creates a custom dictionary to use as module '__main__', because the child's main module has been renamed '__mp_main__'.
What I fail to understand is that the child process still has a '__main__' entry in sys.modules, so dill should be able to pick the right module for the reconstructed function.
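
The NameError follows from how Python resolves global names: a function looks them up in the dictionary it was built over, not in whatever sys.modules['__main__'] currently holds. Below is a minimal standard-library sketch of that mechanism, with no dill or multiprocessing involved. The names `say_present`, `job_like`, and `call_or_error` are hypothetical stand-ins for the functions in the example above.

```python
import builtins
import types


def say_present():
    return "present !"


def job_like():
    # say_present is resolved through job_like's globals at call time.
    return say_present()


def call_or_error(func):
    """Call func and return its result, or a description of the NameError it raises."""
    try:
        return func()
    except NameError as exc:
        return f"NameError: {exc}"


# Same code object, rebuilt over a globals dict that lacks say_present.
detached = types.FunctionType(
    job_like.__code__,
    {"__name__": "__main__", "__builtins__": builtins},
)

# Same code object again, rebuilt over a dict that does contain the helper.
attached = types.FunctionType(
    job_like.__code__,
    {"__name__": "__main__", "__builtins__": builtins, "say_present": say_present},
)

print(call_or_error(detached))
print(call_or_error(attached))
```

The first call reports a NameError and the second returns the helper's string, even though both functions share the same code object. Only the globals dictionary differs.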

Using the 'fork' child creation method does not rename the main module, so the issue does not occur:

job
__main__ is __main__ dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__annotations__', '__builtins__', '__file__', '__cached__', 'dill', 'multiprocessing', 'test_presence', 'job', 'extract', 'process'])
but got __main__ dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__annotations__', '__builtins__', '__file__', '__cached__', 'dill', 'multiprocessing', 'test_presence', 'job', 'extract', 'process'])

present !

The same problem occurs using multiprocess instead of multiprocessing.

Do you see any workaround for this?

Thanks for the question, and for investigating a bit.

This is a duplicate of uqfoundation/multiprocess#65, and it's due to differences in pickling across the different contexts. I don't yet have a good solution for the default dill serialization settings for spawn... however, if you use the recurse=True setting, it should work in your case.

import dill
import multiprocess as mp
dill.settings['recurse'] = True
...

This worked for me.

Python 3.8.15 (default, Oct 12 2022, 04:30:07) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.__version__
'0.3.6.dev0'
>>> import multiprocess
>>> multiprocess.__version__
'0.70.14.dev0'

Please close the issue if this answers your question.

Also see: #105, and other linked issues, for a longer discussion. I'm going to switch and close this as a duplicate.

Thanks for your fast answer! (and all that material)
I investigated a bit more and found a potential solution. I'm posting it here because I'm not sure this issue was the only problem in #115 and #105. I guess it could solve uqfoundation/multiprocess#65.

I solved the problem by changing the special case that handles the main module in _dill.Unpickler:

        if (module, name) == ('__builtin__', '__main__'):
            # formerly (self._main is no longer the current __main__ here):
            #return self._main.__dict__ #XXX: above set w/save_module_dict
            # fix: look up the current __main__ at load time
            import __main__
            return __main__.__dict__

It seems that the 'spawn' start method reassigns the main module after the child process has been fully initialized, so the reference to module __main__ that dill stores at initialization (in _dill._main_module, and later in self._main) is stale.

To check that, add the following assertion before the lines above; it will raise:

            assert self._main is __main__ or _main_module is __main__
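
The stale reference can be reproduced without spawning anything. The sketch below is only a simulation, under the assumption stated above: it swaps sys.modules['__main__'] by hand, as the 'spawn' bootstrap appears to do, and compares a reference captured beforehand with an import performed at call time. The names `captured_main`, `current_main`, and `simulate_main_swap` are hypothetical and are not part of dill.

```python
import sys
import types

# Captured once, the way a library might do at import time.
captured_main = sys.modules["__main__"]


def current_main():
    # The import statement consults sys.modules on every call,
    # so it always returns whatever is registered as __main__ right now.
    import __main__
    return __main__


def simulate_main_swap():
    """Replace sys.modules['__main__'] temporarily and compare both lookups.

    Returns (captured_is_current, late_import_is_current).
    """
    original = sys.modules["__main__"]
    replacement = types.ModuleType("__main__")
    sys.modules["__main__"] = replacement
    try:
        return (captured_main is replacement, current_main() is replacement)
    finally:
        # Always restore the real main module.
        sys.modules["__main__"] = original


captured_ok, late_ok = simulate_main_swap()
print("captured reference is current:", captured_ok)
print("late import is current:", late_ok)
```

Once the entry is replaced, the captured reference no longer matches the registered module while the late import does, which is the behavior the fix relies on.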

I'm not sure this fix covers the wide range of cases dill is trying to handle. At least it fixes the minimal example in the description above. Do you think this could be an acceptable solution for this bug?

Hmm.... very interesting. I like it. Yeah, that is a pretty good potential fix for the bug. I'd like to test it out against the dill and multiprocess test suites. I generally also test something like that against klepto and mystic, as they have some very advanced serialization cases.

Thanks! Feel free to submit a PR.