Python 3.8 support
emlys opened this issue · 20 comments
Observed with Python 3.8.0 but not 3.7.9: a minimal taskgraph results in a repeating RuntimeError. This is probably related to a change in the 3.8.0 beta 1 release: "bpo-33725: On macOS, the multiprocessing module now uses spawn start method by default."
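(For what it's worth, the new default is easy to confirm from the standard library; this check is just for illustration and isn't part of the reproduction:)

import multiprocessing

# Prints 'spawn' on macOS under Python 3.8+, 'fork' under 3.7.
print(multiprocessing.get_start_method())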
Minimal example:
import taskgraph
def func():
    return 1
graph = taskgraph.TaskGraph('/Users/emily/Documents/delineateit_workspace', 2)
task = graph.add_task(func, task_name='test')
graph.join()
Traceback (repeats forever):
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "", line 1, in
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/runpy.py", line 262, in run_path
return _run_module_code(code, init_globals, run_name,
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/runpy.py", line 95, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/emily/invest/src/natcap/invest/reproduce_bug.py", line 7, in
graph = taskgraph.TaskGraph('/Users/emily/Documents/delineateit_workspace', 2)
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/site-packages/taskgraph/Task.py", line 360, in init
self._worker_pool = NonDaemonicPool(
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/site-packages/taskgraph/Task.py", line 79, in init
super(NonDaemonicPool, self).init(*args, **kwargs)
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/pool.py", line 212, in init
self._repopulate_pool()
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
return Popen(process_obj)
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/opt/miniconda3/envs/bug3.8-3/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
Huh, interesting: I remember adding this a while back with spawn becoming the default in mind:

class NoDaemonContext(type(multiprocessing.get_context('spawn'))):
    """From https://stackoverflow.com/a/8963618/42897.

    "As the current implementation of multiprocessing [3.7+] has been
    extensively refactored to be based on contexts, we need to provide a
    NoDaemonContext class that has our NoDaemonProcess as attribute.
    [NonDaemonicPool] will then use that context instead of the default
    one." "spawn" is chosen as default since that is the default and only
    context option for Windows and is the default option for Mac OS as
    well since 3.8.
    """
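For reference, the complete pattern from that StackOverflow answer looks roughly like this; a sketch of the mechanism, not necessarily line-for-line what taskgraph's Task.py contains:

import multiprocessing
import multiprocessing.pool


class NoDaemonProcess(multiprocessing.Process):
    # Pool workers are daemonic by default, and daemonic processes
    # can't have children; overriding 'daemon' lifts that restriction.
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass


class NoDaemonContext(type(multiprocessing.get_context('spawn'))):
    Process = NoDaemonProcess


class NonDaemonicPool(multiprocessing.pool.Pool):
    # A Pool whose workers may themselves start subprocesses.
    def __init__(self, *args, **kwargs):
        kwargs['context'] = NoDaemonContext()
        super(NonDaemonicPool, self).__init__(*args, **kwargs)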
Sure, happy to look into this after I've finished the latest couple of things that have come up with coastal blue carbon.
I was able to reproduce this issue on Python 3.8 with taskgraph 0.10.2, and it appears to have been fixed as of taskgraph 0.10.3. @emlys could you confirm that this works for you with any version of python 3.8 and taskgraph 0.10.3?
Since this issue sounds related to natcap/invest#420, I tried running a recent dev build of InVEST and can confirm that DelineateIt runs successfully on n_workers>0. I'll put in a patch over there to bump up the taskgraph requirement.
@phargogh I'm still having the same issue with taskgraph 0.10.3, python 3.8.0 and 3.8.7. Is it expected that we always have to call multiprocessing.freeze_support() when using taskgraph, as we do in the invest entrypoint?
@emlys multiprocessing.freeze_support() should only be needed if we're using python's multiprocessing within a frozen binary environment such as PyInstaller on Windows. It should have no effect outside of a frozen environment and on any other OS.
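For reference, the idiom looks like this; it's a no-op outside a frozen executable, so including it is harmless:

import multiprocessing

if __name__ == '__main__':
    # Only does anything when running as a frozen executable
    # (e.g. built with PyInstaller on Windows); otherwise a no-op.
    multiprocessing.freeze_support()
    # ... rest of the program, e.g. TaskGraph setup, goes here ...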
I'm still having the same issue with taskgraph 0.10.3, python 3.8.0 and 3.8.7.
@emlys out of curiosity, is your python coming from conda-forge or somewhere else?
The minimally-reproducible script I'm using to test this is:
cd taskgraph
git checkout 0.10.3
conda create -c conda-forge -p ./env-387 python=3.8.7 retrying psutil rstcheck
conda activate ./env-387
python setup.py install
# Then I run the test script in the original post, with success
@phargogh I followed those steps almost exactly (I had to use python 3.8.6; conda reported that python==3.8.7 isn't available on conda-forge) and I'm still getting the error.
Just summarizing what I've found so far:
- In the past, you could avoid this issue by using fork. fork works by copying the parent process to make the subprocess; spawn works by importing everything that was imported into the parent process, into the subprocess. Because of that, you get this recursive error unless you have the if __name__ == '__main__' guard.
- The fork start method is now considered unsafe on macOS and isn't available on Windows anyway.
- We should explicitly set the start method to spawn since it's the only one available on both unix and windows, and it's not the default on non-macOS unix (see the sketch after this list).
  - spawn has extra requirements to be used safely.
- I had thought the error message was saying you need to call freeze_support(), but actually the if __name__ == '__main__' guard is the important part.
  - This is clearly the solution for uses within our code.
  - It is not clear if this is a good solution for taskgraph as a package.
    - do we need to tell users that any code that uses taskgraph must be wrapped in an if __name__ == '__main__'?
    - maybe it doesn't matter because we are the primary (only?) users of taskgraph.
- I still don't know why James doesn't get this error with taskgraph 0.10.3.
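A sketch of both of those points together, using only the standard library: explicitly requesting the spawn context, and keeping all process creation behind the guard:

import multiprocessing


def func(x):
    return x * x


if __name__ == '__main__':
    # Request 'spawn' explicitly so behavior matches across platforms,
    # rather than relying on the per-OS default.
    ctx = multiprocessing.get_context('spawn')
    with ctx.Pool(2) as pool:
        print(pool.map(func, range(4)))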
In a conda environment with:
- python 3.8.6
- taskgraph 0.10.2
This script fails with the same _check_not_importing_main RuntimeError as described in the post:
import logging
logging.basicConfig(level=logging.DEBUG)
import taskgraph
def func():
    return 1
graph = taskgraph.TaskGraph('./test-workspace', 2)
task = graph.add_task(func, task_name='test')
graph.join()
Same python installation, same version of taskgraph, this works great:
import logging
logging.basicConfig(level=logging.DEBUG)
import taskgraph
def func():
    return 1

if __name__ == '__main__':
    graph = taskgraph.TaskGraph('./test-workspace', 2)
    task = graph.add_task(func, task_name='test')
    graph.join()
Taking a closer look at taskgraph's multiprocessing, it looks like spawn is already used on all operating systems. So it does seem strange that the version without the if __name__ == '__main__' guard is working at all on my mac.
maybe it doesn't matter because we are the primary (only?) users of taskgraph.
I agree with this for two reasons:
- We are indeed (probably) the only users of taskgraph
- If someone DOES get this error, the python devs have very kindly put the solution in the error text.
I'll keep poking at this a bit more to see if anything else comes up.
Would this also affect someone using the natcap.invest package? That could be slightly more of a problem
Ah! If I make sure to remove ./test-workspace before running the script, I see the exception on 0.10.3. If ./test-workspace exists, I don't see the exception.
EDIT: nope, now it's always failing.
So now it seems like the changes in 0.10.3 did fix natcap/invest#420, but it's likely unrelated to this error. (Which makes sense because the invest entrypoint does use if __name__ == '__main__', so this shouldn't come up.)
Yep, I agree. The hanging behavior seen there is unrelated to this error.
Would this also affect someone using the natcap.invest package? That could be slightly more of a problem
Yep! This issue will indeed affect someone using InVEST so long as they're using n_workers > 0 (which is thankfully not default behavior). If they want to use n_workers > 0, they'll need to do the if __name__ == '__main__' dance. But it is also a reasonable expectation that someone wanting to use multiprocessing will be able to understand and debug their system when things go wrong, and short of not using spawn there's nothing we can do about this anyways.
But it is also a reasonable expectation that someone wanting to use multiprocessing will be able to understand and debug their system when things go wrong, and short of not using spawn there's nothing we can do about this anyways.
I feel like this deserves a note in the API docs, just confirming that this is what we expect to happen. Also maybe in the taskgraph readme? The example code there should also get updated.
I'll add a note to the taskgraph README and submit a PR to you. Since natcap/invest#467 is still pending, I'll add a note to the API docs there.
Thanks! I had started on updating the taskgraph README example code but I ran into a new issue... #60
Sorry I missed this entire thing @phargogh and @emlys, is the "fix" here to guard your code with an if __name__ == '__main__'? I asked about this years ago on stackoverflow and got a :tumbleweed: https://stackoverflow.com/questions/50242773/can-i-guard-against-python-multiprocessing-bomb
Otherwise, someone using multiprocessing really needs to do an if __name__ == '__main__' or they're gonna hurt.
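For anyone finding this later, the "multiprocessing bomb" in question is roughly this shape; under spawn, each child re-imports the main module, hits the Pool creation again, and the _check_not_importing_main check seen in the traceback above raises the RuntimeError instead of letting processes multiply:

import multiprocessing


def work(x):
    return x * x


# BUG (deliberate): no __name__ == '__main__' guard. Under spawn,
# every child process re-executes this module top to bottom and
# tries to create its own Pool, triggering the RuntimeError.
pool = multiprocessing.Pool(2)
print(pool.map(work, range(4)))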
Yep! This ultimately comes down to documenting the if __name__ == '__main__' in the example.