zarr-developers/numcodecs

Multiprocessing start_method does not work after import

Closed this issue · 2 comments

Minimal, reproducible code sample, a copy-pastable example if possible

Hi, I'm not sure whether this is a numcodecs bug, but calling multiprocessing.set_start_method after importing numcodecs (or a library that depends on it) raises an error:

>>> import numcodecs
>>> import multiprocessing as mp
>>> mp.set_start_method("spawn")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/multiprocessing/context.py", line 247, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set

Setting the start method before importing numcodecs works:

import multiprocessing as mp
mp.set_start_method("spawn")
from numcodecs.abc import Codec

Full example with minimal reproduction:

$ python3 -m venv venv
$ . venv/bin/activate
$ pip install numcodecs
Collecting numcodecs
  Using cached numcodecs-0.13.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.6 MB)
Collecting numpy>=1.7
  Using cached numpy-2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.5 MB)
Installing collected packages: numpy, numcodecs
Successfully installed numcodecs-0.13.0 numpy-2.0.1
$ python
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from numcodecs.abc import Codec
>>> import multiprocessing as mp
>>> mp.set_start_method("spawn")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/multiprocessing/context.py", line 247, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set

Problem description

When using libraries that depend on numcodecs, I cannot set the multiprocessing start method once numcodecs has been imported.

Version and installation information

Python versions 3.10 and 3.11.
numcodecs versions 0.11.0 and 0.13.0 (latest).
I have tried this on Ubuntu Linux 22.04, on Windows, and in an Ubuntu 22.04 Docker image.

Actually, I discovered that the cause is this line:

mutex = multiprocessing.Lock()

which runs at module level on import. Creating the lock fixes the default multiprocessing context, so any later call to set_start_method raises the error above.

Minimal reproducible example:

$ python
Python 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> x = multiprocessing.Lock()
>>> multiprocessing.set_start_method("spawn")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python311\Lib\multiprocessing\context.py", line 247, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set
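As a possible workaround on the caller's side, the standard library offers two documented options once the default context has been fixed: request an explicit context with `multiprocessing.get_context`, or pass `force=True` to `set_start_method`. The sketch below reproduces the failure and then tries both; the variable names are illustrative only.

```python
import multiprocessing as mp

# Creating a lock through the top-level module fixes the default context,
# which is the same effect as the module-level Lock shown above.
lock = mp.Lock()

try:
    mp.set_start_method("spawn")
    late_set_failed = False
except RuntimeError:
    # "context has already been set"
    late_set_failed = True

# Option 1: an explicit context object. It has the same API as the
# multiprocessing module and does not change the global default.
ctx = mp.get_context("spawn")
ctx_method = ctx.get_start_method()

# Option 2: override the global default even though it was already fixed.
mp.set_start_method("spawn", force=True)
global_method = mp.get_start_method()
```

The explicit context is usually the less intrusive choice, since `force=True` changes the default for every other library in the process, and objects created under the earlier context (such as `lock` here) may not be compatible with processes started under the new one.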

On second thought, I'll close this as it might not be a bug 😄