tox-dev/filelock

Multiprocessing with FileLock fails in python 3.9

lhoestq opened this issue · 6 comments

On python 3.9 with filelock 3.8.0, this code hangs:

from multiprocessing import Pool
from filelock import FileLock


def run(i):
    print(f"got the lock in multi process [{i}]")


with FileLock("tmp.lock"):
    with Pool(2) as pool:
        pool.map(run, range(2))

This is because the subprocesses try to acquire the lock already held by the main process for some reason. This was not the case in older versions of python.

This can cause many issues in python 3.9.

For example, we use multiprocessing to run a pool of jobs, and we use filelock to prevent running the same pool of jobs several times and to avoid write collisions.

First reported in huggingface/datasets#4113
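
I suspect the version difference comes from the default multiprocessing start method: on macOS it changed from fork to spawn in python 3.8, and with spawn every child process re-imports the main module, so module-level code (including the FileLock block) runs again in each worker. A quick sketch to check which method your interpreter uses:

import multiprocessing

# "spawn" children re-import the main module, so module-level code
# (like acquiring a FileLock) runs again in every worker process;
# "fork" children inherit the parent's state and skip the re-import.
print(multiprocessing.get_start_method())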

Can you reproduce with python 3.10 and python 3.11? What OS?

I can reproduce on 3.10.0 on macos - haven't tried 3.11

Can confirm this is an issue with my test bed as well. Fedora (python 3.9.9, FileLock 3.8.0). Kinda makes using FileLock pointless if this doesn't work.

I don't think filelock is doing anything wrong here.
The code in the OP is "wrong".
It spawns processes at module level, and the child processes themselves re-run that same module-level code.
If anything, filelock is hiding that issue.

...
#with FileLock("tmp.lock"):
with Pool(2) as pool:
    pool.map(run, range(2))

If we remove filelock from the equation, the interpreter will throw RuntimeErrors at you and explain the issue:

An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

	if __name__ == '__main__':
		freeze_support()
		...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.

To get the desired behaviour in OP's code, we can use:

...
def main():
    with FileLock("tmp.lock"):
        with Pool(2) as pool:
            pool.map(run, range(2))


if __name__ == "__main__":
    main()
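
As a side note, giving the lock a timeout turns this kind of silent hang into a filelock.Timeout exception, which is much easier to diagnose. A rough sketch (the 10-second value is arbitrary):

import sys
from multiprocessing import Pool

from filelock import FileLock, Timeout


def run(i):
    print(f"got the lock in multi process [{i}]")


def main():
    try:
        # timeout is in seconds; instead of waiting forever, acquisition
        # raises Timeout once the deadline passes
        with FileLock("tmp.lock", timeout=10):
            with Pool(2) as pool:
                pool.map(run, range(2))
    except Timeout:
        sys.exit("could not acquire tmp.lock - is another process holding it?")


if __name__ == "__main__":
    main()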

After giving it a touch more thought, I think what might be missing is an understanding of how subprocesses work.
It should make more sense if you add a bit of logging.

import logging
from multiprocessing import Pool
from filelock import FileLock

logging.basicConfig(
    format="%(asctime)s - process:%(process)d - %(message)s",
    level=logging.DEBUG,
    datefmt="%H:%M:%S",
)
logging.getLogger("filelock").setLevel(logging.DEBUG)

logging.info("defining run")
def run(i):
    print(f"got the lock in multi process [{i}]")

logging.info("getting lock")
with FileLock("tmp.lock"):
    with Pool(2) as pool:
        pool.map(run, range(2))

We end up with:

18:38:51 - process:4416 - defining run
18:38:51 - process:4416 - getting lock
18:38:51 - process:4416 - Attempting to acquire lock 2404827780944 on tmp.lock
18:38:51 - process:4416 - Lock 2404827780944 acquired on tmp.lock
18:38:52 - process:16372 - defining run
18:38:52 - process:16372 - getting lock
18:38:52 - process:16372 - Attempting to acquire lock 2221233535152 on tmp.lock
18:38:52 - process:16372 - Lock 2221233535152 not acquired on tmp.lock, waiting 0.05 seconds ...
18:38:52 - process:2468 - defining run
18:38:52 - process:2468 - getting lock
18:38:52 - process:2468 - Attempting to acquire lock 3056346724528 on tmp.lock
18:38:52 - process:2468 - Lock 3056346724528 not acquired on tmp.lock, waiting 0.05 seconds ... 
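
Reading that output: process 4416 is the parent and acquires the lock; 16372 and 2468 are the pool workers, which re-execute the module-level code when they start up, reach the same FileLock while the parent still holds it, and sit there retrying every 0.05 seconds. The parent, in turn, is stuck inside pool.map waiting for workers that never finish starting, so the whole program hangs.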

That makes sense, thanks! I'm closing this one if it's good for everyone.