eth-cscs/squashfs-mount

Possible race condition in libmount


Quite often, but not always, squashfs-mount fails when many processes are started on the same node:

nid020204 ~ $ srun -N1 -n128 squashfs-run store.squashfs true
nid020204 ~ $ srun -N1 -n128 squashfs-run store.squashfs true
Failed to mount
srun: error: nid020205: task 90: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=1549.0
slurmstepd: error: *** STEP 1549.0 ON nid020205 CANCELLED AT 2022-07-13T13:08:41 ***
srun: error: nid020205: tasks 3,7-9,11-12,14-17,21,33-34,41,45,50,52,59,62-63,77,80,100,103,105,108,110,112,121-123,126: Terminated
srun: error: nid020205: tasks 6,30,91,99,114: Terminated
srun: Force Terminated StepId=1549.0

Sarus would probably run into the same issue, although it execs mount instead of using libmount directly.

Obviously we should turn "Failed to mount" into a proper error message fetched from libmount, but it would be nice to fix the underlying libmount problem too, which likely has to do with a race when loop devices are reused. (Reusing loop devices is not strictly required, but the number of loop devices is limited by the kernel, and one loop device per squashfs file makes a lot of sense.)

A slightly more useful error:

$ ./stress.sh 
x/: overlapping loop device exists for /home/harmen/Documents/projects/setuid-squashfs-mount/store.squashfs
x/: overlapping loop device exists for /home/harmen/Documents/projects/setuid-squashfs-mount/store.squashfs
x/: overlapping loop device exists for /home/harmen/Documents/projects/setuid-squashfs-mount/store.squashfs
x/: overlapping loop device exists for /home/harmen/Documents/projects/setuid-squashfs-mount/store.squashfs
x/: overlapping loop device exists for /home/harmen/Documents/projects/setuid-squashfs-mount/store.squashfs
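The "overlapping loop device exists" message comes from libmount's check for loop devices that are already bound to the same backing file. Below is a simplified, hypothetical model of that kind of check; it is not libmount's actual code, and classify_loop_binding and struct loop_binding are illustrative names. It assumes the backing file is identified by its st_dev/st_ino pair: an existing device with the same file, offset and size limit can be reused, while one whose byte range differs but still intersects is reported as an overlap.

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical, simplified model of a loop-device reuse check. */
enum loop_match { LOOP_NO_MATCH, LOOP_OVERLAP, LOOP_REUSE };

struct loop_binding {
    dev_t    dev;        /* st_dev of the backing file */
    ino_t    ino;        /* st_ino of the backing file */
    uint64_t offset;     /* byte offset into the backing file */
    uint64_t sizelimit;  /* 0 means "up to the end of the file" */
};

static enum loop_match classify_loop_binding(const struct loop_binding *existing,
                                             const struct loop_binding *wanted)
{
    if (existing->dev != wanted->dev || existing->ino != wanted->ino)
        return LOOP_NO_MATCH;  /* different backing file */

    if (existing->offset == wanted->offset &&
        existing->sizelimit == wanted->sizelimit)
        return LOOP_REUSE;     /* identical mapping: share the device */

    /* Existing range ends before the wanted range starts. */
    if (existing->sizelimit != 0 &&
        wanted->offset >= existing->offset + existing->sizelimit)
        return LOOP_NO_MATCH;

    /* Wanted range ends before the existing range starts. */
    if (wanted->sizelimit != 0 &&
        wanted->offset + wanted->sizelimit <= existing->offset)
        return LOOP_NO_MATCH;

    return LOOP_OVERLAP;       /* same file, different but intersecting ranges */
}
```

Under this model, a device that has been bound with LOOP_SET_FD but not yet given its offset by LOOP_SET_STATUS reports offset 0, so a second process asking for a nonzero offset at that moment would see an overlap rather than a reusable device.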

I think this is fixed in Linux 5.8, which added the LOOP_CONFIGURE ioctl: https://lwn.net/Articles/820408/

Previously, if you wanted to set parameters such as the offset on a loop device, this required calling LOOP_SET_FD to set the backing file, and then LOOP_SET_STATUS to set the offset.

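This two-step setup is exactly what we're hitting here: between the two ioctls, another process can see a loop device that is already bound to store.squashfs but not yet fully configured.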
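A minimal sketch of the 5.8+ approach, assuming Linux >= 5.8 and root privileges: the backing file, offset and flags go into a single struct loop_config that is applied with one LOOP_CONFIGURE ioctl, so other processes should never observe a half-configured device. loop_attach and fill_loop_config are hypothetical helpers, not part of libmount or squashfs-mount, and the retry count of 16 is arbitrary. LOOP_CTL_GET_FREE only hands out an index, so a concurrent process can still claim the same device first; LOOP_CONFIGURE then fails with EBUSY and the sketch asks for another free device.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/loop.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef LOOP_CONFIGURE
/* Kernel headers older than 5.8 lack these; values match the 5.8 uapi. */
#define LOOP_CONFIGURE 0x4C0A
struct loop_config {
    __u32 fd;
    __u32 block_size;
    struct loop_info64 info;
    __u64 __reserved[8];
};
#endif

/* Hypothetical helper: backing fd, offset and flags in one struct. */
static void fill_loop_config(struct loop_config *cfg, int backing_fd,
                             uint64_t offset)
{
    memset(cfg, 0, sizeof(*cfg));
    cfg->fd = (__u32)backing_fd;
    cfg->block_size = 0;  /* 0 keeps the default block size */
    cfg->info.lo_offset = offset;
    cfg->info.lo_flags = LO_FLAGS_READ_ONLY | LO_FLAGS_AUTOCLEAR;
}

/* Hypothetical helper: attach `path` read-only to a free loop device with a
 * single LOOP_CONFIGURE ioctl. On success, writes "/dev/loopN" to devpath
 * and returns the open loop device fd; on failure, returns -errno. */
static int loop_attach(const char *path, uint64_t offset,
                       char *devpath, size_t len)
{
    int backing = open(path, O_RDONLY | O_CLOEXEC);
    if (backing < 0)
        return -errno;

    int ctl = open("/dev/loop-control", O_RDWR | O_CLOEXEC);
    if (ctl < 0) {
        int err = errno;
        close(backing);
        return -err;
    }

    int result = -EBUSY;
    for (int attempt = 0; attempt < 16; attempt++) {
        int n = ioctl(ctl, LOOP_CTL_GET_FREE);
        if (n < 0) {
            result = -errno;
            break;
        }
        snprintf(devpath, len, "/dev/loop%d", n);
        int loopfd = open(devpath, O_RDWR | O_CLOEXEC);
        if (loopfd < 0) {
            result = -errno;
            break;
        }

        struct loop_config cfg;
        fill_loop_config(&cfg, backing, offset);
        if (ioctl(loopfd, LOOP_CONFIGURE, &cfg) == 0) {
            result = loopfd;  /* fully configured in one step */
            break;
        }
        result = -errno;
        close(loopfd);
        /* EBUSY: another process configured device n first, so retry. */
        if (result != -EBUSY)
            break;
    }
    close(ctl);
    close(backing);  /* the loop device holds its own reference */
    return result;
}
```

The caller would mount devpath and then close the returned fd; with LO_FLAGS_AUTOCLEAR the kernel detaches the device once it is unmounted and no longer open. Reusing an existing device for the same file is left out of this sketch.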