hydra: spawn failing due to empty-string argument
drew-parsons opened this issue · 1 comment
Debian Linux now uses mpich as the default MPI on 32-bit architectures (armel, armhf, i386, hppa, m68k, powerpc).
The corresponding rebuild of mpi4py fails spawn tests, reported at mpi4py/mpi4py#514
Running the tests in an armel (armv8l) chroot with the environment variables HYDRA_IFACE=lo and HYDRA_LAUNCHER=fork
produces the following error log and exits with code 255:
(sid_armel-dchroot)$ HYDRA_IFACE=lo HYDRA_LAUNCHER=fork PYTHONPATH=./debian/python3-mpi4py/usr/lib/python3/dist-packages autopkgtest -B -- null
...
testPutProcNull (test_rma_nb.TestRMAWorld.testPutProcNull) ... ok
testPutProcNull (test_rma_nb.TestRMAWorld.testPutProcNull) ... ok
testPutProcNull (test_rma_nb.TestRMAWorld.testPutProcNull) ... ok
testPutProcNull (test_rma_nb.TestRMAWorld.testPutProcNull) ... ok
testPutProcNull (test_rma_nb.TestRMAWorld.testPutProcNull) ... ok
ok
testArgsOnlyAtRoot (test_spawn.TestSpawnSelf.testArgsOnlyAtRoot) ... ok
testArgsOnlyAtRoot (test_spawn.TestSpawnSelf.testArgsOnlyAtRoot) ... ok
testArgsOnlyAtRoot (test_spawn.TestSpawnSelf.testArgsOnlyAtRoot) ... ok
testArgsOnlyAtRoot (test_spawn.TestSpawnSelf.testArgsOnlyAtRoot) ... testArgsOnlyAtRoot (test_spawn.TestSpawnSelf.testArgsOnlyAtRoot) ... [proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] we don't understand this command, forwarding upstream
[proxy:0@amdahl] mcmd=spawn
nprocs=1
execname=/usr/bin/python3.12
totspawns=1
spawnssofar=1
argcnt=3
arg1=/tmp/autopkgtest.VaRWBv/tree/test/spawn_child.py
arg2=/tmp/autopkgtest.VaRWBv/tree/debian/python3-mpi4py/usr/lib/python3/dist-packages
arg3=
preput_num=1
preput_key_0=PARENT_ROOT_PORT_NAME
preput_val_0=tag#0$description#amdahl$port#57335$ifname#127.0.0.1$
info_num=0
endcmd
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_PMI
[proxy:0@amdahl] we don't understand this command, forwarding upstream
[proxy:0@amdahl] mcmd=spawn
nprocs=1
execname=/usr/bin/python3.12
totspawns=1
spawnssofar=1
argcnt=3
arg1=/tmp/autopkgtest.VaRWBv/tree/test/spawn_child.py
arg2=/tmp/autopkgtest.VaRWBv/tree/debian/python3-mpi4py/usr/lib/python3/dist-packages
arg3=
preput_num=1
preput_key_0=PARENT_ROOT_PORT_NAME
preput_val_0=tag#0$description#amdahl$port#36665$ifname#127.0.0.1$
info_num=0
endcmd
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_PMI
[mpiexec@amdahl] [pgid: 0] got PMI command: mcmd=spawn
nprocs=1
execname=/usr/bin/python3.12
totspawns=1
spawnssofar=1
argcnt=3
arg1=/tmp/autopkgtest.VaRWBv/tree/test/spawn_child.py
arg2=/tmp/autopkgtest.VaRWBv/tree/debian/python3-mpi4py/usr/lib/python3/dist-packages
arg3=
preput_num=1
preput_key_0=PARENT_ROOT_PORT_NAME
preput_val_0=tag#0$description#amdahl$port#57335$ifname#127.0.0.1$
info_num=0
endcmd
[unset]: ERROR: Expecting value after arg3= in parse_v1_mcmd (202)
[unset]: ERROR: PMIU_cmd_parse (310)
[mpiexec@amdahl] handle_pmi_cmd (mpiexec/pmiserv_cb.c:57): unable to parse PMI command
[mpiexec@amdahl] control_cb (mpiexec/pmiserv_cb.c:367): unable to process PMI command
[mpiexec@amdahl] HYDT_dmxu_poll_wait_for_event (lib/tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@amdahl] HYD_pmci_wait_for_completion (mpiexec/pmiserv_pmci.c:173): error waiting for event
[mpiexec@amdahl] main (mpiexec/mpiexec.c:260): process manager error waiting for completion
autopkgtest [16:29:04]: ERROR: testbed failure: testbed auxverb failed with exit code 255
Thanks @drew-parsons for reporting the issue. I believe this is not related to the 32-bit architecture. It is caused by an older version of mpi4py passing an empty argument in the spawn command (the arg3 in the log). The current PMI code does not handle empty arguments. I guess we could accept empty arguments, since the arguments are separated by newlines (which means we could even accept spaces), but that would also create more opportunities for users to make mistakes without realizing it. We'll discuss this next week.
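To make the trade-off concrete, here is a minimal Python sketch of a newline-separated `key=value` parser with a strict mode and a lenient mode. It is only an illustration of the idea, not the MPICH implementation: `parse_mcmd`, `spawn_args`, and the `allow_empty_values` flag are hypothetical names, and the sample command is a shortened copy of the spawn command from the log above.

```python
def parse_mcmd(text, allow_empty_values=False):
    """Parse a newline-separated key=value command terminated by 'endcmd'.

    Hypothetical helper for illustration; it does not mirror MPICH's parser.
    """
    fields = {}
    for line in text.splitlines():
        if line == "endcmd":
            return fields
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed line: {line!r}")
        if value == "" and not allow_empty_values:
            # Strict mode: reject an empty value, as in the failure reported here.
            raise ValueError(f"expecting value after {key}=")
        fields[key] = value
    raise ValueError("missing endcmd terminator")


def spawn_args(fields):
    """Collect arg1..argN in order, where N is given by 'argcnt'."""
    count = int(fields["argcnt"])
    return [fields[f"arg{i}"] for i in range(1, count + 1)]


# Shortened copy of the spawn command from the log; arg3 is empty.
SAMPLE = "\n".join([
    "mcmd=spawn",
    "nprocs=1",
    "execname=/usr/bin/python3.12",
    "argcnt=3",
    "arg1=/tmp/autopkgtest.VaRWBv/tree/test/spawn_child.py",
    "arg2=/tmp/autopkgtest.VaRWBv/tree/debian/python3-mpi4py/usr/lib/python3/dist-packages",
    "arg3=",
    "endcmd",
])

if __name__ == "__main__":
    try:
        parse_mcmd(SAMPLE)
    except ValueError as exc:
        print("strict mode rejected the command:", exc)
    lenient = parse_mcmd(SAMPLE, allow_empty_values=True)
    print("lenient mode args:", spawn_args(lenient))
```

In lenient mode the empty third argument is passed through as an empty string, which is the behavior that could hide a caller's mistake; strict mode surfaces it immediately.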