argo_init crashes with large memory request
Opened this issue · 10 comments
argo_init((size_t)64 * 1024 * 1024 * 1024) crashes, but argo_init((size_t)32 * 1024 * 1024 * 1024) works fine. I guess it is easy to reproduce, since argo_init is the first statement in the program. Tested on the master branch, with the MPI backend, using 2 nodes with 1 process each.
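For reference, a minimal reproducer along these lines might look like the sketch below. The include path and the argo_finalize call are assumptions on my side, not something verified against this report:

```cpp
// Minimal sketch of the reported scenario.
// The include path and argo_finalize are assumptions, not verified here.
#include <cstddef>
#include <cstdio>
#include "argo/argo.h"

int main() {
    // 64 GiB request: reported to crash; 32 GiB is reported to work.
    argo_init((size_t)64 * 1024 * 1024 * 1024);
    std::puts("argo_init succeeded");
    argo_finalize();
    return 0;
}
```

Launched with the MPI backend on 2 nodes with 1 process each, as described above.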
Could you please add the error output you get?
It just returns 1 without any output.
How much memory do your nodes have each, physically, and how much memory is available for files under /dev/shm/?
We have 252G per node. An extra 95G per node is available under /dev/shm (still larger than 64G). What does ArgoDSM use /dev/shm for?
When using ARGO_VM_SHM (the default), ArgoDSM creates a file descriptor in /dev/shm for mapping pages.
See https://github.com/etascale/argodsm/blob/master/src/virtual_memory/shm.cpp for the use of /dev/shm.
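For illustration only (this is the general POSIX shared-memory pattern, not a copy of the code in shm.cpp, and the shared-memory name below is a placeholder): the file created by shm_open lives in the tmpfs mounted at /dev/shm, so the free space there limits how large the backing file can grow.

```cpp
// Illustrative POSIX shared-memory pattern; not the actual shm.cpp code.
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const size_t size = (size_t)64 * 1024 * 1024 * 1024;  // 64 GiB request

    // "/argo_example" is a placeholder name, not what ArgoDSM uses.
    int fd = shm_open("/argo_example", O_RDWR | O_CREAT, 0600);
    if (fd < 0) { std::perror("shm_open"); return 1; }

    // The backing file lives in /dev/shm; exceeding the free space there
    // surfaces either here or once the pages are actually touched.
    if (ftruncate(fd, (off_t)size) != 0) { std::perror("ftruncate"); return 1; }

    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }

    munmap(mem, size);
    close(fd);
    shm_unlink("/argo_example");
    return 0;
}
```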
The first thing to try is to enable another vm handler. Currently available are ARGO_VM_MEMFD (requires Linux kernel 3.17 or newer) and ARGO_VM_ANONYMOUS. Please set one of these to ON and ARGO_VM_SHM to OFF using cmake (e.g. `-DARGO_VM_MEMFD=ON -DARGO_VM_SHM=OFF`), and recompile.
If both of these show the same behaviour, I'll need more information about your system.
With MEMFD, it failed even with a small memory request. The output is:
[mpiexec@i1] HYDU_sock_write (../../utils/sock/sock.c:418): write error (Bad file descriptor)
[mpiexec@i1] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:252): unable to write data to proxy
(i1 is a hostname)
With ANONYMOUS, it doesn't crash, but it consumes around 160G of memory when I request only 64G, and it takes around 2 minutes to initialize.
I hope I don't have to use ANONYMOUS.
Ideally we would always use whichever ARGO_VM_* option provides the best performance, but unfortunately we have to adapt a bit depending on what can and cannot be changed on a given system.
Unfortunately, we will need more information about the system you are using to be able to debug this.
Can you tell us what hardware you are running this on?
What distribution and kernel version are you using?
What MPI implementation and version are you using?
What compiler and version are you using?
Do you have superuser access to your cluster nodes?
Would it be possible for us to get access to the machine to debug this?
If not, you may have to debug it yourself to see in which part of the code the program crashes.
An easy way to confirm that the /dev/shm size is the issue for ARGO_VM_SHM would be to initialize with 45G (and see that it works) and then with just over half the free memory in /dev/shm, e.g. 48G, and see that it fails.
What is the output of `cat /proc/sys/vm/overcommit_memory`? Can you change it?
I am not sure the initialization time is that much different from what can be expected for this amount of memory; for us, initialization is usually dominated by the time MPI needs to register the memory range.
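As a standalone illustration of that cost (not ArgoDSM's actual initialization code): exposing a large buffer through an MPI window already forces the MPI library to register the whole range, and that registration time grows with the size of the range.

```cpp
// Standalone sketch: timing how long MPI takes to allocate and register
// a large window. Not ArgoDSM code; the 8 GiB size is arbitrary.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    const MPI_Aint size = (MPI_Aint)8 * 1024 * 1024 * 1024;  // 8 GiB
    void* buf = nullptr;
    MPI_Win win;

    double t0 = MPI_Wtime();
    // MPI_Win_allocate both allocates and registers the memory with the
    // network layer; this is where the large setup time typically goes.
    MPI_Win_allocate(size, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &buf, &win);
    double t1 = MPI_Wtime();

    std::printf("window creation took %.1f s\n", t1 - t0);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```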
As for the "swallowed memory", I do not know what numbers you are looking at, but they can be deceiving. Unless you actually cannot allocate the memory you have, I don't think showing larger numbers of memory in use are an issue.
You are right, it is the /dev/shm size issue. ArgoDSM occupies approximately 1.5x the requested memory in /dev/shm, so with 95G free there (95G / 1.5 ≈ 63G) it is OK to request up to about 60G on our system.
It would be better if ArgoDSM could report a detailed error message.
I agree. The code is supposed to print an error message, so I would be very interested in finding out why this does not happen. It has not happened on my machines so far, so I would need your assistance to find out more about it.
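For what it's worth, the kind of check I would expect around the backing-file setup looks roughly like the sketch below. This is a hypothetical helper under the assumption that the failure comes from growing the /dev/shm-backed file; the names are placeholders, not the actual shm.cpp code:

```cpp
// Hypothetical error reporting around the backing-file setup;
// placeholder names, not the actual ArgoDSM shm.cpp code.
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Grow the /dev/shm-backed file and fail loudly if it does not fit.
void grow_backing_file(int fd, std::size_t size) {
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
        std::fprintf(stderr,
                     "argo: failed to resize backing file to %zu bytes: %s\n"
                     "argo: check the free space under /dev/shm\n",
                     size, std::strerror(errno));
        std::exit(EXIT_FAILURE);
    }
}
```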
Hi again,
it is possible your issues disappear with the patches in #19.
Could you please test this and tell us whether ArgoDSM with these patches behaves as you would expect, or whether your issues still persist?