When creating a GA array with a large number of processes -- Received an Error in Communication: Mapc entries are not properly monotonic:
Hello, my app implements full CI and uses GA to store states globally as "char" (it is the app described in the issue "when use GA_get, both two process stuck in epoll_wait").
I run this app on a cluster. Its tests pass with a small number of processes. However, when I increase the array's dimension or the number of processes, the app reports the error shown below. By printing debug information, I found that the error occurs while creating the GA array, more precisely in
GA::GlobalArray *gaSpace = GA::SERVICES.createGA(gaSpace_type, gaSpace_ndim, gaSpace_dims, (char*)"gaSpace", nblockSpace, mapsSpace);
I added GA::sync before GA::SERVICES.createGA, but nothing changed.
I have run tests with different array dimensions and different numbers of processes; the results are shown in the table below. The only related report I could find on Google is "Many tests are failing due to ga errors", which in the end apparently only succeeds with two MPI processes.
What does this error indicate, and how can I resolve it? Or are there any suggestions? I am happy to supply any other information or test results. Hoping to get your reply, thank you!
Edit1: Could this be related to multi-threading? I don't use multiple threads myself, but I am not sure whether any library I use does. I execute my app via mpirun -np ...
Error Report
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 4101563.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
......
[252] Received an Error in Communication: (4101563) 126:Mapc entries are not properly monotonic:
[504] Received an Error in Communication: (4101563) 252:Mapc entries are not properly monotonic:
[24] Received an Error in Communication: (4101563) 12:Mapc entries are not properly monotonic:
[10] Received an Error in Communication: (4101563) 5:Mapc entries are not properly monotonic:
......
[f04r3n12:53375] 511 more processes have sent help message help-mpi-api.txt / mpi-abort
[f04r3n12:53375] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Tests result
N_ga_space: number of rows in the GA array (in millions); the column count is fixed at 129.
"error" in the result column: all instances are identical to the error report shown above.
I configure GA with
./configure F77=gfortran CC=gcc CXX=g++ MPIF77=mpif77 MPICXX=mpicxx MPICC=mpicc --with-gnu-ld --enable-cxx --prefix=/public/home/jrf/tools/ga-5.8 --exec-prefix=/public/home/jrf/tools/ga-5.8 --with-blas=/opt/hpc/software/compiler/intel/intel-compiler-2017.5.239/mkl --with-mpi-pr=1
and with some libraries
gcc 7.3.1
blas Intel MKL 2017.5.239
mpi openmpi (OpenRTE) 4.0.4rc3
ARMCI: MPI_PR
The cluster is using HDR InfiniBand network and it is not Blue Gene/Q.
The only MPI operation before createGA is a boost.mpi broadcast of some class objects.
This looks like an integer-overflow problem. 70M*129 is 9.03B, which overflows INT_MAX (about 2.15B) by roughly 4.2x. The other cases get close, and I suspect that the GA mapping algorithm overflows depending on the rows, columns, processes per node, and number of nodes.
I do not know an easy solution here. You can break up your GA into a bunch of smaller ones, you can use a different API (e.g. MPI-3, ARMCI directly) or you can wait until someone fixes the integer math error in GA.
I might have a relatively simple workaround. I'll work on this for a bit and see what I can find.
I can't reproduce this issue. Need more information...
~/NWCHEM/ga/build$ mpicc -std=c11 -c jeff.c -I/tmp/jeff/ga/include -o jeff.o && mpifort -Mnomain jeff.o -L/tmp/jeff/ga/lib -lga -larmci -llapack -lblas && mpirun --mca pml ^ucx --use-hwthread-cpus -n 128 ./a.out
nprocs=128
dims={70000000,129} chunk={-1,-1}
g_a=-1000
GA Statistics for process 0
------------------------------
create destroy get put acc scatter gather read&inc
calls: 1 1 0 0 0 0 0 0
number of processes/call 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
bytes total: 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
bytes remote: 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
Max memory consumed for GA by this process: 564375000 bytes
// ../configure --prefix=/tmp/jeff/ga --without-scalapack --disable-peigs && make -j`nproc` install
// mpicc -c jeff.c -I/tmp/jeff/ga/include -o jeff.o && mpifort -Mnomain jeff.o -L/tmp/jeff/ga/lib -lga -larmci -llapack -lblas && mpirun --mca pml ^ucx -n 4 ./a.out
#include <stdio.h>
#include <mpi.h>
#include "ga.h"

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int me, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    if (me == 0) printf("nprocs=%d\n", np);
    GA_Initialize();
    int dims[2]  = {70000000, 129};
    int chunk[2] = {-1, -1};
    if (me == 0) printf("dims={%d,%d} chunk={%d,%d}\n", dims[0], dims[1], chunk[0], chunk[1]);
    int g_a = NGA_Create(C_DBL, 2, dims, "A", chunk);
    if (me == 0) printf("g_a=%d\n", g_a);
    double alpha = 17.0;
    GA_Zero(g_a);
    GA_Add_constant(g_a, &alpha);
    GA_Sync();
    GA_Destroy(g_a);
    if (me == 0) GA_Print_stats();
    GA_Terminate();
    MPI_Finalize();
    return 0;
}
GA::GlobalArray *gaSpace = GA::SERVICES.createGA(gaSpace_type, gaSpace_ndim, gaSpace_dims, (char*)"gaSpace", nblockSpace, mapsSpace);
suggests you are using
GlobalArray * createGA(int type, int ndim, int dims[], char *arrayname, int block[], int maps[]);
instead of
GlobalArray * createGA(int type, int ndim, int dims[], char *arrayname, int chunk[]);
Can you please try the latter with chunk={-1,-1} for debugging purposes?
Can you also send me the precise arguments of
GA::GlobalArray *gaSpace = GA::SERVICES.createGA(gaSpace_type, gaSpace_ndim, gaSpace_dims, (char*)"gaSpace", nblockSpace, mapsSpace);
including the array values of nblockSpace and mapsSpace?
Oh! I found my error. I use (i*num_of_row)/(num_of_process) to set mapSpace's values, and the value becomes negative once i*num_of_row exceeds the range of "int" (I used "int" to store them; that should be changed). After adding an explicit cast so that mapSpace's values are computed as long int, the test passes.
That was my foolish mistake, I am so sorry.
Thank you very much for your help!
No worries. This is not an entirely trivial topic, as demonstrated by https://github.com/jeffhammond/BigMPI-paper/blob/master/exampi14_resubmission_3.pdf, for example 😄
You probably don't care at all about Windows, but long int is 32 bits in the Windows ABI, so I recommend you use either long long int or int64_t to be safe. The former is more widely supported by default, at least, but the latter is more reliable, since you know exactly how big it is.
Got it! I will handle the data types carefully. Thank you😊