Benchmarking SPLATT with MPI
solomonik opened this issue · 2 comments
Hello,
I am trying to benchmark SPLATT MTTKRP with MPI, but am running into issues with memory scalability and wanted to check if I am doing anything wrong.
I configure with `--with-mpi` and build via make (cmake). I have been setting OMP_NUM_THREADS=1 and running, e.g., `mpirun -np 4 splatt bench <my_tensor_file> -a splatt -i 1` or `-a csf`. I am able to run both variants to completion for a 2K-by-2K-by-2K random tensor with density .01 on one KNL node of Stampede2 with one process. However, running the same problem with 64 processes per node on 8 nodes immediately returns an error suggesting that more memory is being allocated than is available (it seems the tensor is not read in successfully).
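For completeness, here is roughly the build-and-run sequence I am using (a sketch only; `<my_tensor_file>` is a placeholder, and environment/module setup is omitted):

```sh
# Configure and build SPLATT with MPI support.
./configure --with-mpi
make

# One OpenMP thread per MPI rank, as in the runs described above.
export OMP_NUM_THREADS=1

# Benchmark the SPLATT kernel for one iteration...
mpirun -np 4 splatt bench <my_tensor_file> -a splatt -i 1
# ...and the CSF kernel.
mpirun -np 4 splatt bench <my_tensor_file> -a csf -i 1
```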
I've also tried a problem locally on my laptop that is 800-by-800-by-800 with density .125, for which `-a csf` works fine with 1, 2, or 4 processes; `-a splatt` runs fine with 1 process but segfaults with 2 or 4 MPI processes.
Please let me know if I am doing things the wrong way, or if `splatt bench` does not actually support MPI benchmarking.
Ok, yeah, I think I see that `splatt bench` is not meant to work with MPI. But I can run `splatt cpd` and benchmark MTTKRP in `mpi_cpd_als_iterate`, which works great for my purposes. Seems to be fast!
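In case it is useful to others, a minimal sketch of what I ended up running (the tensor path is again a placeholder; I time the MTTKRP portion of `mpi_cpd_als_iterate` myself, and any rank/iteration options should be checked against the command's own help output):

```sh
# Single-threaded MPI ranks, as before.
export OMP_NUM_THREADS=1

# Run the MPI CPD-ALS driver; MTTKRP is exercised inside
# mpi_cpd_als_iterate on each iteration.
mpirun -np 4 splatt cpd <my_tensor_file>
```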
Ah yes, apologies, `splatt bench` has not been updated to use any of the MPI routines. I should probably deprecate and remove that command, as I have not really kept up with it.