ShadenSmith/splatt

Benchmarking SPLATT with MPI

solomonik opened this issue · 2 comments

Hello,

I am trying to benchmark SPLATT MTTKRP with MPI, but am running into issues with memory scalability and wanted to check if I am doing anything wrong.

I configure with --with-mpi and build via make (cmake). I have been setting OMP_NUM_THREADS=1 and running, e.g., mpirun -np 4 splatt bench <my_tensor_file> -a splatt -i 1, or the same with -a csf. I am able to run both variants to completion for a 2K-by-2K-by-2K random tensor with density .01 on one KNL node of Stampede2 with one process. However, running the same problem with 64 processes per node on 8 nodes fails immediately with an error suggesting that more memory is being allocated than is available (it seems the tensor is not read in successfully).
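For reference, the full sequence on Stampede2 looks roughly like this (the tensor file is a placeholder, and -np 512 corresponds to 64 processes per node across 8 nodes):

    # build SPLATT with MPI support (the configure step drives cmake)
    ./configure --with-mpi
    make

    # one thread per MPI process
    export OMP_NUM_THREADS=1

    # completes on one KNL node with a single process
    mpirun -np 1 splatt bench <my_tensor_file> -a splatt -i 1
    mpirun -np 1 splatt bench <my_tensor_file> -a csf -i 1

    # fails immediately with an out-of-memory-style error
    mpirun -np 512 splatt bench <my_tensor_file> -a splatt -i 1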

I've also tried a problem locally on my laptop that is 800-by-800-by-800 with density .125, for which -a csf works fine with 1, 2, or 4 processes; -a splatt runs fine with 1 process but segfaults with 2 or 4 MPI processes.
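The local runs look roughly like this (again, the tensor file is a placeholder):

    # 800-by-800-by-800 random tensor, density .125
    mpirun -np 2 splatt bench <local_tensor_file> -a csf -i 1      # fine with 1, 2, or 4 processes
    mpirun -np 2 splatt bench <local_tensor_file> -a splatt -i 1   # segfaults with 2 or 4 processes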

Please let me know if I am doing things in the wrong way or if splatt bench does not actually support MPI benchmarking.

Ok, yeah, I think I see that splatt bench is not meant to work with MPI. But I can run splatt cpd and benchmark MTTKRP in mpi_cpd_als_iterate, which works great for my purposes. Seems to be fast!
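Concretely, something like the following is what I am timing (plus whatever CPD options are appropriate); the CPD driver ends up in mpi_cpd_als_iterate, which performs the distributed MTTKRP each iteration:

    export OMP_NUM_THREADS=1
    mpirun -np 64 splatt cpd <my_tensor_file>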

Ah yes, apologies, splatt bench has not been updated to use any of the MPI routines. I should probably deprecate and remove that command as I have not really kept up with it.