parthenon-hpc-lab/parthenon

Notes on Venado

Yurlungur opened this issue · 0 comments

I am going to put my notes on building/running on Venado here.

Grace-Grace Superchip Nodes

These were pretty easy to build for:

module load cray-mpich cray-hdf5-parallel
cmake -DPARTHENON_DISABLE_HDF5_COMPRESSION=ON -DPARTHENON_ENABLE_PYTHON_MODULE_CHECK=OFF ..
make -j20

There is a bug in cray-mpich, so before running you must set:

export MPICH_MALLOC_FALLBACK=1
# note CMA doesn't work here. NONE is required.
export MPICH_SMP_SINGLE_COPY_MODE=NONE
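
If you would rather not export these into the whole session, one option is a small wrapper that sets them for a single command only. This is a minimal sketch: the function name `with_mpich_workaround` is made up for illustration, and the `sh -c` call is a stand-in for the real launch line.

```shell
# Hypothetical helper (not part of Parthenon or cray-mpich): run one command
# with the two workaround variables set, leaving the calling shell untouched.
with_mpich_workaround() {
  env MPICH_MALLOC_FALLBACK=1 MPICH_SMP_SINGLE_COPY_MODE=NONE "$@"
}

# Demonstration with a stand-in command; on the machine this would be the
# actual MPI launch line instead.
with_mpich_workaround sh -c 'echo "$MPICH_MALLOC_FALLBACK $MPICH_SMP_SINGLE_COPY_MODE"'
```

The variables are visible to the wrapped command and its children only, which makes it easy to compare runs with and without the workaround.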

With those variables set, the code runs. I see about 5e7 zone-cycles per node-second with pure MPI, without any tuning or optimization.
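
For anyone reproducing the throughput figure, here is a rough sketch of the arithmetic, assuming the usual definition: total zones times cycles taken, divided by wall-clock seconds and by node count. The function name and the input numbers are made up for illustration (chosen to give a round result); they are not measurements from Venado.

```shell
# Hypothetical helper: zone-cycles per node-second.
# Arguments: total zones, number of cycles, wall-clock seconds, node count.
zone_cycles_per_node_second() {
  awk -v z="$1" -v c="$2" -v t="$3" -v n="$4" \
    'BEGIN { printf "%.3e\n", z * c / (t * n) }'
}

# Illustrative inputs only: 1e6 zones, 100 cycles, 2 s of wall time, 1 node.
zone_cycles_per_node_second 1000000 100 2 1   # prints 5.000e+07
```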

Grace-Hopper Chips

These require the CUDA toolkit. PrgEnv-cray and PrgEnv-gnu both work. However, I needed to change the version of Kokkos to 4.2.01: in external/Kokkos, run git checkout 4.2.01. Then:

module swap PrgEnv-gnu # necessary or not? unclear
module load cray-mpich cray-hdf5-parallel cudatoolkit
export MPICH_OFI_NIC_POLICY=GPU   # GPU NUMA ROUND-ROBIN
export MPICH_GPU_SUPPORT_ENABLED=1 # Allows GPU Aware MPI
export CRAY_ACCEL_TARGET=nvidia90
export MPICH_MALLOC_FALLBACK=1
export MPICH_SMP_SINGLE_COPY_MODE=NONE
export MPICH_MAX_THREAD_SAFETY=multiple
export FI_CXI_RX_MATCH_MODE=hybrid
export PMI_MMAP_SYNC_WAIT_TIME=600

and then configure and build:

cmake -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_CUDA_LAMBDA=ON -DPARTHENON_DISABLE_HDF5_COMPRESSION=ON -DPARTHENON_ENABLE_PYTHON_MODULE_CHECK=OFF ..
make -j20

The code runs on a single node with MPI. I have not yet checked multi-node runs or performance.
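
The Grace-Hopper setup above relies on eight exported variables, so it is easy to drop one when copying them into a job script. Below is a minimal sketch of a pre-flight check. The function name `check_venado_gpu_env` is hypothetical, and it only tests that each variable is non-empty, not that it has the value listed above.

```shell
# Hypothetical pre-flight check: report any of the variables from the
# Grace-Hopper notes that are unset or empty in the current environment.
check_venado_gpu_env() {
  missing=0
  for var in MPICH_OFI_NIC_POLICY MPICH_GPU_SUPPORT_ENABLED CRAY_ACCEL_TARGET \
             MPICH_MALLOC_FALLBACK MPICH_SMP_SINGLE_COPY_MODE \
             MPICH_MAX_THREAD_SAFETY FI_CXI_RX_MATCH_MODE PMI_MMAP_SYNC_WAIT_TIME; do
    # Indirect lookup of the variable named in $var (POSIX sh compatible).
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "unset or empty: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical use near the top of a job script:
#   check_venado_gpu_env || exit 1
```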