Notes on Venado
Yurlungur opened this issue · 0 comments
I am going to put my notes on building/running on Venado here.
Grace-Grace Superchip Nodes
These were pretty easy to build for:
module load cray-mpich cray-hdf5-parallel
cmake -DPARTHENON_DISABLE_HDF5_COMPRESSION=ON -DPARTHENON_ENABLE_PYTHON_MODULE_CHECK=OFF ..
make -j20
There is a bug in cray-mpich, so before running you must set:
export MPICH_MALLOC_FALLBACK=1
# note CMA doesn't work here. NONE is required.
export MPICH_SMP_SINGLE_COPY_MODE=NONE
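If you would rather not export these into the whole session, one option is to apply them to a single launch only. This is a minimal sketch, not something from the original notes: the helper name run_with_mpich_workaround is hypothetical, and the launcher, rank count, executable, and input file in the commented example are placeholders.

```shell
# Hypothetical helper: run one command with the two cray-mpich workarounds
# set in that command's environment only; the calling shell is not modified.
run_with_mpich_workaround() {
    env MPICH_MALLOC_FALLBACK=1 MPICH_SMP_SINGLE_COPY_MODE=NONE "$@"
}

# Example launch (every argument after the helper name is a placeholder):
# run_with_mpich_workaround srun -n "$NRANKS" ./my_app -i my_input.in
```

Using env rather than export keeps the workaround scoped to the job launch, which makes it easier to drop once the cray-mpich bug is fixed.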
With these set, the code runs. I see about 5e7 zone-cycles per node-second with pure MPI, without any tuning or optimization effort.
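For anyone comparing numbers, here is a small sketch of how a zone-cycles per node-second figure can be computed. It assumes the usual definition, (total zones × cycles taken) / (wall-clock seconds × nodes). The function name and the inputs are illustrative only and are not measurements from Venado.

```shell
# Hypothetical helper. Arguments:
#   $1 total zones in the mesh    $2 cycles (timesteps) taken
#   $3 wall-clock seconds         $4 number of nodes
zone_cycles_per_node_second() {
    awk -v z="$1" -v c="$2" -v t="$3" -v n="$4" \
        'BEGIN { printf "%.2e\n", (z * c) / (t * n) }'
}

# Made-up inputs: 1e6 zones, 500 cycles, 10 s, 1 node
zone_cycles_per_node_second 1000000 500 10 1   # prints 5.00e+07
```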
Grace-Hopper Chips
These require the CUDA toolkit. PrgEnv-cray and PrgEnv-gnu both work. However, I needed to change the version of Kokkos to 4.2.01. In external/Kokkos, run git checkout 4.2.01. Then:
module swap PrgEnv-gnu # necessary or not? unclear
module load cray-mpich cray-hdf5-parallel cudatoolkit
export MPICH_OFI_NIC_POLICY=GPU # GPU NUMA ROUND-ROBIN
export MPICH_GPU_SUPPORT_ENABLED=1 # Allows GPU Aware MPI
export CRAY_ACCEL_TARGET=nvidia90
export MPICH_MALLOC_FALLBACK=1
export MPICH_SMP_SINGLE_COPY_MODE=NONE
export MPICH_MAX_THREAD_SAFETY=multiple
export FI_CXI_RX_MATCH_MODE=hybrid
export PMI_MMAP_SYNC_WAIT_TIME=600
and then
cmake -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_CUDA_LAMBDA=ON -DPARTHENON_DISABLE_HDF5_COMPRESSION=ON -DPARTHENON_ENABLE_PYTHON_MODULE_CHECK=OFF ..
make -j20
The code runs on a single node with MPI. I have not yet checked multi-node runs or performance.
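Since the Grace-Hopper setup depends on a fairly long list of environment variables, a quick pre-launch check can catch one that was dropped, for example in a fresh batch shell. This is a sketch only: the function name check_gh_env is hypothetical, and it checks only that each variable from the list above is set, not that its value is correct.

```shell
# Hypothetical pre-launch check: returns non-zero if any of the
# Grace-Hopper variables from the notes above is missing from the environment.
check_gh_env() {
    missing=0
    for var in MPICH_OFI_NIC_POLICY MPICH_GPU_SUPPORT_ENABLED CRAY_ACCEL_TARGET \
               MPICH_MALLOC_FALLBACK MPICH_SMP_SINGLE_COPY_MODE \
               MPICH_MAX_THREAD_SAFETY FI_CXI_RX_MATCH_MODE PMI_MMAP_SYNC_WAIT_TIME; do
        # printenv exits non-zero when the named variable is not in the environment
        if ! printenv "$var" >/dev/null; then
            echo "not set: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_gh_env && echo "environment looks complete"
```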