lammps/lammps

[BUG] LAMMPS CMake configuration requires HIP_PATH to be set

hagertnl opened this issue · 11 comments

Summary

The following line in cmake/Modules/DetectHIPInstallation.cmake requires that HIP_PATH be set in order to find the HIP installation.

if(NOT DEFINED HIP_PATH)

Beginning in ROCm/5.1.0, we were advised that AMD is deprecating this environment variable, since it refers to the ROCm version 4 directory structure. Leaving the environment variable set did not cause problems until ROCm/5.5.1, when this environment variable begins interfering with hipcc when HIP_PATH=${ROCM_PATH}/hip. In ROCm/5.5.1+, hipcc is no longer able to automatically find header files for a HIP code.

AMD has since advised that setting HIP_PATH should be removed from modulefiles and should not be used anymore. If you look in ${ROCM_PATH}/hip, which is what HIP_PATH is typically set to, you find that all files are symlinks to ${ROCM_PATH}/lib.

I do not know the state of this environment variable when using HIP on CUDA architectures.

LAMMPS Version and Platform

Current devel branch -- commit fef0e07 at the time of submitting ticket.

Expected Behavior

Use either $ROCM_PATH or find_package(HIP) to detect HIP installation.

Actual Behavior

`
CMake Error at Modules/DetectHIPInstallation.cmake:3 (message):
HIP support requires HIP_PATH to be defined.

Either pass the HIP_PATH as a CMake option via -DHIP_PATH=... or set the
HIP_PATH environment variable.
Call Stack (most recent call first):
Modules/Packages/KOKKOS.cmake:141 (include)
CMakeLists.txt:661 (include)
`

Steps to Reproduce

Basic KOKKOS & KSPACE build with no HIP_PATH in environment:

cmake \ -DPKG_KOKKOS=on \ -DPKG_KSPACE=on \ -DBUILD_MPI=off \ -DBUILD_OMP=off \ -DCMAKE_INSTALL_PREFIX=$INSTDIR \ -DCMAKE_CXX_COMPILER=${ROCM_PATH}/bin/hipcc \ -DKokkos_ENABLE_HIP=on \ -DKokkos_ARCH_VEGA90A=ON \ -DCMAKE_CXX_STANDARD=14 \ -DLAMMPS_MACHINE=frontier_hip \ -DFFT=fftw3 \ ../cmake
The combination of KOKKOS, KSPACE, and non-KISS FFT triggers this code path.

@hagertnl thanks we will take a look

Do not use find_package(HIP). That's outdated too. For the GPU package we use find_package(hip) (lower case).

If there are any changes make to support the different settings for different ROCm distributions, there should be some place somewhere in the manual, where it is documented what settings apply to which version. Also, there should be some statements which versions are supported and which are considered obsolete.

Anything below ROCm 5.4.3 is not worth supporting and I wouldn't waste any time and effort on getting 5.5 and 5.6. working and straight go to 5.7.1. FYI, I'm going to LLNL in December for a El Cap hackathon, so I might spent some cycles to fixup the ROCm for LAMMPS support.

Do not use find_package(HIP). That's outdated too. For the GPU package we use find_package(hip) (lower case).

Got it.

One thing I noticed looking through some of the examples packaged in ROCm distributions is that some of the CMakeLists.txt files still have HIP_PATH defined as a cache variable, but they set HIP_PATH = $ROCM_PATH, instead of HIP_PATH = $ROCM_PATH/hip.
I'm not sure if that's a viable solution here, to just modify what HIP_PATH is set to in DetectHIPInstallation.cmake, but it seems to work in a basic test I ran.

I checked with an AMD engineer, and they confirmed that HIP_PATH=$ROCM_PATH is supported and recommended. HIP_PATH=$ROCM_PATH/hip would be only needed for ROCm versions < 5.1.0.

So, my initial ticket body is incorrect. The deprecation is only of setting HIP_PATH=$ROCM_PATH/hip. I don't know if any change is actually needed in the DetectHIPInstallation.cmake file, unless you want the ability to set HIP_PATH based on ROCM_PATH.

@hagertnl thanks for writing this up and checking. If that's the recommendation I would look into just doing HIP_PATH=$ROCM_PATH. So that would just mean that we drop trying to detect HIP_PATH and just have:

if(NOT DEFINED ROCM_PATH)
    if(NOT DEFINED ENV{ROCM_PATH})
        set(ROCM_PATH "/opt/rocm" CACHE PATH "Path to ROCm installation")
    else()
        set(ROCM_PATH $ENV{ROCM_PATH} CACHE PATH "Path to ROCm installation")
    endif()
endif()
list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH})

and also require ROCm 5.4.3+.

As a fun side note. Even find_package(hip) is now discouraged. What we actually all should be moving towards is using HIP language support in CMake. That's going to be a lot of fun for all involved 😉

Would dropping HIP_PATH entirely affect the HIP on CUDA and/or HIP on Intel (eventually) compatibility? From what little I know of compiling HIP on other architectures, I believe that would still require HIP_PATH, since ROCM_PATH won't exist at that point.

HIP on CUDA likely isn't something worth caring heavily about, since users should just use CUDA backend on CUDA architectures, but HIP on Intel could possibly be useful, I'd think. Though Intel I believe recommends SYCL and OpenMP target offload, and Kokkos has those backends as well.