merzlab/QUICK

HIP and MPI+HIP builds broken since adding f-function support (PR #312)

ohearnk opened this issue · 0 comments

#312 broke all HIP and MPI+HIP builds (with and without f-function support).

The new CUDA and MPI+CUDA code needs to be backported to the respective HIP and MPI+HIP implementations.

My current working notes on this are as follows:

  • Delete all HIP / MPI+HIP sources, and replace them with the CUDA / MPI+CUDA sources converted using the hipify tools
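cd QUICK/src
# remove the stale HIP sources
rm hip/*.{cu,h,cpp}
rm -rf hip/iclass && cp -r cuda/iclass hip
# convert from inside cuda/ so that ../hip/ resolves to QUICK/src/hip
cd cuda
for FILE in *.{cu,h,cpp}; do hipify-perl "${FILE}" -o "../hip/${FILE}"; done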
  • Manually fix issues
    -- CUDA_MPIV -> HIP_MPIV
    -- src/hip/gpu.cu:49: debugFile = fopen("debug.cuda", "w+");
    -- NVTX -> ROC-tracer (https://github.com/ROCm/roctracer)
    --- #include "nvToolsExt.h" -> #include "roctx.h"
    --- nvtxRangePushA -> roctxRangePush
    --- nvtxRangePop -> roctxRangePop
    -- HIP kernel tuning: hipLaunchKernelGGL, __attribute__, __launch_bounds__
    --- Q: why static variables? => preprocessor definitions
    -- future-proof the code for porting by replacing the CUDA and HIP string prefixes with generic GPU prefixes
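
The purely mechanical renames in the list above (CUDA_MPIV, the NVTX header, and the two NVTX range calls) could be scripted rather than done by hand. A minimal sketch, assuming GNU sed and xargs; the function name `apply_hip_renames` is hypothetical and not part of QUICK. It does not touch the `debug.cuda` file name or the kernel tuning items, which still need manual review:

```shell
# Hypothetical helper (not part of QUICK): apply the mechanical renames
# from the notes above to every .cu/.h/.cpp file under a directory.
# Assumes GNU sed (-i) and GNU xargs (-r).
apply_hip_renames() {
    local dir="$1"
    find "${dir}" -type f \( -name '*.cu' -o -name '*.h' -o -name '*.cpp' \) -print0 |
        xargs -0 -r sed -i \
            -e 's/CUDA_MPIV/HIP_MPIV/g' \
            -e 's/"nvToolsExt\.h"/"roctx.h"/g' \
            -e 's/nvtxRangePushA/roctxRangePush/g' \
            -e 's/nvtxRangePop/roctxRangePop/g'
}
```

Usage, from QUICK/src after the hipify step: `apply_hip_renames hip`. Running `git diff hip` afterwards is a cheap way to check that nothing unintended was rewritten.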

Issues:

  • After updating the CMake build system, the following linking error comes up involving XC (on AAC for MI210s):
[ 98%] Linking CXX shared library libquick_hip.so
lld: error: undefined symbol: devSim_dft
>>> referenced by lto.tmp:(get_cshell_density_kernel())
>>> referenced by lto.tmp:(get_cshell_density_kernel())
>>> referenced by lto.tmp:(cshell_getxc_kernel())
>>> referenced 9 more times
clang++: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [src/CMakeFiles/libquick_hip.dir/build.make:2491: src/libquick_hip.so] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:258: src/CMakeFiles/libquick_hip.dir/all] Error 2
gmake: *** [Makefile:156: all] Error 2
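
As a first diagnostic step for the undefined `devSim_dft` symbol, it may help to list where the converted sources declare it and where (if anywhere) they define it. A minimal sketch, assuming GNU grep and assuming the symbol is declared with a `__constant__` or `__device__` qualifier on the same line; the function name `find_symbol_decls` is hypothetical, and this only narrows down the search, it does not identify the cause:

```shell
# Hypothetical helper (not part of QUICK): show extern declarations of a
# device symbol and, separately, other __constant__/__device__ lines that
# mention it (candidate definitions). Assumes GNU grep.
find_symbol_decls() {
    local dir="$1" sym="$2"
    echo "== extern declarations of ${sym} =="
    grep -rnE --include='*.cu' --include='*.h' --include='*.cpp' \
        "extern[^;]*\b${sym}\b" "${dir}" || echo "(none)"
    echo "== candidate definitions of ${sym} =="
    grep -rnE --include='*.cu' --include='*.h' --include='*.cpp' \
        "__(constant|device)__[^;]*\b${sym}\b" "${dir}" |
        grep -v 'extern' || echo "(none)"
}
```

Usage, from QUICK/src: `find_symbol_decls hip devSim_dft`. If the second section prints `(none)`, or the only hit sits in a file that is not part of the libquick_hip target, that would be consistent with the linker error above.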