HIP and MPI+HIP builds broken since adding f-function support (PR #312)
ohearnk opened this issue · 0 comments
#312 broke all HIP and MPI+HIP builds (with and without f-function support).
The new CUDA and MPI+CUDA code needs to be ported to the respective HIP and MPI+HIP implementations.
My current working notes on this are as follows:
- Delete all HIP / MPI+HIP sources and replace them with the CUDA / MPI+CUDA sources converted by the hipify tools:
cd QUICK/src
rm hip/*.{cu,h,cpp}
rm -rf hip/iclass && cp -r cuda/iclass hip
# run the conversion from src/cuda so that ../hip resolves to src/hip
cd cuda
for FILE in *.cu *.h *.cpp; do hipify-perl "${FILE}" -o "../hip/${FILE}"; done
- Manually fix issues
-- CUDA_MPIV -> HIP_MPIV
-- src/hip/gpu.cu:49:debugFile = fopen("debug.cuda", "w+");
-- NVTX -> ROC-tracer (https://github.com/ROCm/roctracer)
--- #include "nvToolsExt.h" -> #include "roctx.h"
--- nvtxRangePushA -> roctxRangePush
--- nvtxRangePop -> roctxRangePop
-- HIP kernel tuning: hipLaunchKernelGGL, __attribute__, __launch_bounds__
--- Q: why static variables? => preprocessor definitions
-- Future-proof the code for porting by replacing the CUDA- and HIP-specific string prefixes with generic GPU prefixes
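
The purely mechanical renames in the notes above could be scripted instead of applied by hand. This is a minimal sketch, assuming a `sed` that supports `-E` and that the substitutions listed above are the whole set (they may well not be). The function name `fix_hipified` is hypothetical and not part of QUICK.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the manual renames from the notes to one
# hipified source file. Reads the file named by $1, writes to stdout.
fix_hipified() {
    sed -E \
        -e 's/CUDA_MPIV/HIP_MPIV/g' \
        -e 's/#include[[:space:]]+"nvToolsExt\.h"/#include "roctx.h"/' \
        -e 's/nvtxRangePushA/roctxRangePush/g' \
        -e 's/nvtxRangePop/roctxRangePop/g' \
        "$1"
}
```

From `QUICK/src` it could be applied per file, for example `fix_hipified hip/gpu.cu > gpu.cu.tmp && mv gpu.cu.tmp hip/gpu.cu`. The remaining items (the `debug.cuda` file name, kernel launch tuning) still need a manual pass.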
Issues:
- After updating the CMake build system, the following linking error comes up involving XC (on AAC for MI210s):
[ 98%] Linking CXX shared library libquick_hip.so
lld: error: undefined symbol: devSim_dft
>>> referenced by lto.tmp:(get_cshell_density_kernel())
>>> referenced by lto.tmp:(get_cshell_density_kernel())
>>> referenced by lto.tmp:(cshell_getxc_kernel())
>>> referenced 9 more times
clang++: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [src/CMakeFiles/libquick_hip.dir/build.make:2491: src/libquick_hip.so] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:258: src/CMakeFiles/libquick_hip.dir/all] Error 2
gmake: *** [Makefile:156: all] Error 2
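
As a first step in narrowing down the undefined `devSim_dft` symbol, it may help to list where the symbol is only declared (`extern`) and where it appears with a device qualifier but no `extern`. This is a minimal sketch using plain `grep`. The function name `list_symbol_sites` is hypothetical, and the idea that the symbol is declared in one place and defined in another is an assumption, not something the log confirms.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show where a device symbol is declared vs. defined.
# $1 = directory to search, $2 = symbol name.
list_symbol_sites() {
    local dir="$1" sym="$2" hits
    # Every line mentioning the symbol as a whole word, prefixed file:line:
    hits=$(grep -rnw -- "$sym" "$dir" || true)
    echo "extern declarations:"
    printf '%s\n' "$hits" | grep -w 'extern' || true
    echo "device/constant-qualified lines without extern:"
    printf '%s\n' "$hits" | grep -E '__(constant|device)__' | grep -vw 'extern' || true
}
```

For example, `list_symbol_sites src/hip devSim_dft` run from the QUICK root. If the second section comes back empty, the converted sources may be missing the definition entirely. If it is present, the problem is more likely in how the device code is linked.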