[Issue]: flang-new: runtime and math functions don't link for OpenMP target regions
Problem Description
I get many linker errors when building OpenMP target regions for GPU offload. Symbols from libFortranRuntime are reported as undefined, and so are some math intrinsics such as cosh.
There are some other math intrinsics that do link successfully, like tanh.
Operating System
SUSE Linux Enterprise Server 15 SP5 (Cray OS on LUMI)
CPU
AMD EPYC 7742 64-Core
GPU
AMD Instinct MI250X
ROCm Version
ROCm 6.2.2
ROCm Component
flang
Steps to Reproduce
$ flang-new --version
AMD AFAR drop #4.0 9/28/24 flang-new version 20.0.0git (ssh://gerritgit/lightning/ec/llvm-project amd-feature/atd-fortran/2024.09.28 24385 1ad3ac337fa4b1a5a7621a4c5480028b54fffada)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /pfs/lustrep3/scratch/project_462000394/amd-sw/rocm-afar/5891/lib/llvm/bin
Build config: +assertions
$ cat link.F90
program link
implicit none
real :: r
real, dimension(5) :: xs
!$omp target map(xs, r)
xs = 2
xs = modulo(xs, 3)
r = cosh(r)
r = tanh(r)
!$omp end target
end program
$ flang-new -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa --offload-arch=gfx90a -fdefault-real-8 link.F90
ld.lld: error: undefined symbol: _FortranAAssign
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
ld.lld: error: undefined symbol: _FortranAModuloReal8
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
ld.lld: error: undefined symbol: cosh
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
clang: error: ld.lld command failed with exit code 1 (use -v to see invocation)
/pfs/lustrep3/scratch/project_462000394/amd-sw/rocm-afar/5891/lib/llvm/bin/clang-linker-wrapper: error: 'clang' failed
flang-new: error: linker command failed with exit code 1 (use -v to see invocation)
In the user guide we have documented that adding -lFortranRuntimeHostDevice to the link line will resolve these link issues. As noted, using this device version of the Fortran runtime will result in low performance, but it allows user programs to link and run. However, we very much appreciate reports of which runtime functionality user codes need, so that the runtime calls can be circumvented. For example, one would expect cosh to be lowered directly, without a call to the Fortran runtime.
Math functions may alternatively require -lm when linking.
Thanks! With the drop 4.2 compiler I am able to link the runtime with -lFortranRuntimeHostDevice. I'm curious, why is performance poor with the device runtime? Is it just overhead from calling library functions or something else entirely? The program I'm working on uses assign, dot_product, mod, modulo and sum in some target regions.
With the math functions I still have the same problem: adding -lm to the compiler invocation does not help with linking cosh. tanh works fine, just as before.
$ flang-new -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa --offload-arch=gfx90a -fdefault-real-8 -lFortranRuntimeHostDevice -lm link.F90
ld.lld: error: undefined symbol: cosh
>>> referenced by a.out.amdgcn.gfx90a.img.lto.o:(__omp_offloading_54bbb604_4d007b9a__QQmain_l6)
>>> referenced by a.out.amdgcn.gfx90a.img.lto.o:(__omp_offloading_54bbb604_4d007b9a__QQmain_l6)
clang: error: ld.lld command failed with exit code 1 (use -v to see invocation)
If I compile with --save-temps and look into link-openmp-amdgcn-amd-amdhsa-gfx90a-llvmir.mlir, I see that the symbols for cosh and tanh look quite different from each other: the symbol for cosh is plain cosh, but the symbol for tanh is __ocml_tanh_f64.
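For anyone who wants to repeat this check on other intrinsics, here is a minimal sketch of how the comparison could be scripted. It assumes the intermediate file produced by --save-temps has the name given above and that symbol references in it carry a leading @, as in MLIR's textual format. The helper name list_math_symbols is hypothetical and is not part of any ROCm or LLVM tooling.

```shell
# Hypothetical helper: print the distinct symbols in FILE whose names
# contain "cosh" or "tanh" (for example @cosh or @__ocml_tanh_f64).
# Assumes symbol references are written with a leading '@'.
list_math_symbols() {
  grep -o -E '@[A-Za-z0-9_.]*(cosh|tanh)[A-Za-z0-9_.]*' "$1" | LC_ALL=C sort -u
}

# Example use, after compiling link.F90 with --save-temps:
#   list_math_symbols link-openmp-amdgcn-amd-amdhsa-gfx90a-llvmir.mlir
```

A name with an __ocml_ prefix suggests the call was mapped to a device math library routine, while a bare name such as cosh is left for the device linker to resolve, which matches the undefined symbol error above.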
I've opened an internal ticket regarding cosh, so we will investigate. Drop 4.3 is available at https://repo.radeon.com/rocm/misc/flang/ (the user guide still needs to be updated). It may improve some of the assignment performance issues. The runtime is not really a device-optimized library; it is mostly the existing runtime, a highly templated C++ library, compiled for the device.