ROCm/llvm-project

[Issue]: flang-new: runtime and math functions don't link for OpenMP target regions

Problem Description

I get many linker errors for OpenMP target regions when offloading to the GPU. Symbols from libFortranRuntime are reported as undefined, as are some math intrinsics such as cosh.

Other math intrinsics, such as tanh, do link successfully.

@sfantao

Operating System

SUSE Linux Enterprise Server 15 SP5 (Cray OS on LUMI)

CPU

AMD EPYC 7742 64-Core

GPU

AMD Instinct MI250X

ROCm Version

ROCm 6.2.2

ROCm Component

flang

Steps to Reproduce

$ flang-new --version
AMD AFAR drop #4.0 9/28/24 flang-new version 20.0.0git (ssh://gerritgit/lightning/ec/llvm-project amd-feature/atd-fortran/2024.09.28 24385 1ad3ac337fa4b1a5a7621a4c5480028b54fffada)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /pfs/lustrep3/scratch/project_462000394/amd-sw/rocm-afar/5891/lib/llvm/bin
Build config: +assertions
$ cat link.F90 
program link
implicit none
real :: r
real, dimension(5) :: xs

!$omp target map(xs, r)
xs = 2
xs = modulo(xs, 3)
r = cosh(r)
r = tanh(r)
!$omp end target

end program
$ flang-new -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa --offload-arch=gfx90a -fdefault-real-8 link.F90
ld.lld: error: undefined symbol: _FortranAAssign
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)

ld.lld: error: undefined symbol: _FortranAModuloReal8
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)

ld.lld: error: undefined symbol: cosh
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
>>> referenced by /tmp/a.out.amdgcn.gfx90a-6a405a.img.lto.o:(__omp_offloading_54bbb604_5101b8ee__QQmain_l6)
clang: error: ld.lld command failed with exit code 1 (use -v to see invocation)
/pfs/lustrep3/scratch/project_462000394/amd-sw/rocm-afar/5891/lib/llvm/bin/clang-linker-wrapper: error: 'clang' failed
flang-new: error: linker command failed with exit code 1 (use -v to see invocation)

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

The user guide documents that adding -lFortranRuntimeHostDevice to the link line resolves these link errors. As noted, this device build of the Fortran runtime performs poorly, but it does allow user programs to link and run. That said, we very much appreciate reports of which runtime functionality user codes need, so that those runtime calls can be avoided. For example, one would expect cosh to be lowered directly, without a call into the Fortran runtime.

Math functions may alternatively require -lm when linking.
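
For anyone putting together such a report, the distinct undefined symbols can be pulled out of a saved linker log. This is only a minimal sketch: it assumes the compiler's stderr was redirected to a file (link.log is a hypothetical name) and that the errors use the "ld.lld: error: undefined symbol:" format seen in the log above. undefined_symbols is an illustrative helper, not something shipped with ROCm.

```shell
# Hypothetical helper: print each distinct undefined symbol found in an ld.lld log.
# Assumes lines of the form "ld.lld: error: undefined symbol: NAME".
undefined_symbols() {
    # Keep only the symbol name from matching lines, then de-duplicate.
    # LC_ALL=C makes the sort order independent of the user's locale.
    sed -n 's/^ld\.lld: error: undefined symbol: //p' "$1" | LC_ALL=C sort -u
}

# Example use (link.log is an assumed file name):
#   flang-new -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa --offload-arch=gfx90a link.F90 2> link.log
#   undefined_symbols link.log
```

Run against the log in the problem description, this should list _FortranAAssign, _FortranAModuloReal8 and cosh once each.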

VeeEM commented

Thanks! With the drop 4.2 compiler I am able to link the runtime with -lFortranRuntimeHostDevice. I'm curious, why is performance poor with the device runtime? Is it just overhead from calling library functions or something else entirely? The program I'm working on uses assign, dot_product, mod, modulo and sum in some target regions.

I still have the same problem with the math functions: adding -lm to the compiler invocation does not help with linking cosh. tanh works fine, just as before.

$ flang-new -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa --offload-arch=gfx90a -fdefault-real-8 -lFortranRuntimeHostDevice -lm link.F90
ld.lld: error: undefined symbol: cosh
>>> referenced by a.out.amdgcn.gfx90a.img.lto.o:(__omp_offloading_54bbb604_4d007b9a__QQmain_l6)
>>> referenced by a.out.amdgcn.gfx90a.img.lto.o:(__omp_offloading_54bbb604_4d007b9a__QQmain_l6)
clang: error: ld.lld command failed with exit code 1 (use -v to see invocation)

If I compile with --save-temps and look into link-openmp-amdgcn-amd-amdhsa-gfx90a-llvmir.mlir, I see that the symbols for cosh and tanh look quite different from each other: cosh is plain cosh, but the symbol for tanh is __ocml_tanh_f64.
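
The same comparison can be done from the shell. This is a rough sketch, assuming the saved MLIR file references functions with the usual "@name" symbol syntax. math_symbols is a hypothetical helper name, and the pattern only looks for the two intrinsics discussed here.

```shell
# Hypothetical helper: list the distinct symbol names containing cosh or tanh.
# Assumes MLIR-style symbol references, i.e. names prefixed with '@'.
math_symbols() {
    # -o prints only the matched symbol; sort -u removes repeated call sites.
    grep -oE '@[A-Za-z0-9_]*(cosh|tanh)[A-Za-z0-9_]*' "$1" | LC_ALL=C sort -u
}

# Example use, after compiling with --save-temps:
#   math_symbols link-openmp-amdgcn-amd-amdhsa-gfx90a-llvmir.mlir
```

A bare @cosh next to an @__ocml_ name, as described above, would suggest that only one of the two calls was mapped to a device math library function.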

I've opened an internal ticket regarding cosh, so we will investigate. Drop 4.3 is available (the user guide still needs to be updated, https://repo.radeon.com/rocm/misc/flang/). It may improve some of the assignment performance issues. The runtime is not really a device-optimized library; it is mostly the existing runtime, a highly templated C++ library, compiled for the device.