ROCm-Developer-Tools/LLVM-AMDGPU-Assembler-Extra

asm-kernel example failed on Carrizo

PXAHyLee opened this issue · 4 comments

Hi developer,

My environment has a Carrizo APU, the ROCm packages pulled from the apt server, and LLVM trunk (r269157).
I ran one of the examples, asm-kernel, and got the following error:

Using agent: Carrizo
hsa_executable_load_code_object failed: HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS: The arguments passed to a functions are not compatible.

Failed.

The other examples fail with the same code-object error.
Running ldd on asm-kernel shows:

        linux-vdso.so.1 =>  (0x00007ffcb9224000)
        libhsa-runtime64.so.1 => /opt/rocm/hsa/lib/libhsa-runtime64.so.1 (0x00007ff8fa162000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff8f9e52000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff8f9c3a000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff8f9872000)
        libhsakmt.so.1 => /opt/rocm/libhsakmt/lib/libhsakmt.so.1 (0x00007ff8f9662000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff8f945a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff8f923a000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff8f9032000)
        libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007ff8f8e1a000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff8f8b12000)
        /lib64/ld-linux-x86-64.so.2 (0x00005639d91e4000)

How can I fix this? Thank you.

The samples all target Fiji and won't work on Carrizo as is.
It may be possible to adapt them for Carrizo by making the following changes:

  • Change -mcpu=fiji to -mcpu=carrizo in the CMake files (examples/CMakeLists.txt)
  • Change the assembler sources to use compute capability 8:0:1 instead of 8:0:3, i.e. edit the directive: .hsa_code_object_isa 8, 0, 3, "AMD", "AMDGPU"
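
For anyone who wants to script the two edits above rather than make them by hand, here is a minimal sketch in Python. It is not part of the repository: the helper names retarget_cmake and retarget_isa are hypothetical, and it assumes the flag and the directive appear in the sources in the same form as quoted in the bullets. It only rewrites strings; reading and writing the actual files is left to the caller.

```python
import re


def retarget_cmake(text, old_cpu="fiji", new_cpu="carrizo"):
    """Replace every -mcpu=<old_cpu> flag in CMake text with -mcpu=<new_cpu>."""
    pattern = r"-mcpu=" + re.escape(old_cpu) + r"\b"
    return re.sub(pattern, "-mcpu=" + new_cpu, text)


def retarget_isa(text, old=(8, 0, 3), new=(8, 0, 1)):
    """Rewrite the version triple of an hsa_code_object_isa directive.

    Only directives whose triple equals `old` are changed; the vendor and
    architecture strings that follow the triple are left untouched.
    """
    pattern = r"(hsa_code_object_isa\s+){}\s*,\s*{}\s*,\s*{}(?=\s*,)".format(*old)
    replacement = "{}, {}, {}".format(*new)
    return re.sub(pattern, lambda m: m.group(1) + replacement, text)


if __name__ == "__main__":
    # Illustrative inputs only; the real files may contain more than this.
    print(retarget_cmake('-mcpu=fiji'))
    print(retarget_isa('.hsa_code_object_isa 8, 0, 3, "AMD", "AMDGPU"'))
```

Both helpers are no-ops on text that has already been converted, so they can be run again after a partial edit without harm.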

Hi, @nhaustov

Sorry, I didn't notice the -mcpu option in the CMake files before opening this issue.
After following your instructions, the example now segfaults. The backtrace (from gdb) is:

#0  0x00007ffff79f6037 in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#1  0x0000000000404673 in amd::dispatch::Dispatch::AllocateLocalMemory(unsigned long) ()
#2  0x000000000040484d in amd::dispatch::Dispatch::AllocateBuffer(unsigned long) ()
#3  0x0000000000403860 in AsmKernelDispatch::Setup() ()
#4  0x0000000000404b11 in amd::dispatch::Dispatch::Run() ()
#5  0x0000000000404c20 in amd::dispatch::Dispatch::RunMain() ()
#6  0x00000000004036e3 in main ()

Hi, I changed every -mcpu option to carrizo, removed the build directory (rm -rf), and re-ran cmake. After doing this, the example segfaults.

I repeated the process described above to make sure, and the result is still the same (segfault).