GPUOpen-LibrariesAndSDKs/HIPRTSDK

HIPRT SDK 2.0.0 missing library (Linux)

FilipVaverka opened this issue · 20 comments

The HIPRT SDK 2.0.0 archive at gpuopen.com seems to be missing the Linux binaries for the HIPRT libraries.

ib00 commented

There is a new release that has Linux support now (2.0.3). I have tested it and it works.

I'm getting

UnbundleFiles error: 'Linker Program': No such file or directory

error with this version (both the bundled tutorials and my own code). Looking at the strace output, it seems to be looking for a file literally named "Linker Program" for some reason:

openat(AT_FDCWD, "Linker Program", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

It should be noted that I do have the latest ROCm packages installed (ROCm 5.5.0).

ib00 commented

Ah, I haven't tried it with ROCm. Only tried it with CUDA and it's worked for me.

I see, the same binary works for me when I select the NVIDIA/CUDA device (using HIP_VISIBLE_DEVICES= ). Well, shame... AMD's own ray tracing library works better with NVIDIA than with AMD. Maybe it is some issue with the ROCm version, but I haven't even been able to find out what the required dependencies are on Linux (I thought it only used the driver, just like on Windows).
I'm not trying to run the HIP integration example at this point, just the normal Orochi path.
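For anyone comparing runs on different devices, it can help to log what the process actually sees in HIP_VISIBLE_DEVICES before any context is created. Below is a minimal sketch that only reads and splits the environment variable; the helper names `splitDeviceList` and `visibleDevicesFromEnv` are made up for illustration and are not part of HIP, Orochi, or HIPRT.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a comma-separated device list such as "0,2"
// into its entries. Empty entries (e.g. from "0,,2") are skipped.
std::vector<std::string> splitDeviceList(const std::string& value) {
    std::vector<std::string> entries;
    std::stringstream stream(value);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        if (!entry.empty()) {
            entries.push_back(entry);
        }
    }
    return entries;
}

// Hypothetical helper: returns the entries of HIP_VISIBLE_DEVICES, or an
// empty vector when the variable is unset.
std::vector<std::string> visibleDevicesFromEnv() {
    const char* raw = std::getenv("HIP_VISIBLE_DEVICES");
    return raw ? splitDeviceList(raw) : std::vector<std::string>{};
}
```

Printing the result of `visibleDevicesFromEnv()` at startup makes it easier to tell which device a given test run was restricted to. The entries are kept as strings on purpose, so the sketch makes no assumption about what form they take.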

ib00 commented

Yes, same experience here. I haven't been able to get ROCm properly installed and working on Linux. I have tried several times to no avail.

I have that running without problems (on openSUSE, with the latest kernel (6.3.2), the upstream AMDGPU/KFD driver, and the official ROCm packages). Even hipSYCL, PCI-E P2P, and things like TensorFlow or PyTorch generally work for me (I'm running a Radeon VII and an RX 7900).
The pre-2.X version of HIPRT even worked with ROCm/HIP, but I wouldn't be able to use a 7xxx-series GPU with that version.

We have debugged this and found that it is a ROCm compiler issue on Linux, specifically the linker-related error reported here. It is fixed, and the driver will be made public soon. We will let you know once the driver with the fix is available. Apologies for the delay in replying.

Note: please make sure to have a GPU with RAM > 8GB which is a requirement for HIPRT.

@PixelClear Looking at the AMD driver updates on Linux: for the fix, are we talking about ROCm 5.6, a new 5.5.x release, or part of a new amdgpu-pro package (currently stuck at 22.40.6, it seems, versus the Windows 23.10 branch for the 7600 series)?
I also hope HIPRT Linux support then gets enabled in Blender 4.0 alpha soon.

So, I've upgraded to ROCm 5.6 and now (still using the 7900 XTX) I'm getting a segfault in libamd_comgr.
Here is the backtrace from one of the tutorials:

#0  0x00007ffff0fc14fb in ?? () from /opt/rocm-5.6.0/lib/libamd_comgr.so.2
#1  0x00007ffff0fcabad in ?? () from /opt/rocm-5.6.0/lib/libamd_comgr.so.2
#2  0x00007fffee61a86a in ?? () from /opt/rocm-5.6.0/lib/libamd_comgr.so.2
#3  0x00007fffee626364 in amd_comgr_do_action ()
   from /opt/rocm-5.6.0/lib/libamd_comgr.so.2
#4  0x00007fffedb63222 in ?? () from /opt/rocm/hip/lib/libhiprtc.so
#5  0x00007fffedb65817 in ?? () from /opt/rocm/hip/lib/libhiprtc.so
#6  0x00007fffedb59b68 in hiprtcLinkComplete ()
   from /opt/rocm/hip/lib/libhiprtc.so
#7  0x00007ffff7f1d577 in ?? () from ../../hiprt/linux64/libhiprt0200064.so
#8  0x00007ffff7f377b0 in ?? () from ../../hiprt/linux64/libhiprt0200064.so
#9  0x00007ffff7f612b9 in hiprtBuildTraceKernelsFromBitcode ()
   from ../../hiprt/linux64/libhiprt0200064.so
#10 0x000000000041db60 in TutorialBase::buildTraceKernelFromBitcode (
    this=0x7fffffffcc40, ctxt=0x5d82d0, 
    path=0x424dc8 "../common/TutorialKernels.h", 
    functionName=0x424db1 "GeomIntersectionKernel", 
    functionOut=@0x7fffffffcae8: 0x0, opts=0x0, funcNameSets=0x0, 
    numGeomTypes=0, numRayTypes=1) at ../common/TutorialBase.cpp:221
#11 0x0000000000417ce6 in Tutorial::run (this=0x7fffffffcc40)
    at ../01_geom_intersection/main.cpp:73
#12 0x000000000041787e in main (argc=1, argv=0x7fffffffcd98)
    at ../01_geom_intersection/main.cpp:96

UPDATE: I've just checked strace and it seems to be the same issue (it just no longer prints the error).

We will check it on our side and get back to you. Thanks a lot for trying this.

Update (Linux 6.4.11 with ROCm 5.7): the HIPRT 2.0.0 tutorials don't crash anymore. Output images seem fine when running on the Radeon VII. However, output images are black when running on the 7900 XTX. (An older version of HIPRT seems to work even with the 7900 XTX.)

So HIPRT v2.1 was just released and I did another test (Linux 6.5.6, Mesa 23.2.1 and ROCm 5.7). Once again, even 01_geom_intersection does NOT produce any output (black image) when running on the 7900 XTX. The code still works only with the old Radeon VII, which lacks ray-tracing hardware.

I'm sorry, but is this supposed to ever properly work, or is it another case of vaporware, where HW goes obsolete before SW is ready?

ib00 commented

It doesn't work with CUDA either (on Linux). I get lots of errors:

../../hiprt/hiprt_types.h(381): warning #1055-D: types cannot be declared in anonymous unions
{
^

../common/Common.h(34): error: variable "FltMin" has already been defined
constexpr float FltMin = 1.175494351e-38f;
^

../common/Common.h(35): error: variable "FltMax" has already been defined
constexpr float FltMax = 3.402823466e+38f;

The 7900 XTX is supposed to work (like other RDNA3 GPUs). My guess is that your ROCm 5.7 is not compatible with the HIPRT binaries compiled with ROCm 5.4 (you can check the version in the filename). In general, compatibility is not guaranteed. We should have released v2.1 compiled with ROCm 5.7, our bad. We will release it as a patch soon.

I've tried to compile the tutorials on Linux (Ubuntu) and I don't see these errors. It's strange that the constants are clashing with the ones in the hiprt namespace in hiprt_common.h. Could you try to compile this branch? https://github.com/GPUOpen-LibrariesAndSDKs/HIPRTSDK/tree/bugfix/HRTSDK-0-cuda-fixes
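To illustrate the kind of clash described here: two definitions of the same constant in one scope are rejected as a redefinition, while the same names in separate namespaces coexist. The sketch below uses stand-in namespaces `lib` and `app`, which are assumptions for illustration only. It does not reproduce the real contents of hiprt_common.h or Common.h, and it is not necessarily what the linked branch changes.

```cpp
#include <cassert>

// Stand-in for constants defined by a library header (hypothetical namespace).
namespace lib {
constexpr float FltMin = 1.175494351e-38f;
constexpr float FltMax = 3.402823466e+38f;
} // namespace lib

// Stand-in for an application's own constants (hypothetical namespace).
// Had both sets been defined in the same scope, the compiler would report
// that "FltMin" and "FltMax" have already been defined.
namespace app {
constexpr float FltMin = 1.175494351e-38f;
constexpr float FltMax = 3.402823466e+38f;
} // namespace app

// Both sets are usable side by side once the names are qualified.
static_assert(lib::FltMin == app::FltMin, "same value, different scopes");
static_assert(lib::FltMax == app::FltMax, "same value, different scopes");

// Clamps a value into [app::FltMin, app::FltMax] using the qualified names.
constexpr float clampToPositiveRange(float x) {
    return x < app::FltMin ? app::FltMin : (x > app::FltMax ? app::FltMax : x);
}
```

If both headers end up contributing to a single translation unit without their namespaces applied, the second definition is where the error would point, which matches the line numbers in the messages above.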

ib00 commented

Thank you! This fix solves the compilation problem. The tutorials compile on Linux with CUDA and correct output is produced.

Hi @FilipVaverka We have just released a patch (v2.1.c202dac on https://gpuopen.com/hiprt/) with 5.7 binaries. Could you please give it a try? It should be compatible with this driver/ROCm https://repo.radeon.com/amdgpu-install/23.20/ubuntu/focal/.

Thank you! Brilliant, all tutorials seem to work now, with about a 6x speedup over the Radeon VII. There seems to be some issue with custom BVH import (on both GPUs), but I don't think I need that at the moment.

P.S.: Sorry for being a bit abrasive before. You guys are doing great work and I do very much appreciate it.

ib00 commented

Yes, I share the sentiment here: I appreciate the work on this library!

Glad to hear that it's finally working :-) Please, open another issue for the BVH import issue if possible.