Xingyu-Lin/softgym

Compiled PyFlex does not work on Ubuntu 20

Skylion007 opened this issue · 8 comments

I have been trying to compile the latest version of SoftGym on an Ubuntu 20 machine; however, I have been unable to load the compiled pyflex.so from either the system or a conda interpreter. The error I keep getting is that the symbol __powf_finite is not defined, which seems to be related to the libc version.

I have been using CUDA11.6, PyBind2.9.1 and Ubuntu 20. I have tested this issue on Python 3.9, 3.8, and 3.7 and it has caused the same issue on each. I tried compiling with clang, but I got several errors that prevented compilation altogether.

Can you copy and paste your full error message so that we can better diagnose the problem?
Also, please list the exact steps you followed so that we can reproduce it.

When trying to import pyflex:

ImportError: .....pyflex.so: undefined symbol: __powf_finite

@DanielTakeshi Any updates?

Have you compiled PyFlex with docker?
What steps have you followed exactly?

@Skylion007

Hi, I encountered the same issue today with PyFleX and figured out that the precompiled static library NvFlexExtReleaseCUDA uses the __powf_finite function, which recent glibc versions no longer provide for newly linked binaries (see google/filament#2146 (comment)).

$ strings ../../lib/linux64/NvFlexExtReleaseCUDA_x64.a | grep finite
__powf_finite

Unfortunately, we cannot easily re-compile NVIDIA FleX (proprietary software).
I just tried the following workaround and it worked locally (outside docker).

  • create libc_compat.c containing only the following definition
#include <math.h>
float __powf_finite(float x, float y) { return powf(x, y); }
  • and then link it into the binary in CMakeLists.txt
add_library(libc_compat ${ROOT}/bindings/libc_compat/libc_compat.c)
...
target_link_libraries(${EXAMPLE_BIN} PRIVATE ${ROOT}/lib/linux64/NvFlexExtReleaseCUDA_x64.a)
target_link_libraries(${EXAMPLE_BIN} PRIVATE libc_compat)
$ cmake -H. -Bbuild
$ make -j -C build

That is, I defined __powf_finite myself and linked it in so that NvFlexExtReleaseCUDA can resolve the symbol.
It should work. I hope this helps.

info

  • g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
  • conda 4.5.11 Python3.7.0
  • Ubuntu20.04

@denkiwakame It would be really useful if we could detect this error by checking the libc version and automatically apply this fix. Would you be willing to look into opening a PR?
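
For what it's worth, the version check suggested above could look roughly like the sketch below. It is only a sketch under assumptions: the function names are hypothetical, and the 2.31 cutoff is assumed from the fact that Ubuntu 20.04 ships glibc 2.31 while Ubuntu 18.04 ships 2.27. One helper parses a "major.minor" version string, the other uses the __GLIBC__ / __GLIBC_MINOR__ macros that glibc headers define at build time.

```c
#include <stdio.h>

/* ASSUMPTION: glibc 2.31 is the first release where newly linked binaries
 * can no longer resolve __powf_finite. Verify before relying on it. */
#define SHIM_GLIBC_MAJOR 2
#define SHIM_GLIBC_MINOR 31

/* Hypothetical helper: returns 1 if a "major.minor" version string is at or
 * above the assumed cutoff, 0 otherwise (including unparsable input). */
int version_needs_shim(const char *version)
{
    int major = 0, minor = 0;
    if (version == NULL || sscanf(version, "%d.%d", &major, &minor) != 2) {
        return 0;
    }
    return major > SHIM_GLIBC_MAJOR ||
           (major == SHIM_GLIBC_MAJOR && minor >= SHIM_GLIBC_MINOR);
}

/* Hypothetical helper: the same decision, based on the glibc headers this
 * file was compiled against. Returns 0 on non-glibc systems. */
int build_needs_shim(void)
{
#if defined(__GLIBC__) && defined(__GLIBC_MINOR__)
    return __GLIBC__ > SHIM_GLIBC_MAJOR ||
           (__GLIBC__ == SHIM_GLIBC_MAJOR && __GLIBC_MINOR__ >= SHIM_GLIBC_MINOR);
#else
    return 0;
#endif
}
```

On a glibc system, the string passed to version_needs_shim could come from gnu_get_libc_version() in <gnu/libc-version.h>. The build could then add libc_compat.c only when the check returns 1, though that wiring is not shown here.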

@Skylion007

I don't mind creating a PR, though in my humble opinion this is not a "fix" but a "temporary workaround".

  • 1️⃣ I created a dummy compatibility library that supplies the old libc symbol for NvFlexExtReleaseCUDA; the function simply falls back to powf instead of the original __powf_finite.
  • 2️⃣ In my understanding, we can "fix" the issue only if we re-compile NVIDIA FleX without `-ffast-math` (which may cause a performance issue) https://bugzilla.redhat.com/show_bug.cgi?id=1803203
    • or, re-compile NVIDIA FleX with latest libc
    • ... neither of which is possible for us, since NVIDIA open-sources only their demo code https://github.com/NVIDIAGameWorks/FleX
    • The problem is due to neither SoftGym nor PyFleX, but to the precompiled NVIDIA FleX, which depends on the older libc and CUDA 9.
  • 3️⃣ It seems that the original authors only support Ubuntu 16.04 or 18.04 (in docker). We should not extend supported platforms unless the maintainers are eager to do so, which will be a bit too much on their plate.
    • (side note) As far as I have tested locally, we don't even need cuda-docker environments when compiling (all we need is libcudart9.1.a, statically linked into the Python binding alongside NvFleX).

Btw, have you resolved the problem? I only applied a simple workaround, so I would appreciate hearing about it if you find a better solution :D