LuxCoreRender/LuxMark

LuxMark 4.0 alpha on Linux amd64 crashing

baryluk opened this issue · 5 comments

OpenCL on GPU only mode.

After starting, the window shows up and the devices are detected. It works for a bit: I can see the scene being loaded and the BVH being built, but it crashes after a few seconds:

user@debian:~/lux/luxmark$ ROOT=/home/user/lux/luxmark
user@debian:~/lux/luxmark$ export LD_PRELOAD="$ROOT/lib/libembree3.so.3:$ROOT/lib/libOpenImageDenoise.so.0:$ROOT/lib/libtbb.so.2:$ROOT/lib/libtbbmalloc.so.2"
user@debian:~/lux/luxmark$ gdb --args ./luxmark.bin --scene=FOOD --mode=BENCHMARK_OCL_GPU 
GNU gdb (Debian 9.2-1) 9.2
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./luxmark.bin...
(No debugging symbols found in ./luxmark.bin)
(gdb) r
Starting program: /home/user/lux/luxmark/luxmark.bin --scene=FOOD --mode=BENCHMARK_OCL_GPU
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffec80a700 (LWP 2174629)]
...
[Thread 0x7ffe4cff9700 (LWP 2174774) exited]
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "luxmark.bin" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in  ()
#1  0x0000555555eaceed in luxrays::OpenCLIntersectionDevice::Stop() ()
#2  0x0000555555eacf55 in luxrays::OpenCLIntersectionDevice::~OpenCLIntersectionDevice() ()
#3  0x0000555555eacf69 in luxrays::OpenCLIntersectionDevice::~OpenCLIntersectionDevice() ()
#4  0x0000555555e589ba in luxrays::Context::~Context() ()
#5  0x00005555559908c2 in slg::RenderEngine::~RenderEngine() ()
#6  0x0000555555b351ff in slg::PathOCLBaseRenderEngine::~PathOCLBaseRenderEngine() ()
#7  0x00005555559b97e9 in slg::PathOCLRenderEngine::~PathOCLRenderEngine() ()
#8  0x0000555555a998e5 in slg::RenderSession::~RenderSession() ()
#9  0x000055555593a665 in luxcore::detail::RenderSessionImpl::~RenderSessionImpl() ()
#10 0x000055555593a6a9 in luxcore::detail::RenderSessionImpl::~RenderSessionImpl() ()
#11 0x000055555591eb13 in LuxCoreRenderSession::Stop() ()
#12 0x000055555591eba5 in LuxCoreRenderSession::~LuxCoreRenderSession() ()
#13 0x00005555558efb4a in LuxMarkApp::Stop() ()
#14 0x00005555558efc66 in LuxMarkApp::InitRendering(LuxMarkAppMode, char const*) ()
#15 0x0000555555900802 in MainWindow::event(QEvent*) ()
#16 0x00007ffff270fc32 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#17 0x00007ffff2719190 in QApplication::notify(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#18 0x00007ffff1ac3a52 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#19 0x00007ffff1ac6648 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#20 0x00007ffff1b1a183 in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007ffff0aa260d in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#22 0x00007ffff0aa2890 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#23 0x00007ffff0aa291f in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#24 0x00007ffff1b197c1 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#25 0x00007ffff1ac26db in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#26 0x00007ffff1aca182 in QCoreApplication::exec() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#27 0x000055555588026e in main ()
(gdb) 

I am using an AMD Radeon R9 Fury X with ROCm 3.5 installed and registered as an OpenCL ICD.

LuxMark, however, selects both the Mesa Clover OpenCL implementation (which, as of Mesa 20.0.0, actually works extremely well in other apps!) and the AMD APP OpenCL implementation.

I do not have pocl (a CPU-based OpenCL implementation) installed, so that is not the problem. (When I try --mode=BENCHMARK_OCL_CPU, LuxMark reports an error that it can't find any CPU-based OpenCL devices.)

I also tried disabling devices, and it doesn't help:

user@debian:~/lux/luxmark$ ./luxmark.bin --scene=FOOD --mode=BENCHMARK_OCL_CUSTOM --devices=1
Segmentation fault
user@debian:~/lux/luxmark$ ./luxmark.bin --scene=FOOD --mode=BENCHMARK_OCL_CUSTOM --devices=00
Segmentation fault
user@debian:~/lux/luxmark$ ./luxmark.bin --scene=FOOD --mode=BENCHMARK_OCL_CUSTOM --devices=01
Segmentation fault
user@debian:~/lux/luxmark$ ./luxmark.bin --scene=FOOD --mode=BENCHMARK_OCL_CUSTOM --devices=10
Segmentation fault
user@debian:~/lux/luxmark$ ./luxmark.bin --scene=FOOD --mode=BENCHMARK_OCL_CUSTOM --devices=11
Segmentation fault

I can see (before the crash) proper devices being selected on the right side, but it still crashes.

My clinfo in the attachment.
clinfo.txt

I doubt Mesa OpenCL is able to run LuxMark. This is a Mesa bug, not a LuxMark one. LuxMark has been downloaded and tested on hundreds of thousands of OpenCL installations over the years (i.e. if it doesn't work, it is an OpenCL driver problem).
A rendering engine is 30,000-40,000 lines of OpenCL C code; it is nothing like the average OpenCL application (with a few hundred lines of OpenCL C code).

Just uninstall Mesa OpenCL; it is useless for you anyway. Just use the AMD OpenCL driver.

@Dade916 Indeed. Removing the Mesa ICD made LuxMark work without crashing. It now works fine with the AMD ROCm OpenCL driver, and it also works fine with pocl (pthreads on CPU).

But I don't understand why it was crashing when I use

./luxmark.bin --scene=FOOD --mode=BENCHMARK_OCL_CUSTOM --devices=00

Yes, technically Mesa OpenCL is installed, but it is not used with these settings, yet I got this:

Thread 1 "luxmark.bin" received signal SIGSEGV, Segmentation fault.
0x00005555559b9a61 in slg::PathOCLRenderEngine::MergeThreadFilms() ()
(gdb) bt
#0  0x00005555559b9a61 in slg::PathOCLRenderEngine::MergeThreadFilms() ()
#1  0x00005555559b9b9f in slg::PathOCLRenderEngine::UpdateFilmLockLess() ()
#2  0x000055555599065a in slg::RenderEngine::Stop() ()
#3  0x0000555555a9993d in slg::RenderSession::~RenderSession() ()
#4  0x000055555593a665 in luxcore::detail::RenderSessionImpl::~RenderSessionImpl() ()
#5  0x000055555593a6a9 in luxcore::detail::RenderSessionImpl::~RenderSessionImpl() ()
#6  0x000055555591eb13 in LuxCoreRenderSession::Stop() ()
#7  0x000055555591eba5 in LuxCoreRenderSession::~LuxCoreRenderSession() ()
#8  0x00005555558efb4a in LuxMarkApp::Stop() ()
#9  0x00005555558efc66 in LuxMarkApp::InitRendering(LuxMarkAppMode, char const*) ()
#10 0x0000555555900802 in MainWindow::event(QEvent*) ()
#11 0x00007ffff270fc32 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#12 0x00007ffff2719190 in QApplication::notify(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#13 0x00007ffff1ac3a52 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#14 0x00007ffff1ac6648 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#15 0x00007ffff1b1a183 in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x00007ffff0a6660d in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x00007ffff0a66890 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x00007ffff0a6691f in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#19 0x00007ffff1b197c1 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#20 0x00007ffff1ac26db in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007ffff1aca182 in QCoreApplication::exec() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#22 0x000055555588026e in main ()

LuxCoreRender uses OpenCL (and now CUDA) for 2 things:

  1. rendering;
  2. image pipeline post processing (tone mapping, bloom, etc.).

With "--devices" you select the OpenCL devices used for #1, but it doesn't affect #2. By default, the image pipeline code uses the first available OpenCL device, and I guess in your case that is the Mesa one (as shown by clinfo).

In the past, we weren't using GPUs for the image pipeline, and LuxMark has not yet been updated for that.

BTW, LuxMark development is on hold because we have recently introduced CUDA support, with RTX support coming soon too. So I'm waiting for RTX support before further developing LuxMark into a generic OpenCL/CUDA/CUDA+RTX/CPU benchmark.
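
To see which platform would be "the first OpenCL device available" on a given machine, one option is to look at the platform order reported by clinfo. The sketch below assumes that `clinfo -l` prints lines of the form "Platform #0: <name>" and that LuxCore enumerates platforms in the same order as clinfo; neither is confirmed in this thread. The helper name first_platform is made up for illustration.

```shell
# Print the name of the first OpenCL platform found in `clinfo -l` output
# read from stdin. Assumes lines shaped like "Platform #0: <name>".
first_platform() {
    sed -n 's/^Platform #0: //p' | head -n 1
}

# Usage (requires clinfo to be installed):
#   clinfo -l | first_platform
```

If this prints the Mesa platform (Clover), that would be consistent with the image pipeline ending up on Mesa regardless of the --devices setting.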

Thanks @Dade916. Now that makes sense.

I will work with the Mesa devs on addressing issues in their OpenCL implementation (which improved significantly in the last year, but of course is not yet ready for general use). This was more an attempt to test it than an expectation that it would work.

Does it crash if you remove the other OpenCL implementations? (You can use the OCL_ICD_VENDORS env var if you have ocl-icd installed.)
Mesa OpenCL runs LuxMark OK on all my machines with GCN+ hardware, other than low image accuracy.

edit: The above applies to LuxMark 3.1. LuxMark 4.0-alpha needs

CLOVER_PLATFORM_VERSION_OVERRIDE="1.2" CLOVER_DEVICE_VERSION_OVERRIDE="1.2" CLOVER_DEVICE_CLC_VERSION_OVERRIDE="1.2" ./luxmark

because it uses CLC 1.2 constructs without checking for CLC 1.2 support.
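
The OCL_ICD_VENDORS suggestion earlier in this comment could look roughly like the sketch below. With the ocl-icd loader, setting the variable to a single .icd file makes the loader use only that ICD, so one implementation can be tested without uninstalling the others. The wrapper name with_single_icd and the .icd file names in the usage comment are assumptions; list /etc/OpenCL/vendors to see what is actually installed.

```shell
# Run a command with only one ICD visible to the ocl-icd loader.
# $1: path to a single .icd file; remaining arguments: the command to run.
with_single_icd() {
    icd_file=$1
    shift
    # env sets the variable for this one command only.
    env OCL_ICD_VENDORS="$icd_file" "$@"
}

# Usage (hypothetical file name, check /etc/OpenCL/vendors on your system):
#   with_single_icd /etc/OpenCL/vendors/mesa.icd ./luxmark.bin --scene=FOOD --mode=BENCHMARK_OCL_GPU
```

Because the variable is set only for the wrapped command, the rest of the shell session keeps seeing every installed ICD.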