NOSALRO/robot_dart

OpenGL parallel contexts with shadows segfault (sometimes)


When rendering shadows in multiple parallel OpenGL contexts, we sometimes get segfaults; with shadows disabled, this does not happen. This needs investigating. Example backtrace:

(gdb) bt
#0  0x00007ffe42c6b485 in  () at /usr/lib/libnvidia-glcore.so.465.31
#1  0x00007ffff7b60610 in  () at /usr/lib/libMagnumGL.so.2
#2  0x00007ffff7b5f5c1 in Magnum::GL::Mesh::drawInternal(int, int, int, unsigned int, long, int, int) () at /usr/lib/libMagnumGL.so.2
#3  0x00007ffff7b47258 in Magnum::GL::AbstractShaderProgram::draw(Magnum::GL::Mesh&) () at /usr/lib/libMagnumGL.so.2
#4  0x00005555555ba654 in robot_dart::gui::magnum::ShadowedObject::draw(Magnum::Math::Matrix4<float> const&, Magnum::SceneGraph::Camera<3u, float>&) (this=0x7ffe34cb1b10, transformationMatrix=..., camera=...)
    at ../src/robot_dart/gui/magnum/drawables.cpp:176
#5  0x00007ffff7c6ca7f in Magnum::SceneGraph::Camera<3u, float>::draw(Magnum::SceneGraph::FeatureGroup<3u, Magnum::SceneGraph::Drawable<3u, float>, float>&) () at /usr/lib/libMagnumSceneGraph.so.2
#6  0x00005555555b4127 in robot_dart::gui::magnum::BaseApplication::render_shadows() (this=0x7ffe343c1480) at ../src/robot_dart/gui/magnum/base_application.cpp:574
#7  0x00005555555b630d in robot_dart::gui::magnum::BaseApplication::update_lights(robot_dart::gui::magnum::gs::Camera const&) (this=0x7ffe343c1480, camera=<optimized out>) at ../src/robot_dart/gui/magnum/base_application.cpp:276
#8  0x00005555555db14c in robot_dart::gui::magnum::sensor::Camera::calculate(double) (this=0x7ffe37883300) at /usr/include/c++/11.1.0/bits/unique_ptr.h:173
#9  0x00005555555e0c53 in robot_dart::RobotDARTSimu::step_world(bool) (this=this@entry=0x7ffe41925b60, reset_commands=reset_commands@entry=false) at ../src/robot_dart/robot_dart_simu.cpp:168
#10 0x00005555555958da in operator()(int) const (__closure=0x7fffffffd580, run=<optimized out>) at ../src/task_specific_evaluation.cpp:354
#11 0x0000555555596fcd in tbb::internal::parallel_for_body<main(int, char**)::<lambda(int)>, int>::operator() (r=<optimized out>, r=<optimized out>, this=0x7ffe44c4fd58) at /usr/include/tbb/parallel_for.h:177
#12 tbb::interface9::internal::start_for<tbb::blocked_range<int>, tbb::internal::parallel_for_body<main(int, char**)::<lambda(int)>, int>, const tbb::auto_partitioner>::run_body (r=<optimized out>, this=0x7ffe44c4fd40)
    at /usr/include/tbb/parallel_for.h:115
#13 tbb::interface9::internal::dynamic_grainsize_mode<tbb::interface9::internal::adaptive_mode<tbb::interface9::internal::auto_partition_type> >::work_balance<tbb::interface9::internal::start_for<tbb::blocked_range<int>, tbb::internal::parallel_for_body<main(int, char**)::<lambda(int)>, int>, const tbb::auto_partitioner>, tbb::blocked_range<int> > (range=<optimized out>, start=<optimized out>, this=<optimized out>) at /usr/include/tbb/partitioner.h:423
#14 tbb::interface9::internal::partition_type_base<tbb::interface9::internal::auto_partition_type>::execute<tbb::interface9::internal::start_for<tbb::blocked_range<int>, tbb::internal::parallel_for_body<main(int, char**)::<lambda(int)>, int>, const tbb::auto_partitioner>, tbb::blocked_range<int> > (range=<optimized out>, start=warning: RTTI symbol not found for class 'tbb::interface9::internal::start_for<tbb::blocked_range<int>, tbb::internal::parallel_for_body<main::{lambda(int)#1}, int>, tbb::auto_partitioner const>'
..., this=0x7ffe44c4fd68) at /usr/include/tbb/partitioner.h:256
#15 tbb::interface9::internal::start_for<tbb::blocked_range<int>, tbb::internal::parallel_for_body<main(int, char**)::<lambda(int)>, int>, const tbb::auto_partitioner>::execute(void) (this=0x7ffe44c4fd40)
    at /usr/include/tbb/parallel_for.h:142
#16 0x00007ffff25c5105 in  () at /usr/lib/libtbb.so.2
#17 0x00007ffff25c543c in  () at /usr/lib/libtbb.so.2
#18 0x00007ffff25bed97 in  () at /usr/lib/libtbb.so.2
#19 0x00007ffff25bd3e1 in  () at /usr/lib/libtbb.so.2
#20 0x00007ffff25b981c in  () at /usr/lib/libtbb.so.2
#21 0x00007ffff25b9a8a in  () at /usr/lib/libtbb.so.2
#22 0x00007ffff6064259 in start_thread () at /usr/lib/libpthread.so.0
#23 0x00007fffe6c585e3 in clone () at /usr/lib/libc.so.6

This is most likely because there is not enough GPU memory for all the parallel contexts (and shadows DO take a lot of GPU memory).

I could check GL::Renderer::error() for GL::Renderer::Error::OutOfMemory after rendering the shadows, but an out-of-memory error is not guaranteed to be reported every time.