Eyescale/Equalizer

PixelBufferObject (glMapBuffer) rendering error in RHEL

Closed this issue · 5 comments

Running RHEL 6.3 on one node, with a config of 2 nodes and 2 GPUs per node (multi-process, DirectSend Spatial DB).
Config file: https://gist.github.com/4039942

PixelBufferObject::mapRead deadlocks inside the NVIDIA driver (libnvidia-glcore.so.295.41). Backtrace of the blocked thread:

Thread 3 (Thread 0x2aaaae47c700 (LWP 6934)):
#0  0x00002b46a7faf419 in ?? () from /usr/lib64/nvidia/libGL.so.1
#1  0x00002b46a7faf443 in ?? () from /usr/lib64/nvidia/libGL.so.1
#2  0x00002b46b1cdf927 in ?? () from /usr/lib64/nvidia/libnvidia-glcore.so.295.41
#3  0x00002b46b1c5c049 in ?? () from /usr/lib64/nvidia/libnvidia-glcore.so.295.41
#4  0x00002b46b1bfd9be in ?? () from /usr/lib64/nvidia/libnvidia-glcore.so.295.41
#5  0x00002b46b1bfdb2b in ?? () from /usr/lib64/nvidia/libnvidia-glcore.so.295.41
#6  0x00002b46b19c3699 in ?? () from /usr/lib64/nvidia/libnvidia-glcore.so.295.41
#7  0x00002b46adc56263 in eq::util::detail::PixelBufferObject::mapRead (this=0x2b46ec2bdcf0) at /home/bohara/Buildyard/src/Equalizer/libs/eq/util/pixelBufferObject.cpp:140
#8  0x00002b46adc545a6 in eq::util::PixelBufferObject::mapRead (this=0x2b46ec2bd4a0) at /home/bohara/Buildyard/src/Equalizer/libs/eq/util/pixelBufferObject.cpp:240
#9  0x00002b46adc6892e in eq::plugin::CompressorReadDrawPixels::finishDownload (this=0x2b46ec2bd300, glewContext=0x2aaab00012f0, inDims=0x2aaaae47b780, flags=340, outDims=0x2aaaae47b760, out=0x2b46ec2b6f38)
    at /home/bohara/Buildyard/src/Equalizer/libs/eq/client/compressor/compressorReadDrawPixels.cpp:487
#10 0x00002b46adc60def in EqCompressorFinishDownload (ptr=0x2b46ec2bd300, name=257, glewContext=0x2aaab00012f0, inDims=0x2aaaae47b780, flags=340, outDims=0x2aaaae47b760, out=0x2b46ec2b6f38)
    at /home/bohara/Buildyard/src/Equalizer/libs/eq/client/compressor/compressor.cpp:245
#11 0x00002b46adc3617b in eq::util::GPUCompressor::finishDownload (this=0x2b46ec2b6e20, pvpIn=..., flags=84, pvpOut=..., out=0x2b46ec2b6f38)
    at /home/bohara/Buildyard/src/Equalizer/libs/eq/util/gpuCompressor.cpp:181
#12 0x00002b46adb6cba1 in eq::Image::_finishReadback (this=0x2b46ec2b7420, buffer=eq::fabric::Frame::BUFFER_COLOR, zoom=..., glewContext=0x2aaab00012f0)
    at /home/bohara/Buildyard/src/Equalizer/libs/eq/client/image.cpp:672
#13 0x00002b46adb6c506 in eq::Image::finishReadback (this=0x2b46ec2b7420, zoom=..., glewContext=0x2aaab00012f0) at /home/bohara/Buildyard/src/Equalizer/libs/eq/client/image.cpp:624
#14 0x00002b46adabfbe1 in eq::Channel::_finishReadback (this=0x2b46ec45e6b0, frameDataVersion=..., imageIndex=0, frameNumber=5, taskID=4, nodes=std::vector of length 1, capacity 1 = {...}, 
    netNodes=std::vector of length 1, capacity 1 = {...}) at /home/bohara/Buildyard/src/Equalizer/libs/eq/client/channel.cpp:1633
#15 0x00002b46adac74b4 in eq::Channel::_cmdFinishReadback (this=0x2b46ec45e6b0, cmd=...) at /home/bohara/Buildyard/src/Equalizer/libs/eq/client/channel.cpp:2163
#16 0x00002b46ae16cc6b in co::CommandFunc<co::Dispatcher>::operator() (this=0x2aaaae47bb90, command=...) at /home/bohara/Buildyard/src/Collage/co/commandFunc.h:60
#17 0x00002b46ae16bfaf in co::ICommand::operator() (this=0x2aaaae47bbd0) at /home/bohara/Buildyard/src/Collage/co/iCommand.cpp:214
#18 0x00002b46ae1fe0f8 in co::WorkerThread<co::CommandQueue>::run (this=0x2b46cc018310) at /home/bohara/Buildyard/src/Collage/co/worker.ipp:32
#19 0x00002b46ae528203 in lunchbox::Thread::_runChild (this=0x2b46cc018310) at /home/bohara/Buildyard/src/Lunchbox/lunchbox/thread.cpp:140
#20 0x00002b46ae527c5a in lunchbox::Thread::runChild (arg=0x2b46cc018310) at /home/bohara/Buildyard/src/Lunchbox/lunchbox/thread.cpp:117
#21 0x00002b46a7fb7b74 in ?? () from /usr/lib64/nvidia/libGL.so.1
#22 0x00002b46affc1851 in start_thread () from /lib64/libpthread.so.0
#23 0x00002b46b02bf11d in clone () from /lib64/libc.so.6

eile commented

Plan of action:

  • Write a single-threaded unit test to reproduce the deadlock
  • Extend it to a multi-threaded version if the single-threaded test does not deadlock
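
A minimal sketch of how such a test could detect a hang instead of blocking forever, assuming plain C++11 threads rather than the project's own test framework. The helper names (`launchDetached`, `allFinishWithin`, `finishesWithin`) are hypothetical and not part of Equalizer. The callable passed as `work` stands in for the code under test, which in a real test would set up a GL context and call `PixelBufferObject::mapRead`.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical test helpers; none of these names exist in Equalizer.

// Starts `work` on a detached thread and returns a future that becomes ready
// once `work` returns or throws. The thread is detached because a deadlocked
// call can never be joined.
std::future<void> launchDetached(std::function<void()> work)
{
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();
    std::thread([work, done]() {
        try
        {
            work();
            done->set_value();
        }
        catch (...)
        {
            done->set_exception(std::current_exception());
        }
    }).detach();
    return finished;
}

// Runs `nThreads` concurrent copies of `work` and returns true only if all of
// them finish before the shared deadline expires.
bool allFinishWithin(const std::function<void()>& work, unsigned nThreads,
                     std::chrono::milliseconds timeout)
{
    assert(nThreads > 0);
    std::vector<std::future<void>> pending;
    for (unsigned i = 0; i < nThreads; ++i)
        pending.push_back(launchDetached(work));

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (auto& f : pending)
        if (f.wait_until(deadline) != std::future_status::ready)
            return false; // at least one copy is still blocked
    return true;
}

// Single-threaded case: one copy of `work`, same timeout semantics.
bool finishesWithin(const std::function<void()>& work,
                    std::chrono::milliseconds timeout)
{
    return allFinishWithin(work, 1, timeout);
}
```

A single-threaded test would assert `finishesWithin(work, timeout)`. The multi-threaded extension would call `allFinishWithin` with one thread per GPU. A `false` result means the call is still blocked, so the test can fail and report it rather than hang the whole test run.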

eile commented

May be related to #177.

eile commented

Likely a duplicate of #177.

eile commented

Closing. If the problem still exists, let me know.

Sure, I will test it on the cluster today to verify.