Segfault 11 on Mac with OpenCL Pipeline
mahimna opened this issue · 5 comments
Overview Description:
When closing the device, there is an EXC_BAD_ACCESS
error when destructing the OpenCLFrames.
Version, Platform, and Hardware Bug Found: This is on Mac 12.10.5
Stack Trace:
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000008
Exception Note: EXC_CORPSE_NOTIFY
Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [0]
VM Regions Near 0x8:
-->
__TEXT 0000000100000000-0000000100002000 [ 8K] r-x/rwx SM=COW /Users/USER//.7
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libfreenect2.0.2.dylib 0x000000010c08651f libfreenect2::OpenCLFrame::~OpenCLFrame() + 31 (opencl_depth_packet_processor.cpp:157)
1 libfreenect2.0.2.dylib 0x000000010c064abf libfreenect2::SyncMultiFrameListener::release(std::__1::map<libfreenect2::Frame::Type, libfreenect2::Frame*, std::__1::less<libfreenect2::Frame::Type>, std::__1::allocator<std::__1::pair<libfreenect2::Frame::Type const, libfreenect2::Frame*> > >&) + 47 (frame_listener_impl.cpp:152)
2 libfreenect2.0.2.dylib 0x000000010c064a21 libfreenect2::SyncMultiFrameListener::~SyncMultiFrameListener() + 33 (frame_listener_impl.cpp:97)
3 libfreenect2.0.2.dylib 0x000000010c064b7f libfreenect2::SyncMultiFrameListener::~SyncMultiFrameListener() + 15 (frame_listener_impl.cpp:95)
4 libfreenect2.so 0x000000010c0338c5 __pyx_tp_dealloc_14pylibfreenect2_12libfreenect2_SyncMultiFrameListener(_object*) + 53
curious why this was closed?
I still see this issue with master. Only way around it is not to delete the listener which is obviously not good :)
the protonect example doesn't use a pointer for SyncMultiFrameListener but you have to use a pointer if you want to have your code not all in a single function like the protonect example.
Actually, I see this issue with SyncMultiFrameListener both as a pointer and as a non-pointer, as in the protonect example. Looks like a bug in the OpenCLFrame destructor.
It was closed because there was no chance of getting this fixed. I'm open to bug fixes but I can't personally test it as a Mac is not in my possession.
I think there may be a double free here, but again I can't test it. A way of confirming that is to add printfs at appropriate locations to find out which variable contains an invalid address and is being dereferenced, and then find out where its value is invalidated.
The segfault is from buffer->allocator->free(buffer). Is buffer a nullptr here, or is buffer->allocator invalid? The latter is more likely. allocator is owned by OpenCLDepthPacketProcessorImpl as input_buffer_allocator etc., which is owned by OpenCLDepthPacketProcessor, owned by OpenCLPacketPipeline, owned by Freenect2DeviceImpl, and freed when delete dev is called.
So apparently, according to the current code, you can't delete listener after deleting device, because listener is still holding a frame which references data inside device. You can't delete listener before dev->stop() because dev may push a new frame into listener. How about deleting listener after dev->stop() but before delete dev?
It is all very horribly written C++ code but that's what we have now.
Edit: It looks like it's safer to delete listener after dev->close() but before delete dev.
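The ownership chain described above can be sketched with a small stand-alone model. This is only an illustration: Allocator, Frame, Device, and Listener here are hypothetical stand-ins, not the real libfreenect2 classes, and the log strings are made up. The model assumes, as the comment suggests, that a frame keeps a non-owning pointer to an allocator that lives inside the device.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the types discussed above.
// They model only who owns what, not the real libfreenect2 API.
typedef std::vector<std::string> Log;

struct Allocator {
    Log* log;
    explicit Allocator(Log* l) : log(l) {}
    void free_buffer() { log->push_back("buffer freed"); }
};

struct Frame {
    Allocator* allocator;  // non-owning: points at memory inside the device
    explicit Frame(Allocator* a) : allocator(a) {}
    ~Frame() {
        assert(allocator != nullptr);
        // If the device were already deleted, this pointer would dangle
        // and the call would be undefined behavior (the reported crash).
        allocator->free_buffer();
    }
};

struct Device {
    Log* log;
    Allocator allocator;  // owned by the device, like input_buffer_allocator
    explicit Device(Log* l) : log(l), allocator(l) {}
    void stop() { log->push_back("stop"); }
    void close() { log->push_back("close"); }
    ~Device() { log->push_back("device deleted"); }
};

struct Listener {
    Log* log;
    Frame* held;  // the listener may still hold a frame at shutdown
    Listener(Log* l, Frame* f) : log(l), held(f) {}
    ~Listener() {
        delete held;  // runs ~Frame, which touches the device's allocator
        log->push_back("listener deleted");
    }
};

// Shutdown in the order suggested above: stop, close, listener, device.
Log safe_shutdown() {
    Log log;
    Device* dev = new Device(&log);
    Listener* listener = new Listener(&log, new Frame(&dev->allocator));
    dev->stop();
    dev->close();
    delete listener;  // frame is released while the allocator is still alive
    delete dev;
    return log;
}
```

Swapping the last two deletes in safe_shutdown() would make ~Frame dereference an allocator that has already been destroyed, which matches the kind of invalid access seen in the stack trace.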
@xlz wow, that's all really helpful!
I am happy to help debug this, as the OpenCL implementation is the default in our ofxKinectV2 addon and it would be good to get it working. (FYI, the crash on exit also seems to happen on Windows.)
I think your hunch might be right.
I did try many if nullptr checks on buffer and buffer->allocator, but what you describe would explain the behavior we're seeing. 🤞
I'll try deleting the listener after dev->stop() but before delete dev.
I'll follow up soon.
Thanks!
Theo
Hi @xlz
So I can confirm that switching the shutdown order from:
dev->stop();
dev->close();
delete dev;
delete listener;
to:
dev->stop();
dev->close();
delete listener;
delete dev;
fixes the OpenCL crash mentioned above.
Appreciate the help debugging this. Not sure if this should be closed or if there is a way to document or fix this in the code.
Thanks!
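One possible way to encode the working shutdown order in code, rather than relying on every caller to remember it, is a small RAII guard. The sketch below is an assumption-laden illustration, not part of libfreenect2: ShutdownGuard, FakeDevice, and FakeListener are hypothetical names, and the guard only assumes that the device type has stop() and close() methods. Whether owning the device through a unique_ptr fits libfreenect2's own ownership rules has not been checked here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical guard: owns a device and a listener and always tears them
// down in the order stop, close, delete listener, delete device.
template <typename Device, typename Listener>
class ShutdownGuard {
public:
    ShutdownGuard(std::unique_ptr<Device> dev, std::unique_ptr<Listener> listener)
        : dev_(std::move(dev)), listener_(std::move(listener)) {
        assert(dev_ && listener_);
    }
    ShutdownGuard(const ShutdownGuard&) = delete;
    ShutdownGuard& operator=(const ShutdownGuard&) = delete;

    ~ShutdownGuard() {
        dev_->stop();
        dev_->close();
        listener_.reset();  // release held frames while the device is alive
        dev_.reset();       // only now destroy the device
    }

    Device& device() { return *dev_; }
    Listener& listener() { return *listener_; }

private:
    std::unique_ptr<Device> dev_;
    std::unique_ptr<Listener> listener_;
};

// Fake types used only to observe the order of calls.
typedef std::vector<std::string> CallLog;

struct FakeDevice {
    CallLog* log;
    explicit FakeDevice(CallLog* l) : log(l) {}
    void stop() { log->push_back("stop"); }
    void close() { log->push_back("close"); }
    ~FakeDevice() { log->push_back("device deleted"); }
};

struct FakeListener {
    CallLog* log;
    explicit FakeListener(CallLog* l) : log(l) {}
    ~FakeListener() { log->push_back("listener deleted"); }
};

CallLog guarded_shutdown() {
    CallLog log;
    {
        ShutdownGuard<FakeDevice, FakeListener> guard(
            std::unique_ptr<FakeDevice>(new FakeDevice(&log)),
            std::unique_ptr<FakeListener>(new FakeListener(&log)));
    }  // guard goes out of scope here and runs the ordered shutdown
    return log;
}
```

The explicit reset() calls in the destructor make the order visible at a glance instead of depending on member declaration order.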