Segmentation fault: rmw_wait with cyclone dds
samialperen opened this issue · 12 comments
Bug report
Required Info:
- Operating System:
- Ubuntu 22, Docker environment
- Installation type:
- Binaries
- Version or commit hash:
- n/a
- DDS implementation:
- cyclone
- Client library (if applicable):
- rclcpp
Steps to reproduce issue
I was not able to reproduce the issue reliably.
Expected behavior
I should not see any stack crash.
Actual behavior
My stack crashes
Stack trace (most recent call last) in thread 297:
#9 Object "", at 0xffffffffffffffff, in
#8 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fc3ce7659ff, in
#7 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fc3ce6d3b42, in
#6 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7fc3ce9632b2, in
#5 Object "/my_ws/install/rclcpp/lib/librclcpp.so", at 0x7fc3cf651c7a, in rclcpp::executors::MultiThreadedExecutor::run(unsigned long)
#4 Object "/my_ws/install/rclcpp/lib/librclcpp.so", at 0x7fc3cf64c302, in rclcpp::Executor::get_next_executable(rclcpp::AnyExecutable&, std::chrono::duration<long, std::ratio<1l, 1000000000l> >)
#3 Object "/my_ws/install/rclcpp/lib/librclcpp.so", at 0x7fc3cf64b942, in rclcpp::Executor::wait_for_work(std::chrono::duration<long, std::ratio<1l, 1000000000l> >)
#2 Object "/opt/ros/humble/lib/librcl.so", at 0x7fc3cf4ca717, in rcl_wait
#1 Object "/opt/ros/humble/lib/librmw_cyclonedds_cpp.so", at 0x7fc3c696bb57, in rmw_wait
#0 Object "/opt/ros/humble/lib/librmw_cyclonedds_cpp.so", at 0x7fc3c6969416, in
Additional information
N/A
Feature request
Feature description
N/A
Implementation considerations
N/A
If you could come up with a more reproducible example, or build with debugging symbols so we can see where the crash is, that would help a lot with trying to debug this. There just isn't enough information here to do any meaningful debugging.
@clalancette is there debug information available somewhere for the binaries in a specific version of a binary installation? Because if there is, then at least we should be able to see with a bit more detail what line it crashed. It is a long shot, but sometimes it is enough.
Yeah, debug symbols are generally available by doing:
sudo apt-get install ros-humble-rclcpp-dbgsym ros-humble-rmw-cyclonedds-cpp-dbgsym
(etc)
Actually, that brings up another point; what version of ROS 2 are you on?
Thanks for the input guys. I am using humble.
I enabled general debug symbols with the -g flag while building.
@eboasson and @clalancette here is the updated stack trace after adding debug symbols for rclcpp and cyclone
Stack trace (most recent call last) in thread 304:
#10 Object "", at 0xffffffffffffffff, in
#9 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7f39741259ff, in
#8 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7f3974093b42, in
#7 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f39743232b2, in
#6 Source "rclcpp/rclcpp/src/rclcpp/executors/multi_threaded_executor.cpp", line 62, in spin [0x7f39750118b7]
59: }
60: }
61:
> 62: run(thread_id);
63: for (auto & thread : threads) {
64: thread.join();
65: }
#5 Source "rclcpp/rclcpp/src/rclcpp/executors/multi_threaded_executor.cpp", line 85, in operator new [0x7f3975011c7a]
82: if (!rclcpp::ok(this->context_) || !spinning.load()) {
83: return;
84: }
> 85: if (!get_next_executable(any_exec, next_exec_timeout_)) {
86: continue;
87: }
88: }
#4 Source "rclcpp/rclcpp/src/rclcpp/executor.cpp", line 906, in get_next_executable [0x7f397500c302]
903: // If there are none
904: if (!success) {
905: // Wait for subscriptions or timers to work on
> 906: wait_for_work(timeout);
907: if (!spinning.load()) {
908: return false;
909: }
#3 Source "rclcpp/rclcpp/src/rclcpp/executor.cpp", line 750, in wait_for_work [0x7f397500b942]
747: }
748:
749: rcl_ret_t status =
> 750: rcl_wait(&wait_set_, std::chrono::duration_cast<std::chrono::nanoseconds>(timeout).count());
751: if (status == RCL_RET_WAIT_SET_EMPTY) {
752: RCUTILS_LOG_WARN_NAMED(
753: "rclcpp",
#2 Object "/opt/ros/humble/lib/librcl.so", at 0x7f3974e8a717, in rcl_wait
#1 Source "./src/rmw_node.cpp", line 4067, in rmw_wait [0x7f396c32bb57]
#0 Source "./src/rmw_node.cpp", line 3958, in waitset_detach [0x7f396c329416]
Segmentation fault (Signal sent by the kernel [(nil)])
Interesting: that line is almost identical to what came before, and I suspect that it means it crashed on the dereference of sub->rdcondh. (If I had my computer at hand I would probably study the disassembly to be sure.)
It is a bit of a long shot, but could it be that you’re deleting a client in thread A while thread B is waiting on, among other things, that client?
I know that way back when I made some assumptions as to how waitsets would be used in ROS, and it is definitely possible I guessed wrong.
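To make the suspected hazard concrete, here is a minimal standard-library-only sketch. It is not the rmw_cyclonedds_cpp implementation, and Entity, OwningWaitSet and demo_delete_while_waiting are hypothetical names invented for illustration. It models a wait set that shares ownership of whatever is attached to it, so that a second thread dropping its reference to a "client" cannot free it while the waiting thread still dereferences it. Whether this matches the actual crash is an assumption; the thread above only raises it as a long shot.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a middleware entity (e.g. a client and its read condition).
struct Entity {
  int handle;
};

// Hypothetical wait set that shares ownership of every attached entity.
// A wait set caching raw pointers instead would be left with a dangling
// pointer if another thread destroyed the entity during a wait.
class OwningWaitSet {
public:
  void attach(std::shared_ptr<Entity> e) {
    assert(e != nullptr);
    std::lock_guard<std::mutex> lock(mutex_);
    entities_.push_back(std::move(e));
  }

  // Copy the attached entities under the lock, then touch each one.
  // The loop stands in for the attach/wait/detach sequence; it returns the
  // sum of the handles it dereferenced.
  int wait_once() {
    std::vector<std::shared_ptr<Entity>> snapshot;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      snapshot = entities_;
    }
    int sum = 0;
    for (const auto & e : snapshot) {
      sum += e->handle;  // safe: the wait set and the snapshot keep e alive
    }
    return sum;
  }

private:
  std::mutex mutex_;
  std::vector<std::shared_ptr<Entity>> entities_;
};

// "Thread A" (the caller) drops its reference while "thread B" keeps waiting.
// Returns true if every wait saw a valid entity.
bool demo_delete_while_waiting() {
  OwningWaitSet ws;
  auto client = std::make_shared<Entity>(Entity{42});
  ws.attach(client);

  std::atomic<bool> ok{true};
  std::thread waiter([&ws, &ok] {
    for (int i = 0; i < 1000; ++i) {
      if (ws.wait_once() != 42) {
        ok = false;
      }
    }
  });

  client.reset();  // the application lets go of the client mid-wait
  waiter.join();
  return ok.load();
}
```

Calling demo_delete_while_waiting() returns true because the wait set still owns the entity after client.reset(). At the application level, the analogous precaution would be to keep clients, subscriptions and similar entities alive until the executor that waits on them has stopped spinning, or to destroy them only from the thread that spins.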
@samialperen Were you able to fix this issue? I'm having the exact same issue with ROS Iron.
@Linbreux any chance that you can find out whether my "long shot" about the application deleting a client in another thread could be true?
@Linbreux @eboasson Well, I could not figure out exactly what was causing the issue in our codebase, but we ended up changing the way we call executors, and that fixed the problem on our end. I wish I could pinpoint the error in our codebase, but another fix kinda solved our problem indirectly.
We have been using humble by the way.
We are still experiencing the same problem in humble with a single-threaded node. It happens randomly. @samialperen how were you able to fix this? How did you call executors to solve this problem?
Stack trace (most recent call last) in thread 67:
#15 Object "", at 0xffffffffffffffff, in
#14 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7b030861aa03, in __clone
#13 Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7b0308589ac2, in
#12 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7b030881c252, in
#11 Object "/umd2_ws/install/umd_utils/lib/libumd_utils.so", at 0x7b03089794b5, in std::thread::_State_impl<std::thread::_Invoker<std::tuple<umd_utils::NodeThread::NodeThread(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>)::{lambda()#1}> > >::_M_run()
#10 Object "/umd2_ws/install/umd_utils/lib/libumd_utils.so", at 0x7b03089794ed, in std::thread::_Invoker<std::tuple<umd_utils::NodeThread::NodeThread(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>)::{lambda()#1}> >::operator()()
#9 Object "/umd2_ws/install/umd_utils/lib/libumd_utils.so", at 0x7b0308979545, in void std::thread::_Invoker<std::tuple<umd_utils::NodeThread::NodeThread(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)
#8 Object "/umd2_ws/install/umd_utils/lib/libumd_utils.so", at 0x7b03089795eb, in std::__invoke_result<umd_utils::NodeThread::NodeThread(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>)::{lambda()#1}>::type std::__invoke<umd_utils::NodeThread::NodeThread(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>)::{lambda()#1}>(umd_utils::NodeThread::NodeThread(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>)::{lambda()#1}&&)
#7 Object "/umd2_ws/install/umd_utils/lib/libumd_utils.so", at 0x7b0308979665, in void std::__invoke_impl<void, umd_utils::NodeThread::NodeThread(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>)::{lambda()#1}>(std::__invoke_other, umd_utils::NodeThread::NodeThread(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>)::{lambda()#1}&&)
#6 Object "/umd2_ws/install/umd_utils/lib/libumd_utils.so", at 0x7b0308978ab0, in umd_utils::NodeThread::NodeThread(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>)::{lambda()#1}::operator()() const
#5 Object "/opt/ros/humble/lib/librclcpp.so", at 0x7b0308a89970, in rclcpp::executors::SingleThreadedExecutor::spin()
#4 Object "/opt/ros/humble/lib/librclcpp.so", at 0x7b0308a82492, in rclcpp::Executor::get_next_executable(rclcpp::AnyExecutable&, std::chrono::duration<long, std::ratio<1l, 1000000000l> >)
#3 Object "/opt/ros/humble/lib/librclcpp.so", at 0x7b0308a7f70b, in rclcpp::Executor::wait_for_work(std::chrono::duration<long, std::ratio<1l, 1000000000l> >)
#2 Object "/opt/ros/humble/lib/librcl.so", at 0x7b0308353847, in rcl_wait
#1 Object "/opt/ros/humble/lib/librmw_cyclonedds_cpp.so", at 0x7b0307dfe87a, in rmw_wait
#0 Object "/opt/ros/humble/lib/librmw_cyclonedds_cpp.so", at 0x7b0307dfc396, in
Segmentation fault (Signal sent by the kernel [(nil)])
[ros2run]: Segmentation fault
@clalancette would appreciate the help, having the same issue.