moveit/moveit2

FCL collision checking crashes during motion

Closed this issue · 17 comments

Description

In some situations (I haven't been able to find a common cause), the move group crashes when executing a motion. Backtracing with gdb suggests that the crash is occuring within the FCL collision checker.

ROS Distro

Humble

OS and version

Ubuntu 22.04 (via docker)

Source or binary build?

Both
(source for moveit_core and pilz_industrial_motion_planner, binaries for everything else)

If binary, which release version?

2.5.5
(for everything besides moveit_core and pilz_industrial_motion_planner)

If source, which branch?

https://github.com/rr-tom-noble/moveit2/tree/bugfix/pilz-fix (humble plus #3077, #3055, #3070)
(only for moveit_core and pilz_industrial_motion_planner)

Which RMW are you using?

FastRTPS

Steps to Reproduce

Unsure, as the error appears to be sporadic. I'll update with details as I investigate.

Expected behavior

Valid motion plans should not crash the move group

Actual behavior

When attempting a motion, the move group (sometimes) crashes. Backtracing with gdb suggests that this is happening within the FCL collision checker.

Backtrace or Console output

Image

Some additional findings:

  • Our robot performs a move on start-up. This motion sometimes crashes and sometimes succeeds.
    • The crash seems to be related to collision objects, so I wonder if the success case occurs when the robot moves before the object have loaded
    • I believe our xacro contains some typos in names when disabling collisions. I wonder if the collision checker is looking up an object that doesn't exist, and dereferencing a nullptr somewhere. I'll fix the typos tomorrow and test.
  • When we manually attempt to move the robot to a pose, the motion consistently crashes.
  • I don't have the logs to hand at the moment, but there was a message indicating the plan succeeded before the crash, so it may be execution that's crashing

I just checked and the typos in collision disabling have been fixed, so that rules out that as a cause.

I'm fairly confident the issue in in these lines, since the backtrace suggests the crash occurs within std::_Rb_tree::find(), which is the underlying type of std::set. active_components_only_ uses a std::set.

These are the logs leading up to the failure:

[ns_motion_motion_server-30] [INFO] [1731398504.992854780] [move_group_interface]: MoveGroup action client/server ready
[move_group-2] [INFO] [1731398504.993094281] [moveit_move_group_default_capabilities.move_action_capability]: Received request
[move_group-2] [INFO] [1731398504.993187273] [moveit_move_group_default_capabilities.move_action_capability]: executing..
[ns_motion_motion_server-30] [INFO] [1731398504.993182201] [move_group_interface]: Planning request accepted
[move_group-2] [INFO] [1731398505.025682103] [moveit_move_group_default_capabilities.move_action_capability]: Planning request received for MoveGroup action. Forwarding to planning pipeline.
[move_group-2] [INFO] [1731398505.025921357] [moveit_move_group_capabilities_base.move_group_capability]: Using planning pipeline 'pilz_industrial_motion_planner'
[move_group-2] [INFO] [1731398505.025978619] [moveit.pilz_industrial_motion_planner.trajectory_generator_ptp]: Initialized Point-to-Point Trajectory Generator.
[move_group-2] [INFO] [1731398505.025998497] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Generating PTP trajectory...
[move_group-2] [INFO] [1731398505.026005017] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Validating request...
[move_group-2] [INFO] [1731398505.026015528] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Request validated.
[move_group-2] [INFO] [1731398505.026020302] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Validating command specific request...
[move_group-2] [INFO] [1731398505.026024897] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Command specific request validated.
[move_group-2] [INFO] [1731398505.026088138] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Extracting motion plan info...
[move_group-2] [INFO] [1731398505.026095275] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Motion plan info extracted.
[move_group-2] [INFO] [1731398505.026099727] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Planning trajectory...
[move_group-2] [INFO] [1731398505.026106010] [moveit.pilz_industrial_motion_planner.trajectory_generator_ptp]: Goal already reached, set one goal point explicitly.
[move_group-2] [INFO] [1731398505.026113184] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Trajectory planned.
[move_group-2] [INFO] [1731398505.026253898] [moveit_move_group_default_capabilities.move_action_capability]: Motion plan was computed successfully.
[ns_motion_motion_server-30] [INFO] [1731398505.026461550] [move_group_interface]: Planning request complete!
[ns_motion_motion_server-30] [INFO] [1731398505.026839870] [move_group_interface]: time taken to generate plan: 0.000130243 seconds
[move_group-2] [INFO] [1731398505.027109540] [moveit_ros.trajectory_execution_manager]: Received event 'stop'
[move_group-2] [INFO] [1731398505.059315208] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Received goal request
[move_group-2] [INFO] [1731398505.059413687] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution request received
[move_group-2] [INFO] [1731398505.059433556] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.059449191] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.059493330] [moveit_ros.trajectory_execution_manager]: Validating trajectory with allowed_start_tolerance 0.01
[ns_motion_motion_server-30] [INFO] [1731398505.059393960] [move_group_interface]: Execute request accepted
[move_group-2] [INFO] [1731398505.092037996] [moveit_ros.trajectory_execution_manager]: Starting trajectory execution ...
[move_group-2] [INFO] [1731398505.092065627] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.092078071] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.092422814] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: sending trajectory to fake_manipulator_controller
[ros2_control_node-4] [INFO] [1731398505.092516203] [fake_manipulator_controller]: Received new action goal
[ros2_control_node-4] [INFO] [1731398505.092557186] [fake_manipulator_controller]: Accepted new action goal
[move_group-2] [INFO] [1731398505.092727880] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: fake_manipulator_controller started execution
[move_group-2] [INFO] [1731398505.092740935] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Goal request accepted!
[ros2_control_node-4] [INFO] [1731398505.258340515] [fake_manipulator_controller]: Goal reached, success!
[move_group-2] [INFO] [1731398505.292939760] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Controller 'fake_manipulator_controller' successfully finished
[move_group-2] [INFO] [1731398505.392107285] [moveit_ros.trajectory_execution_manager]: Completed trajectory execution with status SUCCEEDED ...
[move_group-2] [INFO] [1731398505.392147886] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution completed: SUCCEEDED
[ns_motion_motion_server-30] [INFO] [1731398505.392399971] [move_group_interface]: Execute request success!
[move_group-2] [INFO] [1731398505.575211349] [moveit_ros.trajectory_execution_manager]: Received event 'stop'
[move_group-2] [INFO] [1731398505.592394086] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Received goal request
[move_group-2] [INFO] [1731398505.592457884] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution request received
[move_group-2] [INFO] [1731398505.592475875] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.592502318] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.592531202] [moveit_ros.trajectory_execution_manager]: Validating trajectory with allowed_start_tolerance 0.01
[ns_motion_motion_server-30] [INFO] [1731398505.592511270] [move_group_interface]: Execute request accepted
[move_group-2] [INFO] [1731398505.625645950] [moveit_ros.trajectory_execution_manager]: Starting trajectory execution ...
[move_group-2] [INFO] [1731398505.625676937] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.625686477] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.626042684] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: sending trajectory to fake_manipulator_controller
[ros2_control_node-4] [INFO] [1731398505.626138961] [fake_manipulator_controller]: Received new action goal
[ros2_control_node-4] [INFO] [1731398505.626179536] [fake_manipulator_controller]: Accepted new action goal
[move_group-2] [INFO] [1731398505.626258316] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: fake_manipulator_controller started execution
[move_group-2] [INFO] [1731398505.626268719] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Goal request accepted!
[ros2_control_node-4] [INFO] [1731398505.691653330] [fake_manipulator_controller]: Goal reached, success!
[move_group-2] [INFO] [1731398505.726385311] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Controller 'fake_manipulator_controller' successfully finished
[move_group-2] [INFO] [1731398505.825666208] [moveit_ros.trajectory_execution_manager]: Completed trajectory execution with status SUCCEEDED ...
[move_group-2] [INFO] [1731398505.825708487] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution completed: SUCCEEDED
[ns_motion_motion_server-30] [INFO] [1731398505.825938128] [move_group_interface]: Execute request success!
[ERROR] [move_group-2]: process has died [pid 63, exit code -11, cmd '/opt/ros/humble/lib/moveit_ros_move_group/move_group --ros-args -r __ns:=/move_group --params-file /tmp/launch_params_bnavf_bb --params-file /tmp/launch_params_m7qcuz37 --params-file /tmp/launch_params_42mjwpxx --params-file /tmp/launch_params_1u87_tmt --params-file /tmp/launch_params_c8b4stge --params-file /tmp/launch_params_ebym3ekg --params-file /tmp/launch_params_hai0u7e8 --params-file /tmp/launch_params_ky6e47uj --params-file /tmp/launch_params_67fn9m32 --params-file /tmp/launch_params_3ev073xe'].
[move_group-2] 

Ah actually, our version of moveit builds moveit_core and pilz_industrial_motion_planner from source, with some fixes from recent PRs. The branch is https://github.com/rr-tom-noble/moveit2/tree/bugfix/pilz-fix, which is based on humble plus #3077, #3055, #3070. None of those PRs touch collision checking and I believe FCL runs after the motion has been planned, so I think this represents a new issue we're running into now that those PRs have fixed our usage of pilz and the plan is reaching the collision checking stage

Backtrace with line numbers:


Thread 22 "move_group" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f7a7fe00640 (LWP 672)]
0x00007f7aa4b8cd79 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) 
(gdb) bt
#0  0x00007f7aa4b8cd79 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f7aa4271fd7 in std::char_traits<char>::compare (
    __n=<optimized out>, __s2=<optimized out>, 
    __s1=0x6177616b736179 <error: Cannot access memory at address 0x6177616b736179>) at /usr/include/c++/11/bits/char_traits.h:389
#2  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare (this=0x556107bbe0c0, __str="diagonal_wall", this=0x556107bbe0c0, 
    __str="diagonal_wall") at /usr/include/c++/11/bits/basic_string.h:2879
#3  std::operator< <char, std::char_traits<char>, std::allocator<char> > (
    __rhs="diagonal_wall", 
    __lhs=<error: Cannot access memory at address 0x6177616b736179>)
    at /usr/include/c++/11/bits/basic_string.h:6343
#4  std::operator< <std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__y={...}, __x={...})
    at /usr/include/c++/11/bits/stl_pair.h:490
#5  std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator() (__y={...}, __x={...}, 
    this=0x7f7a7fdfecf8) at /usr/include/c++/11/bits/stl_function.h:400
#6  std::_Rb_tree<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::pair<std::pair<std::__cxx11::basic_string<c--Type <RET> for more, q to quit, c to continue without paging--
har, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > >, std::_Select1st<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > > >::_M_lower_bound (
    __k={...}, __y=0x7f7a7fdfed00, __x=0x556107bbe0a0, this=0x7f7a7fdfecf8)
    at /usr/include/c++/11/bits/stl_tree.h:1905
#7  std::_Rb_tree<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > >, std::_Select1st<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits--Type <RET> for more, q to quit, c to continue without paging--
<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > > >::find (
    this=this@entry=0x7f7a7fdfecf8, __k={...})
    at /usr/include/c++/11/bits/stl_tree.h:2523
#8  0x00007f7aa426df29 in std::map<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > > >::find (
    __x={...}, this=0x7f7a7fdfecf8) at /usr/include/c++/11/bits/stl_map.h:1170
#9  collision_detection::collisionCallback (o1=<optimized out>, 
    o2=0x7f7a38002000, data=0x7f7a7fdfe960)
--Type <RET> for more, q to quit, c to continue without paging--
    at /root/workspace/src/moveit2/moveit_core/collision_detection_fcl/src/collision_common.cpp:167
#10 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7
#11 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7
#12 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7
#13 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7

#14 0x00007f7aa427e8ef in collision_detection::CollisionEnvFCL::checkSelfCollisionHelper (this=0x7f7a6c0e2b60, req=..., res=..., state=..., acm=0x5561080b1350)
    at /root/workspace/src/moveit2/moveit_core/collision_detection_fcl/src/collision_env_fcl.cpp:269
#15 0x00007f7a9e34902a in move_group::MoveGroupStateValidationService::computeService(std::shared_ptr<rmw_request_id_s> const&, std::shared_ptr<moveit_msgs::srv::GetStateValidity_Request_<std::allocator<void> > > const&, std::shared_ptr<moveit_msgs::srv::GetStateValidity_Response_<std::allocator<void> > > const&) ()
   from /opt/ros/humble/lib/libmoveit_move_group_default_capabilities.so
#16 0x00007f7a9e3455bf in ?? ()
   from /opt/ros/humble/lib/libmoveit_move_group_default_capabilities.so
#17 0x00007f7aa4fcc376 in ?? () from /opt/ros/humble/lib/librclcpp.so
#18 0x00007f7aa4fc9d5a in rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>) () from /opt/ros/humble/lib/librclcpp.so
#19 0x00007f7aa4fca0c6 in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) () from /opt/ros/humble/lib/librclcpp.so
#20 0x00007f7aa4fd134a in rclcpp::executors::MultiThreadedExecutor::run(unsigned--Type <RET> for more, q to quit, c to continue without paging--
 long) () from /opt/ros/humble/lib/librclcpp.so
#21 0x00007f7aa4d18253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#22 0x00007f7aa4a87ac3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#23 0x00007f7aa4b18a04 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) 

Probably not related to the current problem, but line 199 of collision_common.cpp is suspicious:

      const std::pair<std::string, std::string>& pc = cd1->getID() < cd2->getID() ?
                                                          std::make_pair(cd1->getID(), cd2->getID()) :
                                                          std::make_pair(cd2->getID(), cd1->getID());

Why is this storing the output of std::make_pair by reference?

@rr-mark, indeed collision_common.cpp:199 is wrong.
However, probably it's unrelated to the observed segfault, which occurs earlier, in collision_common.cpp:167.

The crash is also unrelated to execution: The log clearly shows that execution was successfully finished.
Only later, in MoveGroupStateValidationService, the crash occurs. Do you explicitly call this service?

Yes, we call /move_group/check_state_validity for checking pre-planned trajectories are still valid.

And yes, you're right, it is during that service call that it crashes.

I'm adding some debug prints now to try and work out what's going wrong.

It looks like cdata->res_->contacts is being used uninitialised. Printing cdata->res_->contacts.size() gives 97998116997072
(I'm assuming this is just random memory), and trying to iterate across cdata->res_->contacts throws

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid

Both cdata and cdata->res_ look like valid pointers.

(Fixing line 199 did not fix the crash)

Objects are automatically initialized by calling the default constructor. I think cdata->res_->contacts is corrupted (or released) after creation. You need to figure out why/when.

I'm confused.

  • The CollisionResult is default initialised in MoveGroupStateValidationService::computeService.
  • In PlanningScene::checkCollision, res.contacts.size() is already junk.

I'm just rejigging my dependency stack so I can get debugging and prints from moveit_ros.

Hmm, adding moveit_ros.move_group to our list of source dependencies seems to have fixed the problem. Maybe this is a problem with our mix of source and binary dependencies, rather than a problem with moveit.

Probably. The collision checking code is rather old and segfaults weren't reported for quite some time...

As @rr-mark mentioned, building the additional packages seems to have fixed the issue. I think the cause was that our binaries are from 2.5.5, whereas our build-from-source packages are based on the moveit2 humble branch, which contains backports, so we were likely linking between incompatible shared objects.

Closing this issue.