ros-controls/ros2_controllers

ros2_control_node crashes abruptly with error "what(): Taking data from action server but nothing is ready

watanabeyyy opened this issue · 16 comments

I am building an application with ros2 iron using moveit and ros2control.

The application I created usually works without errors, but "sometimes" it crashes abruptly.

The errors at that time are as follows.

I would appreciate any insight you can give me.

[move_group-2] [INFO] [1705464328.148369918] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Controller '/A_joint_trajectory_controller' successfully finished 
[move_group-2] [INFO] [1705464328.148765774] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Controller '/B_joint_trajectory_controller' successfully finished 
[ros2_control_node-3] terminate called after throwing an instance of 'std::runtime_error' 
[ros2_control_node-3] what(): Taking data from action server but nothing is ready 
[ros2_control_node-3] Stack trace (most recent call last): 
[ros2_control_node-3] #15 Object "", at 0xffffffffffffffff, in 
[ros2_control_node-3] #14 Object "/opt/ros/iron/lib/controller_manager/ros2_control_node", at 0x55d77ba19e04, in _start 
[ros2_control_node-3] #13 Source "../csu/libc-start.c", line 392, in __libc_start_main_impl [0x7f9969029e3f] 
[ros2_control_node-3] #12 Source "../sysdeps/nptl/libc_start_call_main.h", line 58, in__libc_start_call_main [0x7f9969029d8f] 
[ros2_control_node-3] #11 Object "/opt/ros/iron/lib/controller_manager/ros2_control_node", at 0x55d77ba199ef, in main 
[ros2_control_node-3] #10 Object "/opt/ros/iron/lib/librclcpp.so", at 0x7f996990ee14, in rclcpp::executors::MultiThreadedExecutor::spin() 
[ros2_control_node-3] #9 Object "/opt/ros/iron/lib/librclcpp.so", at 0x7f996990e9b1, in rclcpp::executors::MultiThreadedExecutor::run(unsigned long) 
[ros2_control_node-3] #8 Object "/opt/ros/iron/lib/librclcpp.so", at 0x7f99699059fe, in rclcpp::Executor::get_next_ready_executable_from_map(rclcpp::AnyExecutable&, std::map<std::weak_ptrrclcpp::CallbackGroup, std::weak_ptrrclcpp::node_interfaces::NodeBaseInterface, std::owner_less<std::weak_ptrrclcpp::CallbackGroup >, std::allocator<std::pair<std::weak_ptrrclcpp::CallbackGroup const, std::weak_ptrrclcpp::node_interfaces::NodeBaseInterface > > > const&) 
[ros2_control_node-3] #7 Object "/opt/ros/iron/lib/librclcpp_action.so", at 0x7f99345c4785, in 
[ros2_control_node-3] #6 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f99694ae4d7, in __cxa_throw 
[ros2_control_node-3] #5 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f99694ae276, in std::terminate() 
[ros2_control_node-3] #4 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f99694ae20b, in 
[ros2_control_node-3] #3 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f99694a2b9d, in 
[ros2_control_node-3] #2 Source "./stdlib/abort.c", line 79, in abort [0x7f99690287f2] 
[ros2_control_node-3] #1 Source "../sysdeps/posix/raise.c", line 26, in raise [0x7f9969042475] 
[ros2_control_node-3] #0 | Source "./nptl/pthread_kill.c", line 89, in__pthread_kill_internal 
[ros2_control_node-3] | Source "./nptl/pthread_kill.c", line 78, in __pthread_kill_implementation 
[ros2_control_node-3] Source "./nptl/pthread_kill.c", line 44, in__pthread_kill [0x7f99690969fc] 
[ros2_control_node-3] Aborted (Signal sent by tkill() 180811 1000) 
[ERROR] [ros2_control_node-3]: process has died [pid 180811, exit code -6, cmd '/opt/ros/iron/lib/controller_manager/ros2_control_node --ros-args --params-file /tmp/launch_params_44ibrmz5 --params-file /home/user/ros2_ws/***/ros2_control_config.yaml'].

@saikishor @destogl @bmagyar I yet don't know why, but if the action server of JTC throws this std::runtime_error, can this kill the controller manager and then ros2_control_node? How is this handled by pluginlib?

@christophfroehlich Yes, this will happen. I remember us discussing this topic to make it fail-proof. I had a similar experience but with a different controller a few months back. This is the to-do list for the Jazzy and also to backport.

@watanabeyyy I am having the same problem as you...

Have you found any work arounds yet?

Thanks!

[ros2_control_node-1] terminate called after throwing an instance of 'std::runtime_error'
[ros2_control_node-1]   what():  Taking data from action server but nothing is ready
[ros2_control_node-1] Stack trace (most recent call last):
[ros2_control_node-1] #15   Object "", at 0xffffffffffffffff, in 
[ros2_control_node-1] #14   Object "/opt/ros/iron/lib/controller_manager/ros2_control_node", at 0x59eda8974e04, in _start
[ros2_control_node-1] #13   Source "../csu/libc-start.c", line 392, in __libc_start_main_impl [0x7a83afa29e3f]
[ros2_control_node-1] #12   Source "../sysdeps/nptl/libc_start_call_main.h", line 58, in __libc_start_call_main [0x7a83afa29d8f]
[ros2_control_node-1] #11   Object "/opt/ros/iron/lib/controller_manager/ros2_control_node", at 0x59eda89749ef, in main
[ros2_control_node-1] #10   Object "/opt/ros/iron/lib/librclcpp.so", at 0x7a83b030ea84, in rclcpp::executors::MultiThreadedExecutor::spin()
[ros2_control_node-1] #9    Object "/opt/ros/iron/lib/librclcpp.so", at 0x7a83b030e621, in rclcpp::executors::MultiThreadedExecutor::run(unsigned long)
[ros2_control_node-1] #8    Object "/opt/ros/iron/lib/librclcpp.so", at 0x7a83b030585e, in rclcpp::Executor::get_next_ready_executable_from_map(rclcpp::AnyExecutable&, std::map<std::weak_ptr<rclcpp::CallbackGroup>, std::weak_ptr<rclcpp::node_interfaces::NodeBaseInterface>, std::owner_less<std::weak_ptr<rclcpp::CallbackGroup> >, std::allocator<std::pair<std::weak_ptr<rclcpp::CallbackGroup> const, std::weak_ptr<rclcpp::node_interfaces::NodeBaseInterface> > > > const&)
[ros2_control_node-1] #7    Object "/opt/ros/iron/lib/librclcpp_action.so", at 0x7a83a0259785, in 
[ros2_control_node-1] #6    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7a83afeae4d7, in __cxa_throw
[ros2_control_node-1] #5    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7a83afeae276, in std::terminate()
[ros2_control_node-1] #4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7a83afeae20b, in 
[ros2_control_node-1] #3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7a83afea2b9d, in 
[ros2_control_node-1] #2    Source "./stdlib/abort.c", line 79, in abort [0x7a83afa287f2]
[ros2_control_node-1] #1    Source "../sysdeps/posix/raise.c", line 26, in raise [0x7a83afa42475]
[ros2_control_node-1] #0  | Source "./nptl/pthread_kill.c", line 89, in __pthread_kill_internal
[ros2_control_node-1]     | Source "./nptl/pthread_kill.c", line 78, in __pthread_kill_implementation
[ros2_control_node-1]       Source "./nptl/pthread_kill.c", line 44, in __pthread_kill [0x7a83afa969fc]
[ros2_control_node-1] Aborted (Signal sent by tkill() 15536 1000)

I'm getting this error when running the latest joint_trajectory_controller in iron at random times. Haven't narrowed down the source of the problem yet.

I've seen the Taking data from action server but nothing is ready exception already a couple of times in the CI, but this was some time ago. And we haven't changed anything regarding JTC or realtime_tools on iron lately afaik.

I've been using humble until about 3 weeks ago and I've been seeing this random error since then quite a few times. I've added a few extra log outputs to try to narrow down the problem. This occurs when I'm using rviz2/ MoveIt2 to send desired poses. Here is the log output:


[ros2_control_node-1] [INFO] [1714416894.528266646] [joint_trajectory_controller]: Starting goal_accepted_callback!
[ros2_control_node-1] [INFO] [1714416894.528396324] [joint_trajectory_controller]: Completed goal_accepted_callback!
[ros2_control_node-1] terminate called after throwing an instance of 'std::runtime_error'
[ros2_control_node-1]   what():  Taking data from action server but nothing is ready
[ros2_control_node-1] Stack trace (most recent call last) in thread 134319:
[ros2_control_node-1] #13   Object "", at 0xffffffffffffffff, in
[ros2_control_node-1] #12   Source "../sysdeps/unix/sysv/linux/x86_64/clone3.S", line 81, in __clone3 [0x71f84caa784f]
[ros2_control_node-1] #11   Source "./nptl/pthread_create.c", line 442, in start_thread [0x71f84ca15ac2]
[ros2_control_node-1] #10   Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x71f84cca6252, in
[ros2_control_node-1] #9    Object "/opt/ros/iron/lib/librclcpp.so", at 0x71f84cf25571, in rclcpp::executors::MultiThreadedExecutor::run(unsigned long)
[ros2_control_node-1] #8    Object "/opt/ros/iron/lib/librclcpp.so", at 0x71f84cf1c7ae, in rclcpp::Executor::get_next_ready_executable_from_map(rclcpp::AnyExecutable&, std::map<std::weak_ptr<rclcpp::CallbackGroup>, std::weak_ptr<rclcpp::node_interfaces::NodeBaseInterface>, std::owner_less<std::weak_ptr<rclcpp::CallbackGroup> >, std::allocator<std::pair<std::weak_ptr<rclcpp::CallbackGroup> const, std::weak_ptr<rclcpp::node_interfaces::NodeBaseInterface> > > > const&)
[ros2_control_node-1] #7    Object "/opt/ros/iron/lib/librclcpp_action.so", at 0x71f81c047785, in
[ros2_control_node-1] #6    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x71f84cc784d7, in __cxa_throw
[ros2_control_node-1] #5    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x71f84cc78276, in std::terminate()
[ros2_control_node-1] #4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x71f84cc7820b, in
[ros2_control_node-1] #3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x71f84cc6cb9d, in
[ros2_control_node-1] #2    Source "./stdlib/abort.c", line 79, in abort [0x71f84c9a97f2]
[ros2_control_node-1] #1    Source "../sysdeps/posix/raise.c", line 26, in raise [0x71f84c9c3475]
[ros2_control_node-1] #0  | Source "./nptl/pthread_kill.c", line 89, in __pthread_kill_internal
[ros2_control_node-1]     | Source "./nptl/pthread_kill.c", line 78, in __pthread_kill_implementation
[ros2_control_node-1]       Source "./nptl/pthread_kill.c", line 44, in __pthread_kill [0x71f84ca179fc]
[ros2_control_node-1] Aborted (Signal sent by tkill() 134093 1001)

It fails once about every ten times that I send a desired pose. Do you have a recommended way of finding the bug? I tried building debug symbols and running in gdb / gdb server, but the output isn't very helpful. I'm thinking about just building the joint_trajectory_controller from humble and seeing I get the same error.

Hello!

It seems to be happening from the core packages side. This recently merged PR : ros2/rclcpp#2495 might fix it.

Best Regards,
Sai

Please note that the patch will likely not be backported to Iron as it introduces an ABI change.
It is still discussed here: ros2/rclcpp#2242

EDIT:
I created a PR for an Iron backport ros2/rclcpp#2530

@firesurfer What is the status of the backports of the rclcpp fix to iron and humble? What I've seen is that it got merged into humble 3 weeks ago, but I'm not sure about humble.

Iron has been backported and merged. Not sure about humble though.
In jazzy the fix should be included by now.

From the humble commit history I think it is not backported yet:
https://github.com/ros2/rclcpp/commits/humble/
Also it seems there is no open PR for that in rclcpp even though in the repository of the company that contributed the fix to rolling and iron has a humble backport branch: https://github.com/cellumation/rclcpp/tree/action_humble_backport

EDIT: Basically the same bug also exists in rclpy: ros2/rclpy#1123
From our experience this issue only occurs if multiple actions clients at once are used from the same python node and they finished at the same time (e.g. two JTCs running the same trajectory). Our solution for this issue was implement a tiny wait between calling both actions.

I'm using ros2 rolling on Ubuntu 22.04 and have built ros2_control and ros2_controllers from source (latest pull) but I'm still facing this problem.
I plan and execute a lot of motions consecutively using MoveIt2 and ros2_control non-deterministically crashes.

[ros2_control_node-1] [INFO] [1722437041.653102702] [bob_scaled_joint_trajectory_controller]: Accepted new action goal
[ros2_control_node-1] [INFO] [1722437041.716655347] [bob_scaled_joint_trajectory_controller]: Goal reached, success!
[ros2_control_node-1] [INFO] [1722437041.789179821] [bob_scaled_joint_trajectory_controller]: Received new action goal
[ros2_control_node-1] [INFO] [1722437041.789196983] [bob_scaled_joint_trajectory_controller]: Accepted new action goal
[ros2_control_node-1] [INFO] [1722437041.844597984] [bob_scaled_joint_trajectory_controller]: Goal reached, success!
[ros2_control_node-1] [INFO] [1722437041.925187200] [bob_scaled_joint_trajectory_controller]: Received new action goal
[ros2_control_node-1] [INFO] [1722437041.925203361] [bob_scaled_joint_trajectory_controller]: Accepted new action goal
[ros2_control_node-1] terminate called after throwing an instance of 'std::runtime_error'
[ros2_control_node-1]   what():  Taking data from action server but nothing is ready
[ros2_control_node-1] Stack trace (most recent call last) in thread 386981:
[ros2_control_node-1] #13   Object "", at 0xffffffffffffffff, in 
[ros2_control_node-1] #12   Source "../sysdeps/unix/sysv/linux/x86_64/clone3.S", line 81, in __clone3 [0x710b01b2684f]
[ros2_control_node-1] #11   Source "./nptl/pthread_create.c", line 442, in start_thread [0x710b01a94ac2]
[ros2_control_node-1] #10   Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x710b01edc252, in 
[ros2_control_node-1] #9    Object "/opt/ros/rolling/lib/librclcpp.so", at 0x710b02313011, in rclcpp::executors::MultiThreadedExecutor::run(unsigned long)
[ros2_control_node-1] #8    Object "/opt/ros/rolling/lib/librclcpp.so", at 0x710b02309eee, in rclcpp::Executor::get_next_ready_executable_from_map(rclcpp::AnyExecutable&, std::map<std::weak_ptr<rclcpp::CallbackGroup>, std::weak_ptr<rclcpp::node_interfaces::NodeBaseInterface>, std::owner_less<std::weak_ptr<rclcpp::CallbackGroup> >, std::allocator<std::pair<std::weak_ptr<rclcpp::CallbackGroup> const, std::weak_ptr<rclcpp::node_interfaces::NodeBaseInterface> > > > const&)
[ros2_control_node-1] #7    Object "/opt/ros/rolling/lib/librclcpp_action.so", at 0x710ad47e2785, in 
[ros2_control_node-1] #6    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x710b01eae4d7, in __cxa_throw
[ros2_control_node-1] #5    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x710b01eae276, in std::terminate()
[ros2_control_node-1] #4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x710b01eae20b, in 
[ros2_control_node-1] #3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x710b01ea2b9d, in 
[ros2_control_node-1] #2    Source "./stdlib/abort.c", line 79, in abort [0x710b01a287f2]
[ros2_control_node-1] #1    Source "../sysdeps/posix/raise.c", line 26, in raise [0x710b01a42475]
[ros2_control_node-1] #0  | Source "./nptl/pthread_kill.c", line 89, in __pthread_kill_internal
[ros2_control_node-1]     | Source "./nptl/pthread_kill.c", line 78, in __pthread_kill_implementation
[ros2_control_node-1]       Source "./nptl/pthread_kill.c", line 44, in __pthread_kill [0x710b01a969fc]
[ros2_control_node-1] Aborted (Signal sent by tkill() 386871 1001)
[ERROR] [ros2_control_node-1]: process has died [pid 386871, exit code -6, cmd '/home/vihaan/Projects/crm_ws/install/controller_manager/lib/controller_manager/ros2_control_node --ros-args --params-file /home/vihaan/Projects/crm_ws/install/crm_dual_control/share/crm_dual_control/config/update_rate.yaml --params-file /tmp/launch_params_8gtkdd91'].

which version of rclcpp do you have? It was released with 28.0.1, and rolling is not updated on 22.04 any more!

I have 27.0.0. I just switch to jazzy and don't have this issue. Thank you.