Failure to run certain rclpy code with rmw_cyclonedds_cpp
clalancette opened this issue · 4 comments
Bug report
This is split out of ros2/examples#382 (and was ultimately found by osrf/ros2_test_cases#1259 as part of the Jazzy tutorial testing party).
Consider the following code:
import sys
from example_interfaces.srv import AddTwoInts
import rclpy
from rclpy.callback_groups import ReentrantCallbackGroup
from rclpy.executors import ExternalShutdownException
def main(args=None):
rclpy.init(args=args)
try:
node = rclpy.create_node('minimal_client')
cb_group = ReentrantCallbackGroup()
cli = node.create_client(AddTwoInts, 'add_two_ints', callback_group=cb_group)
did_run = False
did_get_result = False
async def call_service():
nonlocal cli, node, did_run, did_get_result
did_run = True
try:
req = AddTwoInts.Request()
req.a = 41
req.b = 1
future = cli.call_async(req)
result = await future
node.get_logger().info(
'Result of add_two_ints: for %d + %d = %d' %
(req.a, req.b, result.sum))
finally:
did_get_result = True
while not cli.wait_for_service(timeout_sec=1.0):
node.get_logger().info('service not available, waiting again...')
timer = node.create_timer(0.5, call_service, callback_group=cb_group)
while rclpy.ok() and not did_run:
rclpy.spin_once(node)
if did_run:
# call timer callback only once
timer.cancel()
while rclpy.ok() and not did_get_result:
rclpy.spin_once(node)
except KeyboardInterrupt:
pass
except ExternalShutdownException:
sys.exit(1)
if __name__ == '__main__':
main()
If you run in one terminal:
ros2 run examples_rclpy_minimal_service service
And the code above in a second terminal with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
, it hangs forever and never completes.
Required Info:
- Operating System:
- Ubuntu 24.04, amd64
- Installation type:
- From source
- Version or commit hash:
- DDS implementation:
- CycloneDDS
- Client library (if applicable):
- rclpy
Steps to reproduce issue
See the initial description.
Expected behavior
Example code succeeds and prints a response from the service.
Actual behavior
Example code hangs forever.
Additional information
During debugging of this, I found that part of the problem seems to be in the rclpy.spin_once
call. In particular, what happens is that for every call there the node is added to the executor, which constitutes an "event". At that point, it seems like because this is spin_once
, that event, and that event alone, is continually returned. We never make it to handling the other events, including the client response event. This is somewhat inherent in the contract of spin_once
, but it is odd that this works for rmw_fastrtps_cpp
while it fails for rmw_cyclonedds_cpp
@mjcarroll pointed out that the ordering in the executors is different in rclpy than rclcpp.
I don't know how the RMW layer interacts with Python async
code, but this looks to me like a single threaded process with one node, and one outstanding request. So even if you only intermittently "spin" that node, you would expect a result, right?
In other words, I have some difficulty seeing how this
During debugging of this, I found that part of the problem seems to be in the rclpy.spin_once call. In particular, what happens is that for every call there the node is added to the executor, which constitutes an "event". At that point, it seems like because this is spin_once, that event, and that event alone, is continually returned. We never make it to handling the other events, including the client response event.
relates to the code. Maybe someone can help clear up that confusion for me?
Given that this is backlog, it is probably not the highest priority.
Not an answer to this question, but I also had problems with CycloneDDS on Jazzy.
In my case, the problems I had were related to buffer overflow:
ros2 run demo_nodes_cpp talker
*** buffer overflow detected ***: terminated
WHen listing topics:
ros2 topic list
*** buffer overflow detected ***: terminated
Aborted (core dumped)
My solution was to compile CycloneDDS from source. Just for reference, the steps I used to compile it from source are described at:
This issue has been mentioned on ROS Discourse. There might be relevant details there:
https://discourse.ros.org/t/preparing-for-rolling-sync-2024-07-11/38526/2