ros2/rmw_cyclonedds

Failure to run certain rclpy code with rmw_cyclonedds_cpp

clalancette opened this issue · 4 comments

Bug report

This is split out of ros2/examples#382 (and was ultimately found by osrf/ros2_test_cases#1259 as part of the Jazzy tutorial testing party).

Consider the following code:

import sys

from example_interfaces.srv import AddTwoInts

import rclpy
from rclpy.callback_groups import ReentrantCallbackGroup
from rclpy.executors import ExternalShutdownException

def main(args=None):
    rclpy.init(args=args)

    try:
        node = rclpy.create_node('minimal_client')

        cb_group = ReentrantCallbackGroup()
        cli = node.create_client(AddTwoInts, 'add_two_ints', callback_group=cb_group)
        did_run = False
        did_get_result = False
        async def call_service():
            nonlocal cli, node, did_run, did_get_result
            did_run = True
            try:
                req = AddTwoInts.Request()
                req.a = 41
                req.b = 1
                future = cli.call_async(req)
                result = await future
                node.get_logger().info(
                    'Result of add_two_ints: for %d + %d = %d' %
                    (req.a, req.b, result.sum))
            finally:
                did_get_result = True
        while not cli.wait_for_service(timeout_sec=1.0):
            node.get_logger().info('service not available, waiting again...')
        timer = node.create_timer(0.5, call_service, callback_group=cb_group)

        while rclpy.ok() and not did_run:
            rclpy.spin_once(node)

        if did_run:
            # call timer callback only once
            timer.cancel()

        while rclpy.ok() and not did_get_result:
            rclpy.spin_once(node)
    except KeyboardInterrupt:
        pass
    except ExternalShutdownException:
        sys.exit(1)

if __name__ == '__main__':
    main()

If you run in one terminal:

ros2 run examples_rclpy_minimal_service service

And the code above in a second terminal with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp, it hangs forever and never completes.

Required Info:

  • Operating System:
    • Ubuntu 24.04, amd64
  • Installation type:
    • From source
  • Version or commit hash:
  • DDS implementation:
    • CycloneDDS
  • Client library (if applicable):
    • rclpy

Steps to reproduce issue

See the initial description.

Expected behavior

Example code succeeds and prints a response from the service.

Actual behavior

Example code hangs forever.

Additional information

During debugging of this, I found that part of the problem seems to be in the rclpy.spin_once call. In particular, what happens is that for every call there the node is added to the executor, which constitutes an "event". At that point, it seems like because this is spin_once, that event, and that event alone, is continually returned. We never make it to handling the other events, including the client response event. This is somewhat inherent in the contract of spin_once, but it is odd that this works for rmw_fastrtps_cpp while it fails for rmw_cyclonedds_cpp

@mjcarroll pointed out that the ordering in the executors is different in rclpy than rclcpp.

I don't know how the RMW layer interacts with Python async code, but this looks to me like a single threaded process with one node, and one outstanding request. So even if you only intermittently "spin" that node, you would expect a result, right?

In other words, I have some difficulty seeing how this

During debugging of this, I found that part of the problem seems to be in the rclpy.spin_once call. In particular, what happens is that for every call there the node is added to the executor, which constitutes an "event". At that point, it seems like because this is spin_once, that event, and that event alone, is continually returned. We never make it to handling the other events, including the client response event.

relates to the code. Maybe someone can help clear up that confusion for me?

Given that this is backlog, it is probably not the highest priority.

Not an answer to this question, but I also had problems with CycloneDDS on Jazzy.

In my case, the problems I had were related to buffer overflow:

 ros2 run demo_nodes_cpp talker
*** buffer overflow detected ***: terminated

WHen listing topics:

ros2 topic list 
*** buffer overflow detected ***: terminated
Aborted (core dumped)

My solution was to compile CycloneDDS from source. Just for reference, the steps I used to compile it from source are described at:

eclipse-cyclonedds/cyclonedds#2043

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/preparing-for-rolling-sync-2024-07-11/38526/2