ros2/rclcpp

[Bug] Jazzy / Rolling Transient-Local Publishers with IPC do not deliver messages to Inter-process subscribers


Bug report

Required Info:

  • Operating System: Ubuntu 22.04, running Rolling docker images based on 24.04
  • Installation type: OSRF Docker
  • Version or commit hash:
  • DDS implementation: Fast-DDS
  • Client library (if applicable): rclcpp

Expected behavior

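Messages on transient-local topics are delivered to late-joining subscriptions when Intra-Process Communications (IPC) is enabled in the NodeOptions, regardless of whether the subscription lives in the publisher's process or in another one.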
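To make the expectation concrete, here is a toy in-memory sketch of the transient-local, KeepLast(1) contract I am relying on: the publisher retains its last sample and replays it to any subscription that matches later, wherever that subscription lives. This is not rclcpp code; `LatchedTopic`, `late_joiner_samples`, and the `in_same_process` flag are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a transient-local, KeepLast(depth) topic. Not rclcpp code.
class LatchedTopic {
public:
  using Callback = std::function<void(const std::string &)>;

  explicit LatchedTopic(std::size_t depth) : depth_(depth) {}

  // Deliver to current subscribers and retain the sample for late joiners.
  void publish(const std::string & msg) {
    history_.push_back(msg);
    while (history_.size() > depth_) {
      history_.pop_front();  // KeepLast: drop the oldest sample
    }
    for (const auto & cb : subscribers_) {
      cb(msg);
    }
  }

  // A late joiner first receives the retained history, then live samples.
  // `in_same_process` is deliberately ignored: location must not matter.
  void subscribe(Callback cb, bool in_same_process) {
    (void)in_same_process;
    for (const auto & msg : history_) {
      cb(msg);
    }
    subscribers_.push_back(std::move(cb));
  }

private:
  std::size_t depth_;
  std::deque<std::string> history_;
  std::vector<Callback> subscribers_;
};

// Publishes two maps, then subscribes late; returns what the late joiner saw.
std::vector<std::string> late_joiner_samples(bool in_same_process) {
  std::vector<std::string> received;
  LatchedTopic topic(1);
  topic.publish("map_v1");
  topic.publish("map_v2");
  topic.subscribe(
    [&received](const std::string & msg) { received.push_back(msg); },
    in_same_process);
  return received;
}
```

In this model, `late_joiner_samples(true)` and `late_joiner_samples(false)` both return only the most recent map. That symmetry is what I expect from the real publisher.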

Actual behavior

When the subscriptions are outside of the process containing the IPC publisher of transient-local topics, the message is never delivered after the initial publication.

When the subscription is within the process containing the IPC publisher of transient-local topics, the message is properly delivered. I believe this shows that the IPC transient-local PR (#2303) is missing an important case: when the subscription is in another process and the message needs to be sent over the network.

For example, I have a map publisher in the Nav2 map_server:

  occ_pub_ = create_publisher<nav_msgs::msg::OccupancyGrid>(
    topic_name,
    rclcpp::QoS(rclcpp::KeepLast(1)).transient_local().reliable());

The map_server is composed into the same process as the rest of Nav2, localization, etc. If I inject a subscription to that topic in some node that runs periodically, I reliably see the log that a new map was received. So, late-joining subscriptions within the IPC process are working (when IPC is enabled for that node as well).

  rclcpp::QoS map_qos(10);  // initialize to default
  map_qos.transient_local();
  map_qos.reliable();
  map_qos.keep_last(1);
  auto node = node_.lock();
  auto map_sub = node->create_subscription<nav_msgs::msg::OccupancyGrid>(
    map_topic_, map_qos,
    [this](nav_msgs::msg::OccupancyGrid::SharedPtr /*msg*/) {
      // Log each map delivered to this late-joining subscription
      RCLCPP_INFO(logger_, "Received new map");
    });
  rclcpp::Rate r(1);
  r.sleep();

However, when I move the map_server into a separate component container, this stops working immediately. The ROS 2 CLI and RViz2 are also unable to receive messages on the topic. The only exception is when the CLI, RViz2, or external node is already running before the transient-local publisher publishes a message, so it receives the message at publication time. After that point, however, the message is unobtainable.

Steps to reproduce issue

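Create a transient-local publisher / subscriber demo with both nodes in one component container and IPC enabled; the late-joining subscription receives the message. Move either node into a second container running in another process, and the late-joining subscription no longer receives it.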

Additional information

See the Nav2 ticket tracking our IPC migration, ros-navigation/navigation2#4691, and the rclcpp PR implementing transient-local IPC, #2303.

Hi @SteveMacenski,

Thanks for sharing your findings. I think I have found the problem and put up my fix in PR #2708 for review.
Please feel free to give it a try and let me know if there are any problems.

Thanks!

Thanks for the ultra-fast fix!