[Bug] Jazzy / Rolling Transient-Local Publishers with IPC do not deliver messages to Inter-process subscribers
Bug report
Required Info:
- Operating System: Ubuntu 22.04, running Rolling docker images based on 24.04
- Installation type: OSRF Docker
- Version or commit hash:
- DDS implementation: Fast-DDS
- Client library (if applicable): rclcpp
Expected behavior
When Intra-Process Communication (IPC) is enabled in the NodeOptions, late-joining subscriptions
still receive the last transient-local message, regardless of whether they live in the publisher's process or in another one.
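To make the expected semantics concrete, here is a toy model in plain C++. It is not rclcpp code: `ToyTransientLocalPublisher`, `late_joiner_receives`, and the `same_process` flag are hypothetical names invented for illustration. It only sketches the behavior I would expect from a KeepLast(1) transient-local publisher, namely that a late joiner gets the retained sample no matter where it lives.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model only: none of these names exist in rclcpp.
class ToyTransientLocalPublisher {
public:
  using Callback = std::function<void(const std::string &)>;

  explicit ToyTransientLocalPublisher(std::size_t depth) : depth_(depth) {
    assert(depth_ > 0);
  }

  // Deliver to current subscribers and retain the sample for late joiners.
  void publish(const std::string & msg) {
    history_.push_back(msg);
    if (history_.size() > depth_) {
      history_.pop_front();  // KeepLast(depth): drop the oldest sample
    }
    for (const auto & cb : subscribers_) {
      cb(msg);
    }
  }

  // A new subscriber first receives the retained history. The flag is
  // deliberately ignored: location must not change what is delivered.
  void subscribe(bool same_process, Callback cb) {
    (void)same_process;
    for (const auto & msg : history_) {
      cb(msg);
    }
    subscribers_.push_back(std::move(cb));
  }

private:
  std::size_t depth_;
  std::deque<std::string> history_;
  std::vector<Callback> subscribers_;
};

// Publish two maps, then attach a late joiner; returns what it received.
std::vector<std::string> late_joiner_receives(bool same_process) {
  std::vector<std::string> received;
  ToyTransientLocalPublisher pub(1);
  pub.publish("map_v1");
  pub.publish("map_v2");
  pub.subscribe(same_process, [&received](const std::string & msg) {
    received.push_back(msg);
  });
  return received;
}
```

In this model, `late_joiner_receives(true)` and `late_joiner_receives(false)` both return only the most recent sample. The behavior reported in this issue corresponds to the out-of-process case receiving nothing.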
Actual behavior
When the subscriptions are outside of the process containing the IPC publisher of transient-local topics, the message is never delivered after the initial publication.
When the subscription is within the process containing the IPC publisher of transient-local topics, the message is properly delivered. I believe this shows that the transient-local IPC PR (#2303) misses an important case: when the subscription is in another process, the message needs to be sent over the network.
For example, I have a map publisher in the Nav2 map_server:
occ_pub_ = create_publisher<nav_msgs::msg::OccupancyGrid>(
topic_name,
rclcpp::QoS(rclcpp::KeepLast(1)).transient_local().reliable());
That is composed into the same process as the rest of Nav2, localization, etc. If I inject a subscription to that information in some node that is running periodically, I see the log that a new map is received reliably. So, late-joining subscriptions within the IPC process are working (when IPC is enabled for that node as well).
rclcpp::QoS map_qos(10); // initialize to default
map_qos.transient_local();
map_qos.reliable();
map_qos.keep_last(1);
auto node = node_.lock();
auto map_sub = node->create_subscription<nav_msgs::msg::OccupancyGrid>(
map_topic_, map_qos, [this](nav_msgs::msg::OccupancyGrid::SharedPtr ) {
// Lambda function to handle the message
RCLCPP_INFO(logger_, "Received new map");
});
rclcpp::Rate r(1);
r.sleep();
However, when I move the map_server into a new component container, this stops working immediately. Further, the ROS 2 CLI and Rviz2 are unable to receive the message on that topic as well. The only exception is when the CLI, Rviz2, or external process node is already running before the transient-local publisher publishes a message, so that it receives the message at publication time. After that point, however, the message is unobtainable.
Steps to reproduce issue
Create a transient-local publisher / subscriber demo in a container with IPC enabled; it works. Move one of them into another container in another process, and it stops working.
Additional information
See the Nav2 ticket where we're working on the IPC migration, ros-navigation/navigation2#4691, and the rclcpp PR implementing transient-local IPC, #2303.
Hi @SteveMacenski,
Thanks for sharing your findings. I think I have found the problem and put up my fix in PR #2708 for review.
Please feel free to give it a try and let me know if there are any problems.
Thanks!
Thanks for the ultra-fast fix!