ros2/rmw_cyclonedds

Lifespan not working with transiet_local subscriber

charlielito opened this issue · 7 comments

Bug report

When subscribing to a transient_local published topic and using the Lifespan parameter, it seems that it doesn't have any effect on the subscriber's side. That means the callback of the subscriber is always called with the message ignoring the Lifespan parameter you use.

Required Info:

  • Operating System: Ubuntu 22.04
  • Installation type: binaries
  • Version or commit hash: iron
  • DDS implementation: default
  • Client library (if applicable): rclpy

Steps to reproduce issue

To reproduce the issue we have 2 nodes: one that publishes to /some_topic with transient_local and infinite lifespan QoS and the other node that subscribes to that topic also with transient_local QoS but with a lifespan of 2 seconds.

Run this publisher node in one Terminal

import rclpy
from rclpy.node import Node
from rclpy.qos import (
    QoSDurabilityPolicy,
    QoSProfile,
    QoSReliabilityPolicy,
)

from std_msgs.msg import Bool


class PubNode(Node):
    def __init__(self):
        super().__init__("pub_node")

        qos = QoSProfile(
            depth=5,
            durability=QoSDurabilityPolicy.TRANSIENT_LOCAL,
            reliability=QoSReliabilityPolicy.RELIABLE,
        )

        self.pub = self.create_publisher(Bool, "/some_topic", qos)
        self.pub.publish(Bool(data=False))


def main(args=None):
    rclpy.init(args=args)
    node = PubNode()

    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass

    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()

Then run the subscriber node in another terminal

from rclpy.node import Node
from rclpy.qos import (
    QoSDurabilityPolicy,
    QoSProfile,
    QoSReliabilityPolicy,
)
from std_msgs.msg import Bool
from rclpy.duration import Duration
import rclpy


class SubNode(Node):
    def __init__(self):
        super().__init__("sub_node")

        qos = QoSProfile(
            depth=5,
            durability=QoSDurabilityPolicy.TRANSIENT_LOCAL,
            reliability=QoSReliabilityPolicy.RELIABLE,
            lifespan=Duration(seconds=2),
        )
        print(qos)
        self.sub = self.create_subscription(
            Bool,
            "/some_topic",
            self.my_callback,
            qos,
        )

    def my_callback(self, msg: Bool):
        self.get_logger().info("Received: {}".format(msg.data))


def main(args=None):
    rclpy.init(args=args)
    node = SubNode()

    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass

    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()

Expected behavior

Because of the lifespan of 2 seconds, the callback shouldn't be triggered if the node is launched after 2 seconds of the publisher node

Actual behavior

The callback is always triggered

Additional information

In the subscriber node, I printed the QoS object to check the properties. I got this:

QoSProfile(history=HistoryPolicy.KEEP_LAST, depth=5, reliability=ReliabilityPolicy.RELIABLE, durability=DurabilityPolicy.TRANSIENT_LOCAL, lifespan=2000000000 nanoseconds, deadline=0 nanoseconds, liveliness=LivelinessPolicy.SYSTEM_DEFAULT, liveliness_lease_duration=0 nanoseconds, avoid_ros_namespace_conventions=False)

But when checking the info with ros2 topic info /some_topic --verbose I got:

Node name: sub_node
Node namespace: /
Topic type: std_msgs/msg/Bool
Topic type hash: RIHS01_feb91e995ff9ebd09c0cb3d2aed18b11077585839fb5db80193b62d74528f6c9
Endpoint type: SUBSCRIPTION
GID: 01.10.a5.80.f1.17.ec.a8.06.91.a8.ae.00.00.14.04
QoS profile:
  Reliability: RELIABLE
  History (Depth): KEEP_LAST (5)
  Durability: TRANSIENT_LOCAL
  Lifespan: Infinite
  Deadline: Infinite
  Liveliness: AUTOMATIC
  Liveliness lease duration: Infinite

which says the Lifespan is Infinite. Somehow the QoS is not being set correctly (?)

@charlielito

DDS implementation: default

That is supposed to mean you use rmw_fastrtps_cpp with iron, can you confirm?

i think, some rmw impelementation cannot show the actual QoS setting via ros2 topic info -v, this depends on rmw implementation. for example, rmw_fastrtps_cpp cannot show the history and depth QoS configuration.

there's been discussion on this, see more details for ros2/rclpy#849 (comment)

this is the conversion function RTPS QoS to RMW QoS, describes above.

https://github.com/ros2/rmw_fastrtps/blob/8eeb955f581c9cd24761185557ba7f46ef6e1d4c/rmw_fastrtps_shared_cpp/include/rmw_fastrtps_shared_cpp/qos.hpp#L128-L130

besides, here is what i see.

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 run prover_rclpy rclpy_1163_pub
...

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 run prover_rclpy rclpy_1163_sub
...

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 topic info -v /some_topic
Type: std_msgs/msg/Bool

Publisher count: 1

Node name: pub_node
Node namespace: /
Topic type: std_msgs/msg/Bool
Topic type hash: RIHS01_feb91e995ff9ebd09c0cb3d2aed18b11077585839fb5db80193b62d74528f6c9
Endpoint type: PUBLISHER
GID: 01.0f.60.db.44.f7.83.bd.00.00.00.00.00.00.13.03
QoS profile:
  Reliability: RELIABLE
  History (Depth): UNKNOWN
  Durability: TRANSIENT_LOCAL
  Lifespan: Infinite
  Deadline: Infinite
  Liveliness: AUTOMATIC
  Liveliness lease duration: Infinite

Subscription count: 1

Node name: sub_node
Node namespace: /
Topic type: std_msgs/msg/Bool
Topic type hash: RIHS01_feb91e995ff9ebd09c0cb3d2aed18b11077585839fb5db80193b62d74528f6c9
Endpoint type: SUBSCRIPTION
GID: 01.0f.60.db.18.f8.bd.4f.00.00.00.00.00.00.13.04
QoS profile:
  Reliability: RELIABLE
  History (Depth): UNKNOWN
  Durability: TRANSIENT_LOCAL
  Lifespan: 2000000000 nanoseconds
  Deadline: Infinite
  Liveliness: AUTOMATIC
  Liveliness lease duration: Infinite
  • History (Depth): UNKNOWN is set to UNKOWN as mentioned above.
  • Lifespan: 2000000000 nanoseconds is set to subscription.
  • and if subscription issues after 2 seconds, i do not see the latched message delivered to the subscription.

thanks,

@fujitatomoya my bad, actually, I was using Cyclone DDS export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
After changing it to rmw_fastrtps_cpp I can confirm your results and the expected behaviour. In that case what should we do with this issue? Who can tackle this in cyclone DDS implementation?

I'm going to move this over to the rmw_cyclonedds_cpp repository for now.

This is very closely related to eclipse-cyclonedds/cyclonedds#1835

This issue has an interesting difference with that one, though: the lifespan in DDS is spec'd to use the time of publication of the data (the "source timestamp") and so is probably never going to work in a useful way with transient-local data because the historical data would probably have expired long before it reached the reader. Therefore, simply adding a notion of lifespan to reader (as I noted as a possible way of addressing eclipse-cyclonedds/cyclonedds#1835) is not going to improve things here ...

So how badly needed is it? If it is not really important, then a documentation update is a reasonable approach, else we need to figure out how to deal with it without breaking the spec entirely.

IMO this should be fixed since the behavior should be the same independently of the DDS implementation.

IMO this should be fixed since the behavior should be the same independently of the DDS implementation.

Are you sure that's what you mean? Because that's equivalent to saying the combination is not supported because it is impossible without using vendor-specific extensions to the DDS specification.

I mean that it is very confusing that it behaves differently without any warning because the user would expect the same behavior. I think the feature is also a reasonable one since lifespan is an important part of the QoS specs