ros2/rmw_connext

Connext is very slow to shutdown

clalancette opened this issue · 7 comments

Bug report

Required Info:

  • Operating System:
    • Ubuntu 16.04 AMD64
  • Installation type:
    • Source
  • Version or commit hash:
  • DDS implementation:
    • RTI Connext
  • Client library (if applicable):
    • rclpy-ish

Steps to reproduce issue

RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run demo_nodes_cpp talker
(hit Ctrl-C here)

Expected behavior

Talker starts up, publishes some data, then quickly goes away when the user hits Ctrl-C.

Actual behavior

Talker starts up, publishes some data, then takes at least 3 seconds (sometimes longer) to go away after the user hits Ctrl-C.

Additional information

This seems to get worse with the number of nodes in the process. For the composition demos, for instance, it appears to take roughly 3-5 seconds per node loaded into the process.

We have also seen this when testing with AXCIOMA; deleting a domain participant can take some time with RTI Connext DDS.

From @brawner:

The rclcpp unit tests take significantly longer with rmw_connext_cpp than with other middleware implementations.

Compare the rclcpp unit tests on the Connext DDS nightly (11 minutes):
http://build.ros2.org/view/Rci/job/Rci__nightly-connext_ubuntu_focal_amd64/32/testReport/rclcpp/

vs FastRTPS (1 min 30 s):
http://build.ros2.org/view/Rci/job/Rci__nightly-fastrtps_ubuntu_focal_amd64/34/testReport/rclcpp/

I've narrowed this down to the destruction of the rclcpp::Node: destroying one waitable takes the vast majority of this time. I believe this is happening in AllocatorMemoryStrategy, but since it holds shared_ptrs, the actual destruction may happen elsewhere.
If I put steady_clock measurements around this line, or even iterate through the vector and call reset explicitly, there is one waitable that takes a significant amount of time to delete. It isn't all waitables, just one default one whose origin I can't figure out.
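
To illustrate the kind of measurement I mean, here is a rough sketch only; waitable_handles stands in for whatever vector of std::shared_ptr<rclcpp::Waitable> the memory strategy actually holds, so that name is assumed:

#include <chrono>
#include <iostream>
#include <memory>
#include <vector>

#include "rclcpp/waitable.hpp"

// Time how long releasing each waitable takes. reset() only runs the
// destructor if this shared_ptr happens to be the last remaining owner.
void time_waitable_release(std::vector<std::shared_ptr<rclcpp::Waitable>> & waitable_handles)
{
  for (auto & waitable : waitable_handles) {
    const auto start = std::chrono::steady_clock::now();
    waitable.reset();
    const auto stop = std::chrono::steady_clock::now();
    std::cout << "waitable reset took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms" << std::endl;
  }
}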

@brawner I've assigned you to this for any follow-up work to determine exactly where the destruction slowness is coming from. If it turns out to be in Connext code, and there is nothing we can do about it, then I'll suggest opening a Known Issue on https://index.ros.org/doc/ros2/Releases/Release-Rolling-Ridley/ (and maybe the other distributions as well).

It appears deleting the participant is the main hangup here. This single call accounts for all 3.18 seconds of node destruction:

DDS::ReturnCode_t local_ret = dpf_->delete_participant(participant);
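
For reference, a minimal standalone sketch (assuming the Connext classic C++ API headers and libraries are available) that times the same call outside of ROS 2; domain 0 and the default QoS are arbitrary choices:

#include <chrono>
#include <iostream>

#include "ndds/ndds_cpp.h"

int main()
{
  // Create a participant, then measure how long it takes to delete it.
  DDSDomainParticipant * participant = DDSTheParticipantFactory->create_participant(
    0, DDS_PARTICIPANT_QOS_DEFAULT, NULL, DDS_STATUS_MASK_NONE);
  if (participant == NULL) {
    std::cerr << "create_participant failed" << std::endl;
    return 1;
  }

  const auto start = std::chrono::steady_clock::now();
  DDS_ReturnCode_t ret = DDSTheParticipantFactory->delete_participant(participant);
  const auto stop = std::chrono::steady_clock::now();

  std::cout << "delete_participant returned " << ret << " after "
            << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
            << " ms" << std::endl;
  return 0;
}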

Here are a couple of forum posts, though I'm not sure they help me understand the issue any better:
https://community.rti.com/forum-topic/ddstheparticipantfactory-deleteparticipant-hangs
https://community.rti.com/forum-topic/deleteparticipant-never-returns

There's one potentially helpful detail in there: the delay appears to come from delete_participant waiting for internal threads to complete. If we can figure out how to make those threads finish sooner, shutdown could be sped up.

However, since we don't have the source to Connext, figuring that out may be tricky.

Hi Chris - I'm not certain if this has already been explored, but there is a QoS setting for shutdown period:
https://community.rti.com/forum-topic/3-seconds-shutdown-domainparticipant
has the Q&A and links to the QoS documentation.
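
If I read that thread correctly, the knob in question is the DATABASE QoS policy's shutdown_cleanup_period, which delete_participant reportedly waits on. A rough sketch of shortening it via the Connext classic C++ API (member names assumed from the documentation; with rmw_connext it would presumably have to be applied through an XML QoS profile instead, since the participant is created inside the rmw layer):

#include "ndds/ndds_cpp.h"

// Shrink the database thread's shutdown cleanup period so that
// delete_participant() waits less (the default is on the order of a second).
// This has to run before the participant is created.
void shorten_shutdown_cleanup_period()
{
  DDS_DomainParticipantQos qos;
  DDSTheParticipantFactory->get_default_participant_qos(qos);
  qos.database.shutdown_cleanup_period.sec = 0;
  qos.database.shutdown_cleanup_period.nanosec = 10 * 1000 * 1000;  // 10 ms
  DDSTheParticipantFactory->set_default_participant_qos(qos);
}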

I didn't know about that one. Thanks for the heads-up, I'll give it a try and see if it makes a difference to this problem.