ros2/rclcpp

Shutdown transition on base lifecycle node dtor may lead to segaults on subclass-registered shutdown callback

gabrielfpacheco opened this issue · 6 comments

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04
  • Installation type:
    • ROS 2 Humble
  • Version or commit hash:
    • 16.0.9-1jammy.20240517.180218
  • DDS implementation:
    • N/A
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Execute the following test suite:

#include <gtest/gtest.h>
#include <memory>
#include "lifecycle_msgs/msg/state.hpp"
#include "rclcpp/rclcpp.hpp"
#include "rclcpp_lifecycle/lifecycle_node.hpp"

using std::placeholders::_1;
using lifecycle_msgs::msg::Transition;
using lifecycle_msgs::msg::State;
using CallbackReturn =
  rclcpp_lifecycle::node_interfaces::LifecycleNodeInterface::CallbackReturn;

class TestDefaultStateMachine : public ::testing::Test
{
protected:
  static void SetUpTestCase()
  {
    rclcpp::init(0, nullptr);
  }

  static void TearDownTestCase()
  {
    rclcpp::shutdown();
  }
};

class OnShutdownLifecycleNode : public rclcpp_lifecycle::LifecycleNode
{
public:
  explicit OnShutdownLifecycleNode(const std::string & node_name)
  : rclcpp_lifecycle::LifecycleNode(node_name),
    on_shutdown_counter_ptr_(new int(0))
  {
    register_on_shutdown(std::bind(&OnShutdownLifecycleNode::my_shutdown, this, _1));
  }

  ~OnShutdownLifecycleNode() {
    delete on_shutdown_counter_ptr_;
    on_shutdown_counter_ptr_ = nullptr;
  }

private:
  CallbackReturn my_shutdown(const rclcpp_lifecycle::State &)
  {
    RCLCPP_INFO(get_logger(), "OnShutdownLifecycleNode::my_shutdown called!");
    *on_shutdown_counter_ptr_ += 1;
    RCLCPP_INFO(
      get_logger(), "OnShutdownLifecycleNode::my_shutdown | counter : %d",
      *on_shutdown_counter_ptr_);

    return CallbackReturn::SUCCESS;
  }

  int * on_shutdown_counter_ptr_;
};

TEST_F(TestDefaultStateMachine, base_lifecycle_dtor_not_called)
{
  auto test_node = std::make_shared<OnShutdownLifecycleNode>("testnode");
  test_node->shutdown();

  // The base rclcpp::LifecycleNode destructor won't trigger the `shutdown` transition
  // because the node should be already 'finalized'.
  test_node.reset();
}

TEST_F(TestDefaultStateMachine, base_lifecycle_dtor_call_leads_to_segfault)
{
  auto test_node = std::make_shared<OnShutdownLifecycleNode>("testnode");

  // A segfault will occur because the subclass is destroyed before the base class
  // triggers the `shutdown` transition, which calls `my_shutdown` and tries to access
  // deallocated memory.
  test_node.reset();
}

Expected behavior

No segfaults on node destruction :)

Actual behavior

[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from TestDefaultStateMachine
[ RUN      ] TestDefaultStateMachine.base_lifecycle_dtor_not_called
[INFO] [1717768442.004851820] [testnode]: OnShutdownLifecycleNode::my_shutdown called!
[INFO] [1717768442.004908374] [testnode]: OnShutdownLifecycleNode::my_shutdown | counter : 1
[       OK ] TestDefaultStateMachine.base_lifecycle_dtor_not_called (11 ms)
[ RUN      ] TestDefaultStateMachine.base_lifecycle_dtor_call_leads_to_segfault
[INFO] [1717768442.011931951] [testnode]: OnShutdownLifecycleNode::my_shutdown called!
Segmentation fault (core dumped)

Additional information

The issue started to happen after May 23rd Humble release. In this release, it has been introduced some logic to trigger the shutdown transition from rclcpp::Lifecycle destructor. This modification was later on reverted, but the revert was not backported to Humble's 16.0.9 and Iron's 21.06 although present in Jazzy's 28.1.2 release.

Triggering the shutdown transition from the base class destructor will eventually call the a user-registered on_shutdown callback at the derived class. As the subclass destructor is called before the base class', this leads to undesired behavior of calling functions of an already-destructed object, which may lead to segmentation faults.

Feature request

If the decision to trigger a shutdown transition from the node destructor is kept, shouldn't there be an API at rclcpp::Lifecycle to unregister a shutdown callback the user has initially registered?

@gabrielfpacheco thanks for creating issue. at a glance, i think we should remove all callbacks registered in the destructor since shutdown in the destructor is not user call. but there may be something more that i miss, i will try to allocate time to address this.

@fujitatomoya Could you please elaborate on what drove the decision for the call to shutdown from the base class dtor? Since the callback from the base interface doesn't do anything and the base class can't call callbacks from subclasses, what were you guys trying to achieve with this change?

@fujitatomoya Could you please elaborate on what drove the decision for the call to shutdown from the base class dtor? Since the callback from the base interface doesn't do anything and the base class can't call callbacks from subclasses, what were you guys trying to achieve with this change?

Subscribers to the node lifecycle state are notified of the shutdown event. Helpful for applications managing node state or for reporting to the user. That's about it, since as you point out, the subclass on_shutdown cannot be called from the base dtor.

Thanks for the info. Unfortunately, I think it creates more problems than it solves though.

Thanks for the info. Unfortunately, I think it creates more problems than it solves though.

Yes, it's error prone and hard to make assumptions when the register_on_shutdown method in the lifecycle class is public. Can't say I understand that design choice.

see #2520 (comment), all related to PRs are rolled back and merged.

i will go ahead to close this issue, feel free to reopen if i miss anything.