osrf/buildfarm-tools

`ci_agent` in `build.ros2.org` has unexpected running nodes in background and causes test regressions

Migrated from https://github.com/osrf/buildfarmer/issues/337 on Aug 22, 2022

Description

Three unexpected nodes running on this agent are causing test regressions on Humble.

Reference builds:

Failing tests:

Example test

Test: rcl.TestGetNodeNames__rmw_fastrtps_cpp.test_rcl_get_node_names (from TestGetNodeNames__rmw_fastrtps_cpp)
Stacktrace:

/tmp/ws/src/ros2/rcl/rcl/test/rcl/test_get_node_names.cpp:138
Expected equality of these values:
  discovered_nodes
    Which is: { ("demo_node_0", "/"), ("demo_node_1", "/"), ("demo_node_2", "/"), ("node1", "/"), ("node1", "/"), ("node2", "/"), ("node2", "/ns/ns"), ("node3", "/ns") }
  expected_nodes
    Which is: { ("node1", "/"), ("node1", "/"), ("node2", "/"), ("node2", "/ns/ns"), ("node3", "/ns") }

There are 3 unexpected nodes in this test's output: demo_node_0, demo_node_1, and demo_node_2.
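The comparison behind that failure can be reproduced outside the test as a multiset difference. This is only an illustrative sketch in Python (the real test is C++); the helper name `unexpected` is hypothetical, and the two lists are copied from the stack trace above.

```python
from collections import Counter

# Values copied from the failing assertion in test_get_node_names.cpp.
discovered_nodes = [
    ("demo_node_0", "/"), ("demo_node_1", "/"), ("demo_node_2", "/"),
    ("node1", "/"), ("node1", "/"), ("node2", "/"),
    ("node2", "/ns/ns"), ("node3", "/ns"),
]
expected_nodes = [
    ("node1", "/"), ("node1", "/"), ("node2", "/"),
    ("node2", "/ns/ns"), ("node3", "/ns"),
]


def unexpected(discovered, expected):
    """Return (name, namespace) pairs seen more often than expected.

    Counter subtraction keeps duplicates meaningful: ("node1", "/") is
    listed twice on both sides, so it cancels out completely.
    """
    extra = Counter(discovered) - Counter(expected)
    return sorted(extra.elements())


print(unexpected(discovered_nodes, expected_nodes))
# [('demo_node_0', '/'), ('demo_node_1', '/'), ('demo_node_2', '/')]
```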

Explanation

  • We tracked the errors through the log and found that tests were failing because of 3 nodes that appeared to have been present since the start of the build.
  • We tried to replicate this error in ci.ros2.org: Build Status
  • Since we couldn't replicate the error in ci.ros2.org, we think this agent failed to destroy those 3 nodes.
  • We think this error may be related to processes not being terminated after a failure (even when running inside Docker).
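If the cause is indeed child processes outliving a failed run, one general mitigation is to start the command in its own process group and signal the whole group on exit. The sketch below is an assumption about how a harness could do this on a POSIX system. It is not how ros_buildfarm, colcon, or pytest behave, and `run_and_reap` is a hypothetical helper.

```python
import os
import signal
import subprocess


def run_and_reap(cmd, timeout):
    """Run cmd in a new session and terminate its whole process group after.

    Returns the exit code, or None if cmd did not finish within timeout.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its process group ID equals its PID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    finally:
        try:
            # Signal every process still in the group, including
            # grandchildren that the command left behind.
            os.killpg(proc.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the group is already empty
        proc.wait()


# The shell backgrounds a long-running child and then blocks on it.
print(run_and_reap(["sh", "-c", "sleep 30 & wait"], timeout=0.5))  # None
```

Processes that ignore SIGTERM or move to a different process group would still escape this approach.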

22/08 Update

Commented by cottsay on Aug 22, 2022

From the troubled node (ci_agent-ffcf5120), this is the container that's still running:

f9566d9966832ea21871dff4f7a10745ad058d74c5c9545eb79632e6e5fee119   1660307708.398302425.ci_build_and_test.rolling   "sh -c 'PATH=/usr/lib/ccache:$PATH PYTHONPATH=/tmp/ros_buildfarm:$PYTHONPATH python3 -u /tmp/ros_buildfarm/scripts/devel/build_and_test.py --rosdistro-name rolling --ros-version 2 --build-tool colcon --workspace-root /tmp/ws --parent-result-space --build-tool-args --cmake-args -DCMAKE_BUILD_TYPE=Release -DSKIP_MULTI_RMW_TESTS=1 --no-warn-unused-cli --build-tool-test-args --retest-until-pass 2 --ctest-args -LE xfail --pytest-args -m \"not xfail\"'"
jenkins+ 2883215  0.0  0.0   2888    96 ?        S    Aug12   0:00 /bin/sh -c PYTHONIOENCODING=utf_8 PYTHONUNBUFFERED=1 colcon test --build-base build_isolated --install-base install_isolated --test-result-base test_results --event-handlers console_direct+ --executor sequential --test-result-base /tmp/ws/test_results --retest-until-pass 2 --ctest-args -LE xfail --pytest-args -m "not xfail"
jenkins+ 2883216  0.0  0.7 224048 57720 ?        Sl   Aug12  13:43 /usr/bin/python3 /usr/bin/colcon test --build-base build_isolated --install-base install_isolated --test-result-base test_results --event-handlers console_direct+ --executor sequential --test-result-base /tmp/ws/test_results --retest-until-pass 2 --ctest-args -LE xfail --pytest-args -m not xfail
jenkins+ 2994922  0.1  0.6 922448 50136 ?        Sl   Aug12  27:04 /usr/bin/python3 -m pytest
jenkins+ 2994957  0.1  0.8 709136 66904 ?        Sl   Aug12  19:16 /tmp/ws/install_isolated/demo_nodes_cpp/lib/demo_nodes_cpp/talker --ros-args -r __node:=demo_node_0
jenkins+ 2994959  0.0  0.8 708968 66648 ?        Sl   Aug12  11:07 /tmp/ws/install_isolated/demo_nodes_cpp/lib/demo_nodes_cpp/talker --ros-args -r __node:=demo_node_1
jenkins+ 2994961  0.0  0.8 709148 66272 ?        Sl   Aug12  11:14 /tmp/ws/install_isolated/demo_nodes_cpp/lib/demo_nodes_cpp/talker --ros-args -r __node:=demo_node_2
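
The leftover nodes can be picked out of a process listing like the one above by looking for the `__node:=` remap argument. This is a minimal sketch, assuming `ps aux`-style lines where the second column is the PID; `leftover_nodes` is a hypothetical helper and the sample lines are copied from the output above.

```python
import re

# Matches the node-name remap rule, e.g. "-r __node:=demo_node_0".
NODE_REMAP = re.compile(r"__node:=(\S+)")

ps_lines = [
    "jenkins+ 2994922  0.1  0.6 922448 50136 ?        Sl   Aug12  27:04 /usr/bin/python3 -m pytest",
    "jenkins+ 2994957  0.1  0.8 709136 66904 ?        Sl   Aug12  19:16 /tmp/ws/install_isolated/demo_nodes_cpp/lib/demo_nodes_cpp/talker --ros-args -r __node:=demo_node_0",
    "jenkins+ 2994959  0.0  0.8 708968 66648 ?        Sl   Aug12  11:07 /tmp/ws/install_isolated/demo_nodes_cpp/lib/demo_nodes_cpp/talker --ros-args -r __node:=demo_node_1",
    "jenkins+ 2994961  0.0  0.8 709148 66272 ?        Sl   Aug12  11:14 /tmp/ws/install_isolated/demo_nodes_cpp/lib/demo_nodes_cpp/talker --ros-args -r __node:=demo_node_2",
]


def leftover_nodes(lines):
    """Return (pid, node_name) for each process line that remaps __node."""
    found = []
    for line in lines:
        match = NODE_REMAP.search(line)
        if match is None:
            continue  # not a process started with a node-name remap
        pid = int(line.split()[1])  # second column of ps aux output
        found.append((pid, match.group(1)))
    return found


for pid, name in leftover_nodes(ps_lines):
    print(pid, name)
```

The three names it reports match the unexpected entries in the failing `test_rcl_get_node_names` assertion.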