ros2/rosbag2

🧑‍🌾 test_rosbag2_play_end_to_end flaky on Windows

Crola1702 opened this issue · 10 comments

Description

The test test_rosbag2_play_end_to_end is flaky in Windows CI (debug, release, and repeated jobs).

Expected Behavior

The test should pass.

Actual Behavior

The test fails because of a timeout.

To Reproduce

  1. Run a build in nightly_win_deb, nightly_win_rel or nightly_win_rep
  2. See projectroot.test_rosbag2_play_end_to_end fail

System (please complete the following information)

  • OS: Windows
  • ROS 2 Distro: Rolling
  • Install Method: Source
  • Version: Rolling latest

Additional context

Reference build: https://ci.ros2.org/view/nightly/job/nightly_win_deb/2813/

Test is failing because of a timeout:

Log output:

[WARN] [1691491268.193064100] [subscription_manager_1691491266065392000]: SubscriptionManager::continue_spinning(..) finished by timeout
C:\ci\ws\src\ros2\rosbag2\rosbag2_tests\test\rosbag2_tests\test_rosbag2_play_end_to_end.cpp(80): error: Expected equality of these values:
  future.wait_for(service_call_timeout_)
    Which is: 4-byte object <01-00 00-00>
  std::future_status::ready
    Which is: 4-byte object <00-00 00-00>
 1/11 Test  #2: test_rosbag2_play_end_to_end .....***Timeout  60.00 sec

The test hangs once this error appears and is killed at the 60-second CTest timeout (it normally takes about 15 seconds to run).
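For readers decoding the log above: `std::future_status` is declared in the C++ standard as `ready, timeout, deferred`, so the `<01-00 00-00>` value printed for `future.wait_for(service_call_timeout_)` corresponds to `timeout`, while the expected `<00-00 00-00>` is `ready`. The sketch below reproduces both outcomes with a plain `std::promise`. The helper names are hypothetical and are not taken from the rosbag2 test code; the promise merely stands in for a service response.

```cpp
#include <chrono>
#include <future>
#include <string>

// Map a std::future_status to a readable name. The enumerators are declared
// in the order ready, timeout, deferred, i.e. values 0, 1, 2.
inline std::string status_name(std::future_status status)
{
  switch (status) {
    case std::future_status::ready: return "ready";
    case std::future_status::timeout: return "timeout";
    case std::future_status::deferred: return "deferred";
  }
  return "unknown";
}

// Hypothetical helper: wait on a future whose promise is never fulfilled,
// standing in for a service response that never arrives.
inline std::future_status wait_on_unanswered_request(std::chrono::milliseconds timeout)
{
  std::promise<bool> never_fulfilled;
  std::future<bool> future = never_fulfilled.get_future();
  return future.wait_for(timeout);  // expires without a value being set
}

// Hypothetical helper: the value is set before waiting, standing in for a
// service response that arrived in time.
inline std::future_status wait_on_answered_request(std::chrono::milliseconds timeout)
{
  std::promise<bool> response;
  std::future<bool> future = response.get_future();
  response.set_value(true);
  return future.wait_for(timeout);  // returns immediately
}
```

Calling `status_name(wait_on_unanswered_request(std::chrono::milliseconds(50)))` yields "timeout", which is the state the failing assertion reports.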


Flakiness ratio (last 15 days)

job_name         last_time   first_time  build_count  failure_count  failure_percentage
nightly_win_rep  2023-08-16  2023-08-02  15           13             86.67
nightly_win_rel  2023-08-16  2023-08-02  15           14             93.33
nightly_win_deb  2023-08-16  2023-08-02  15           13             86.67

Updated 17-08-2023
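The failure_percentage column is consistent with failure_count divided by build_count, times 100, rounded to two decimals (13/15 gives 86.67 and 14/15 gives 93.33). A minimal sketch of that arithmetic follows; the function name is hypothetical and this is not the buildfarm's actual query.

```cpp
#include <cmath>

// Hypothetical helper: percentage of failed builds, rounded to two decimals.
// Returns 0.0 when there are no builds, to avoid dividing by zero.
inline double failure_percentage(int failure_count, int build_count)
{
  if (build_count <= 0) {
    return 0.0;
  }
  const double pct = 100.0 * failure_count / build_count;
  return std::round(pct * 100.0) / 100.0;
}
```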

Running a diff between ros2 repos in nightly_win_deb#2806 and nightly_win_deb#2807:

@@ -36,7 +36,7 @@ repositories:
   eProsima\Fast-DDS:
     type: git
     url: https://github.com/eProsima/Fast-DDS.git
-    version: 7cf43a62cabc3124721258f02c9257f451dd1971
+    version: 9ae27f174c1a33bf539c589ce2f1d6630052d1b0
   eProsima\foonathan_memory_vendor:
     type: git
     url: https://github.com/eProsima/foonathan_memory_vendor.git
@@ -208,7 +208,7 @@ repositories:
   ros2\mimick_vendor:
     type: git
     url: https://github.com/ros2/mimick_vendor.git
-    version: 6fcd465251c1e62b1dddabf6607712da8a141a2c
+    version: 24f0be689e525dbcf18cce910ac12bae26c07e3c
   ros2\orocos_kdl_vendor:
     type: git
     url: https://github.com/ros2/orocos_kdl_vendor.git
@@ -244,7 +244,7 @@ repositories:
   ros2\rclpy:
     type: git
     url: https://github.com/ros2/rclpy.git
-    version: 913afa019b0d60e7255bd84247dbeab4735d96ea
+    version: 5367703c9812680a563fb904b8c9d187da310bd3
   ros2\rcpputils:
     type: git
     url: https://github.com/ros2/rcpputils.git
@@ -264,7 +264,7 @@ repositories:
   ros2\rmw_connextdds:
     type: git
     url: https://github.com/ros2/rmw_connextdds.git
-    version: 1206113b6eff6d2a6557911866548ff4415c6852
+    version: b57d032cad10a68f0cb0d349a715b9e3c4475cf4
   ros2\rmw_cyclonedds:
     type: git
     url: https://github.com/ros2/rmw_cyclonedds.git
@@ -300,7 +300,7 @@ repositories:
   ros2\rosbag2:
     type: git
     url: https://github.com/ros2/rosbag2.git
-    version: ba199d05954d6e51975c47acd3725cac6267778f
+    version: fda5aea264b4183d94352a7c7caf266ae104aa52

First time this issue was seen:

ros2/rmw_connextdds#26 (comment)

@clalancette, this is happening almost consistently in Windows builds. Do you think it's a good time to disable this test until #1342 is resolved? (It seems it will be a long time before we have updates on the fix PR.)

@Crola1702 @clalancette I guess it wasn't fully fixed for Windows by my overhaul of the player tests in #1297.
I will try to take a look at it this week.
If I don't find a solution, we can disable it for Windows again.

BTW, I came across this failure in one of my recent PRs: #1423 (comment).
My preliminary analysis was:

PlayEndToEndTestFixture.play_filters_by_topic failed with a timeout because it did not receive confirmation of the service call for player resume within 60 seconds.

BTW, I came across this failure in one of my recent PRs: #1423 (comment).

I haven't seen this failure in the buildfarm 🤔

I'm disabling the test in: #1452