DataDog/datadog-ci-rb

Track retries of specs

Closed this issue · 8 comments

Is your feature request related to a problem? Please describe.

Spec runners like Shopify's ci-queue allow you to re-queue failed specs. Although these specs are flaky, their failures occur within the current CI run, not across two separate runs.

Describe the goal of the feature
Track and allow filtering on retries.

Describe alternatives you've considered
We have monkey-patched datadog-ci to provide support.

Changed https://github.com/DataDog/datadog-ci-rb/blob/main/lib/datadog/ci/contrib/rspec/example.rb#L39-L46

         case execution_result.status
         when :passed
           test_span.passed!
         when :failed
           test_span.failed!(exception: execution_result.exception)
         when nil
           # retried specs show up as run but with a nil status
           test_span.retried!(exception: execution_result.exception)
         else
           test_span.skipped!(exception: execution_result.exception) if execution_result.example_skipped?
         end

  module Datadog
    module CI
      class Span
        def retried!(exception: nil)
          # Tag the span with a custom 'Retried' status
          tracer_span.set_tag(Datadog::CI::Ext::Test::TAG_STATUS, 'Retried')
          tracer_span.set_error(exception) unless exception.nil?
        end
      end
    end
  end

Additional context
In DD
Screenshot 2024-01-04 at 12 53 13 AM

I'd be happy to submit a PR.

Hi @rwc9u!

Thanks a lot for using Datadog CI visibility and for your contribution!

I have a couple of suggestions on your approach for handling ci-queue retries:

  • For Test::TAG_STATUS we use a set of predefined values (FAIL, PASS, SKIP). The custom value you send works more by accident than by design.
  • If you use a custom status or, for example, the SKIP status, Datadog's flaky test detection will not work. Our recommended approach is to report the FAIL status for this test to Datadog. Then, if a subsequent execution passes, the test will be marked as "flaky" in Datadog, and you can filter on flaky tests (and set up alerting to prevent new flaky tests from appearing).
  • If you would like to create custom filters on tests that are retried by ci-queue, you could use custom tags for this: https://docs.datadoghq.com/tests/setup/ruby/?tab=cloudciprovideragentless#adding-custom-tags-to-tests
  • I would advise you to try the automatic flaky test detection first, because it might work even without monkey-patching the datadog-ci instrumentation.
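To make the second and third suggestions concrete, here is a minimal sketch of the idea: report a requeued example with the predefined fail status and mark it with a custom tag instead of inventing a new status. Everything in it is an assumption for illustration. `FakeTestSpan` is a stand-in and not the datadog-ci span class, `report_status` is a hypothetical helper, and the tag name `ci_queue.retried` is made up. With the real gem, the custom tag would be set as described in the docs linked above.

```ruby
# Stand-in for a test span (NOT the datadog-ci class); it only records
# what would be reported.
class FakeTestSpan
  attr_reader :status, :tags

  def initialize
    @status = nil
    @tags = {}
  end

  def passed!
    @status = "pass"
  end

  def failed!(exception: nil)
    @status = "fail"
  end

  def skipped!(exception: nil)
    @status = "skip"
  end

  def set_tag(key, value)
    @tags[key] = value
  end
end

# Hypothetical helper: map an RSpec-like status to a predefined status.
# A nil status (a requeued example) is reported as a failure and marked
# with a custom tag, so it can be filtered on later.
def report_status(span, status, exception: nil)
  case status
  when :passed
    span.passed!
  when :failed
    span.failed!(exception: exception)
  when nil
    span.set_tag("ci_queue.retried", "true") # hypothetical tag name
    span.failed!(exception: exception)
  else
    # Simplification: treat every other status as skipped.
    span.skipped!(exception: exception)
  end
  span
end

span = report_status(FakeTestSpan.new, nil)
puts span.status       # fail
puts span.tags.inspect
```

The point of this shape is that the status stays within the predefined set, while the retry information travels in a separate tag.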

Please tell me if this approach would work for you.

RSpec support for ci-queue is deprecated right now, so I am a bit hesitant to add any ci-queue specific logic to the datadog-ci instrumentation. Note that we are releasing a new major feature for datadog-ci soon, test suite level visibility, and it won't work with how ci-queue runs RSpec tests (so you will have to disable it and stay on test level visibility in order to submit your tests to Datadog).

Let me check out your suggestions.

If I marked the specs that have to be retried by ci-queue as failed, then no test run would look like it had passed in DD because we always have retries. I verified this with a change to our patch (see image below). The specs get marked as flaky, but the test run is marked as failed even though from our perspective it has passed.

Screenshot 2024-01-08 at 2 40 30 PM

I like the idea of the custom tag, or we can also do -@test.status:(pass OR skip OR fail) to see the retried specs. I think I'll go that route.

Maybe if there was a way to manually mark a spec as flaky?

Hi! There is no way to mark a test as flaky from the client side, as this is done automatically by the backend.
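As a rough mental model only (this is not Datadog's actual backend logic, which is not shown in this thread), the detection can be pictured as grouping executions by test identity and flagging a test when both a failing and a passing execution are seen. The `TestRun` struct, its fields, and the sample data below are hypothetical.

```ruby
# Hypothetical record of one test execution; field names are illustrative.
TestRun = Struct.new(:suite, :name, :status, keyword_init: true)

# Simplified model: a test counts as flaky when the same suite/name pair
# has at least one "fail" and at least one "pass" execution.
def flaky_tests(runs)
  runs.group_by { |run| [run.suite, run.name] }
      .select do |_, group|
        statuses = group.map(&:status)
        statuses.include?("fail") && statuses.include?("pass")
      end
      .keys
end

runs = [
  TestRun.new(suite: "UserSpec", name: "creates a user", status: "fail"),
  TestRun.new(suite: "UserSpec", name: "creates a user", status: "pass"),
  # A failure with no later pass is just a failure in this model.
  TestRun.new(suite: "OrderSpec", name: "ships an order", status: "fail"),
  # A custom status is invisible to the fail/pass check.
  TestRun.new(suite: "CartSpec", name: "adds an item", status: "Retried"),
  TestRun.new(suite: "CartSpec", name: "adds an item", status: "pass")
]

p flaky_tests(runs) # => [["UserSpec", "creates a user"]]
```

Under this model, a custom status such as 'Retried' never pairs with a pass, which is consistent with the earlier advice to report retried specs as failures.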

I am very puzzled to see that 339 tests were correctly marked as flaky and only 3 tests had a failed status (this wasn't supposed to happen). One reason I can think of is that these tests are somehow reported with different test names or test suite names on retries.

Could you find out which tests are marked as failed and whether there are any subsequent passed runs for these tests, to see if all the relevant tags are the same for them?

The 3 tests that are marked as failed don't show a subsequent passing run in the pipeline for my branch. It's really weird. For example, if I filter on @test.fingerprint:e728cfe21c0bba57 (one of the three failed specs), I see it on other runs in other branches, but I don't see any passes for my branch runs. For the specs marked as flaky, I see the failure and I see a passing run in my branch for a given pipeline run. As you mentioned, ci-queue is deprecated for rspec, so maybe there is a bug and it's getting deduped from the junit formatter output. I'm kinda grasping at straws lol.

Yes, this explains why these tests are not marked as flaky.

I think if these tests are being retried by ci-queue, something might have changed their fingerprint. For these 3 failed tests, could you check whether you have other test runs with the same @test.name and @test.suite but a different @test.fingerprint?
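If it is easier to check this offline, the same comparison can be sketched over exported test runs. This is only an illustration: the row format, field names, and fingerprint values below are made up, and `fingerprint_mismatches` is a hypothetical helper, not part of datadog-ci.

```ruby
# Hypothetical helper: group rows by suite and name, then keep only the
# tests that were reported under more than one fingerprint.
def fingerprint_mismatches(rows)
  rows.group_by { |row| [row[:suite], row[:name]] }
      .transform_values { |group| group.map { |row| row[:fingerprint] }.uniq }
      .select { |_, fingerprints| fingerprints.size > 1 }
end

# Made-up sample rows standing in for exported test runs.
rows = [
  { suite: "UserSpec", name: "creates a user", fingerprint: "fp-1", status: "fail" },
  { suite: "UserSpec", name: "creates a user", fingerprint: "fp-2", status: "pass" },
  { suite: "OrderSpec", name: "ships an order", fingerprint: "fp-3", status: "fail" },
  { suite: "OrderSpec", name: "ships an order", fingerprint: "fp-3", status: "pass" }
]

fingerprint_mismatches(rows).each do |(suite, name), fingerprints|
  puts "#{suite} / #{name}: #{fingerprints.join(', ')}"
end
# prints "UserSpec / creates a user: fp-1, fp-2"
```

An empty result would suggest that the fingerprints are stable and that the retried executions are missing for some other reason.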

For that branch where I have tweaked our DD patch to mark everything that is retried as failed, the failures only show up once when filtering on @test.name and @test.suite.

That's even more interesting: it means the retries for these tests did not make it to Datadog at all.

Do you see all your tests in Datadog, or was anything else lost?