hashicorp/raft

Document Flaky Tests

Closed this issue · 3 comments

We have tests that don't always pass. We'd like to document and gather information about the flakiness in our test suite so that we can start improving our tests incrementally!

Please feel free to post information about any failing/flaky test you've been experiencing.

Replication steps:
go test ./...
or, with gotestsum:
gotestsum --format=short-verbose --junitfile $TEST_RESULTS_DIR/$reportname.xml -- -tags=$GOTAGS $pkg

Please provide:
Test Name, Output, and Replication Steps.

=== Errors
fuzzy/node.go:24:26: undefined: log

Replication steps:
gotestsum --format=short-verbose --junitfile=reportname.xml

Most common flaky tests:

    112 TestRaft_StartAsLeader
     27 TestRaft_UserRestore
     20 TestRaft_SnapshotRestore_PeerChange
     14 TestRaft_LiveBootstrap
     14 TestRaft_Integ
     13 TestRaft_RecoverCluster
     12 TestRaft_UserSnapshot
      6 TestRaft_SendSnapshotFollower
      6 TestRaft_LeaderFail
      6 TestRaft_AfterShutdown
      5 TestRaft_TripleNode
      5 TestRaft_SingleNode
      5 TestRaft_LeadershipTransferLeaderRejectsClientRequests
      5 TestRaft_Barrier
      5 TestNetworkTransport_AppendEntriesPipeline_CloseStreams
      4 TestRaft_SendSnapshotFollowerFarBehind
      3 TestRaft_LeadershipTransferWithSevenNodes
      3 TestRaft_GetConfigurationNoBootstrap
      2 TestRaft_SendSnapshotAndLogsFollower
      1 TestRaft_RemoveLeader
      1 TestRaft_LeadershipTransferLeaderReplicationTimeout
      1 TestRaft_AutoSnapshot

Given that there are still open PRs addressing flaky tests, we might want to get to those before worrying about documenting the tests.

Looks like we made some improvements here. In a sample of 92 builds at sha 387ddae (effectively the latest master change as of the time of this comment), 32 builds failed, with the following test failures:

  13 TestRaft_RecoverCluster
   6 TestRaft_Integ
   4 TestRaft_LeadershipTransferLeaderReplicationTimeout
   4 TestNetworkTransport_AppendEntriesPipeline_CloseStreams
   2 TestRaft_SnapshotRestore_PeerChange
   2 TestRaft_ProtocolVersion_Upgrade_1_2
   2 TestRaft_LeadershipTransferLeaderRejectsClientRequests