Document Flaky Tests
Some of our tests do not pass consistently. We would like to document these flaky tests and gather information about them so that we can start improving the test suite incrementally.
Please feel free to post information about any failing or flaky test you have run into.
Replication steps:
go test ./...
or with gotestsum: gotestsum --format=short-verbose --junitfile $TEST_RESULTS_DIR/$reportname.xml -- -tags=$GOTAGS $pkg
Please provide:
Test Name, Output, and Replication Steps.
=== Errors
fuzzy/node.go:24:26: undefined: log
Replication steps:
gotestsum --format=short-verbose --junitfile=reportname.xml
Most common flaky tests (failure count, test name):
112 TestRaft_StartAsLeader
27 TestRaft_UserRestore
20 TestRaft_SnapshotRestore_PeerChange
14 TestRaft_LiveBootstrap
14 TestRaft_Integ
13 TestRaft_RecoverCluster
12 TestRaft_UserSnapshot
6 TestRaft_SendSnapshotFollower
6 TestRaft_LeaderFail
6 TestRaft_AfterShutdown
5 TestRaft_TripleNode
5 TestRaft_SingleNode
5 TestRaft_LeadershipTransferLeaderRejectsClientRequests
5 TestRaft_Barrier
5 TestNetworkTransport_AppendEntriesPipeline_CloseStreams
4 TestRaft_SendSnapshotFollowerFarBehind
3 TestRaft_LeadershipTransferWithSevenNodes
3 TestRaft_GetConfigurationNoBootstrap
2 TestRaft_SendSnapshotAndLogsFollower
1 TestRaft_RemoveLeader
1 TestRaft_LeadershipTransferLeaderReplicationTimeout
1 TestRaft_AutoSnapshot
Given that there are still open PRs addressing flaky tests, we might want to get to those before worrying about documenting the tests.
Looks like we made some improvements here. In a sample of 92 builds at SHA 387ddae (effectively the latest change on master at the time of writing), 32 builds failed with the following tests:
13 TestRaft_RecoverCluster
6 TestRaft_Integ
4 TestRaft_LeadershipTransferLeaderReplicationTimeout
4 TestNetworkTransport_AppendEntriesPipeline_CloseStreams
2 TestRaft_SnapshotRestore_PeerChange
2 TestRaft_ProtocolVersion_Upgrade_1_2
2 TestRaft_LeadershipTransferLeaderRejectsClientRequests