m-lab/etl-gardener

tracker TestExpiration may be unreliable / flaky

Closed this issue · 3 comments

During Cloud Build, the etl-gardener go test -race step for the tracker package failed on two builds and succeeded on a third.

Since the same step both failed and passed, this may be a race condition.

Step #3 - "Run all gardener unit tests": === RUN   TestExpiration
Step #3 - "Run all gardener unit tests": 2021/08/05 17:18:51 tracker.go:102: DEBUG: Skipping save 2021-08-05 17:18:50.61171094 +0000 UTC m=+1.054659773 2021-08-05 17:18:50.628750616 +0000 UTC m=+1.071699515
Step #3 - "Run all gardener unit tests": 2021/08/05 17:18:51 tracker.go:58: datastore: no such entity /TestExpiration,jobs
Step #3 - "Run all gardener unit tests": 2021/08/05 17:18:52 tracker_test.go:294: job already exists
Step #3 - "Run all gardener unit tests":     tracker_test.go:28: job already exists
Step #3 - "Run all gardener unit tests": --- FAIL: TestExpiration (0.13s)
Step #3 - "Run all gardener unit tests": 2021/08/05 17:18:52 tracker.go:102: DEBUG: Skipping save 2021-08-05 17:18:50.61171094 +0000 UTC m=+1.054659773 2021-08-05 17:18:50.628750616 +0000 UTC m=+1.071699515
Step #3 - "Run all gardener unit tests": 2021/08/05 17:18:52 tracker.go:290: Deleting stale job 20110101:exp/type 127.48681ms 1ms
Step #3 - "Run all gardener unit tests": FAIL
Step #3 - "Run all gardener unit tests": FAIL	github.com/m-lab/etl-gardener/tracker	2.475s

However, I cannot reproduce this locally using:

while go test -count=1 -v ./tracker/... ./ops/... -race ; do sleep .1 ; done

Another potentially flaky failure, this time in cloud/bq/sanity*:

Step #3 - "Run all gardener unit tests": ?   	github.com/m-lab/etl-gardener/cloud	[no test files]
Step #3 - "Run all gardener unit tests": === RUN   Test_getTableParts
Step #3 - "Run all gardener unit tests": --- PASS: Test_getTableParts (0.00s)
Step #3 - "Run all gardener unit tests": === RUN   TestSanityCheckAndCopy
Step #3 - "Run all gardener unit tests": 2021/11/18 02:00:48 sanity.go:207: googleapi: Error 400: Cannot parse  as CloudRegion., badRequest
Step #3 - "Run all gardener unit tests": 2021/11/18 02:00:48 sanity.go:208: Query: 
Step #3 - "Run all gardener unit tests": 		#standardSQL
Step #3 - "Run all gardener unit tests": 		SELECT COUNT(DISTINCT test_id) AS TestCount, COUNT(DISTINCT task_filename) AS TaskFileCount
Step #3 - "Run all gardener unit tests":     FROM `dataset.foo_19990101`
Step #3 - "Run all gardener unit tests": 		  -- where clause
Step #3 - "Run all gardener unit tests": 2021/11/18 02:00:48 sanity.go:113: project:dataset.foo_19990101 foo_19990101
Step #3 - "Run all gardener unit tests":     sanity_test.go:80: googleapi: Error 400: Cannot parse  as CloudRegion., badRequest
Step #3 - "Run all gardener unit tests": --- FAIL: TestSanityCheckAndCopy (0.64s)

Succeeds on retry.

TestSanityCheckAndCopy is still flaky...