test_informer.rb fails sporadically
cben opened this issue
I'm seeing various sporadic failures on test_informer.rb, or sometimes it gets stuck until timeout kills it. Examples:
- https://github.com/ManageIQ/kubeclient/actions/runs/3538717648/jobs/5939824652 (ruby 3.0.4):
  RetryTest#test_timeout [/Users/runner/work/kubeclient/kubeclient/test/test_informer.rb:129]: not all expectations were satisfied unsatisfied expectations: - expected exactly once, invoked never: #<AnyInstance:Kubeclient::Common::WatchStream>.finish(any_parameters)
- https://github.com/ManageIQ/kubeclient/actions/runs/3359829904/jobs/5568290089 (ruby 2.5: got stuck and was killed by the timeout; note the 3m 0s run time)
- https://github.com/ManageIQ/kubeclient/actions/runs/3540149961/jobs/5942864716 (ruby 3.1.2: timeout)
- seen locally after running in a loop (ruby 2.7.5):
  RetryTest#test_can_watch_watches [/home/beni/kubeclient/test/test_informer.rb:119]: The request GET /\/v1\/watch\/pods/ was expected to execute 1 time but it executed 2 times
@grosser would you like to investigate?
I haven't looked into it, so I have no idea whether it's just a flaky test or an actual bug...
not using this library, so no thanks :D
ahh, it's the actual kubeclient... did this repo get renamed?
... I thought this was something else 🤦
I'll take a look ...
thx for the nice writeup; good to have the actual backtraces and to know it's not a single issue but failures in multiple places
- ran it 100 times on 3.0 locally and no failure
- ran it 100 times on 2.7.6 and no failure
the "expected 1 got 2 watch" error would mean that the watch crashed and restarted :/
... maybe you can get it to fail again locally 🤞
(Yes, the repo was moved under the ManageIQ org when Alissa @abonas was leaving Red Hat, so that we don't depend on her for future maintainer handoffs, and in the hope that Adam @agrare would join, or at least be a backup maintainer, as I have less and less time for it.)
BTW, I notice `with_worker` does this: `sleep(0.03) # wait for worker to watch`, which (at least in theory) is not guaranteed to be long enough. And one test does `sleep(0.02) # wait for watch to finish`. Generally, all uses of `sleep()` are suspect.
But I haven't dug into the logic to say whether any sleep race conditions are plausible explanations for any of the actual failure modes...
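A common alternative to a fixed `sleep` is to poll for the condition the sleep stands in for, with a deadline, so the test waits only as long as needed and fails loudly on timeout instead of silently racing. A minimal sketch, assuming a hypothetical `wait_until` helper and a hypothetical `started` queue that the worker pushes to once it is watching; neither exists in kubeclient's test suite. It sticks to long-available APIs (`Process.clock_gettime`, `Queue#empty?`) so it should also run on the older Rubies in the CI matrix.

```ruby
require "timeout"

# Hypothetical helper (not part of kubeclient): instead of a fixed
# sleep(0.03), re-check the condition until it holds or the deadline passes.
def wait_until(timeout: 5, interval: 0.005)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise Timeout::Error, "condition not met within #{timeout}s"
    end
    sleep(interval) # short poll interval, not a guess at total duration
  end
end

# Usage sketch: the worker signals through a thread-safe Queue once it
# has started "watching"; the test waits for that signal.
started = Queue.new
worker = Thread.new do
  sleep(rand * 0.05) # simulate a worker that needs a variable time to start
  started << :watching # hypothetical signal: "I am watching now"
end

wait_until(timeout: 2) { !started.empty? }
worker.join
```

The same idea could apply to `sleep(0.02) # wait for watch to finish`: if the watch runs in its own thread, `thread.join(5)` waits exactly as long as needed and returns `nil` if the thread did not finish within 5 seconds.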