ManageIQ/kubeclient

test_informer.rb fails sporadically

cben opened this issue · 8 comments

cben commented

I'm seeing various sporadic failures on test_informer.rb, or sometimes it gets stuck until timeout kills it. Examples:

@grosser would you like to investigate?
I haven't looked inside, no idea if just flaky test, or actual bug...

not using this library, so no thanks :D

oh no

ahh it's the actual kubeclient ... did this repo get renamed?
... I thought this was something else 🤦
I'll take a look ...

thx for the nice writeup, good to have the actual backtraces and to know it's not a single issue but multiple places

  • ran it 100 times on 3.0 locally and no failure
  • ran it 100 times on 2.7.6 and no failure

the "expected 1 got 2 watch" error would mean that the watch crashed and restarted :/

#586

... maybe you get it to fail again locally 🤞

cben commented

(Yes, repo was moved under manageiq org when Alissa @abonas was leaving Red Hat so we don't depend on her for future maintainer handoffs, and in hope Adam @agrare would join or at least be backup maintainer as I'm having less and less time for it.)

cben commented

BTW I notice with_worker does this:

    sleep(0.03) # wait for worker to watch

which (at least in theory) is not guaranteed. And one test does:

    sleep(0.02) # wait for watch to finish

Generally, all uses of sleep() are suspect.
But I haven't dug into the logic to say if any sleep race conditions are plausible explanations for any actual failure modes...
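
For illustration, here is a minimal sketch of what replacing fixed sleeps with explicit synchronization could look like. It is not taken from kubeclient's test suite: `wait_until`, `ready`, `events` and `seen` are hypothetical names, and the thread is only a stand-in for the informer worker. The idea is that the worker signals readiness through a queue, and the test waits on a condition with a deadline instead of hoping a fixed interval is long enough.

```ruby
require 'timeout'

# Hypothetical helper (not part of kubeclient): poll a condition until it
# holds, and raise if it does not hold before the deadline.
def wait_until(timeout: 2, interval: 0.005)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise Timeout::Error, "condition not met within #{timeout}s"
    end
    sleep(interval)
  end
  true
end

# Stand-in for a worker thread; the real informer worker is not modeled here.
ready  = Queue.new
events = Queue.new
seen   = []

worker = Thread.new do
  ready << :watching                  # explicit "I am watching now" signal
  while (event = events.pop) != :stop
    seen << event
  end
end

ready.pop                             # blocks until the worker has signalled
events << :added
wait_until { seen.include?(:added) }  # bounded wait instead of a fixed sleep
events << :stop
worker.join
```

With this shape, a slow machine makes the test slower rather than making it fail, and a genuine hang surfaces as a `Timeout::Error` with a message instead of an assertion on stale state.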

maybe #587 fixes this ...

DocX commented

Also, a few race conditions that could cause the flakiness will be fixed in #597.