Random test failure: stress-test: claim returns nil
Closed this issue · 2 comments
eric commented
https://travis-ci.org/eric/skuld/jobs/29787932#L987
The ID being claimed isn't the one we just submitted.
2014-07-12T19:41:48.234 INFO skuld.vnode: 127.0.0.1:13004/skuld_1: initiating election
2014-07-12T19:41:48.327 INFO skuld.vnode: 127.0.0.1:13004/skuld_0: initiating election
2014-07-12T19:41:49.012 INFO skuld.node: 127.0.0.1:13000: enqueue-local: enqueued id #<Bytes 000001472c18804e800000010000000000000820> on vnode 127.0.0.1:13000/skuld_0 for task: {:claims [], :id #<Bytes 000001472c18804e800000010000000000000820>, :data meow}
2014-07-12T19:41:49.013 INFO skuld.node: 127.0.0.1:13001: enqueue-local: enqueued id #<Bytes 000001472c18804e800000010000000000000820> on vnode 127.0.0.1:13001/skuld_0 for task: {:claims [], :id #<Bytes 000001472c18804e800000010000000000000820>, :data meow}
2014-07-12T19:41:49.013 INFO skuld.node: 127.0.0.1:13002: enqueue-local: enqueued id #<Bytes 000001472c18804e800000010000000000000820> on vnode 127.0.0.1:13002/skuld_0 for task: {:claims [], :id #<Bytes 000001472c18804e800000010000000000000820>, :data meow}
2014-07-12T19:41:49.025 INFO skuld.node: 127.0.0.1:13003: claim-local: claiming id from queue: #<Bytes 000001472c1833bc800000010000000000000820>
2014-07-12T19:41:49.025 INFO skuld.node: 127.0.0.1:13003: claim-local: claim from 127.0.0.1:13003/skuld_1 returned task: nil
2014-07-12T19:41:49.028 INFO skuld.node: 127.0.0.1:13001: claim-local: claiming id from queue: #<Bytes 000001472c1833a3800000010000000000000820>
2014-07-12T19:41:49.028 INFO skuld.node: 127.0.0.1:13001: claim-local: claim from 127.0.0.1:13001/skuld_0 returned task: nil
lein test :only skuld.stress-test/election-handoff-test
FAIL in (election-handoff-test) (stress_test.clj:62)
expected: (= id (:id claim))
actual: (not (= #<Bytes 000001472c18804e800000010000000000000820> nil))
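For context, the failing assertion reduces to the pattern sketched below. This is a toy, single-process stand-in, not skuld's client API: `enqueue!` and `claim!` here are in-memory stubs, and the real stress test also forces a leader election between the enqueue and the claim.

```clojure
(ns example.claim-sketch
  (:require [clojure.test :refer [deftest is]]))

;; A single in-memory FIFO standing in for the cluster's claim queue.
(def queue (atom clojure.lang.PersistentQueue/EMPTY))

(defn enqueue! [task]
  ;; Assign an id and put the task on the queue; returns the id.
  (let [id (java.util.UUID/randomUUID)]
    (swap! queue conj (assoc task :id id))
    id))

(defn claim! []
  ;; Return the next task, or nil when the queue is empty -- the nil case
  ;; is exactly the shape of the failure in the log above.
  (when-let [task (peek @queue)]
    (swap! queue pop)
    task))

(deftest claim-returns-submitted-id
  (let [id    (enqueue! {:data "meow"})
        claim (claim!)]
    (is (= id (:id claim)))))
```

In this toy version the claim always succeeds; in the real test it comes back nil when the newly elected leader's queue hasn't been populated yet.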
eric commented
I'm starting to wonder whether a call to scanner/scan! should be added after a node becomes leader, so that the queue is populated immediately instead of waiting for the next scheduled scan.
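Roughly this shape, though it is only a sketch: the `skuld.scanner` require, the `on-leader-elected` hook, and the argument passed to `scan!` are all guesses rather than skuld's actual code.

```clojure
(ns example.leader-scan-sketch
  ;; Assumed namespace for scanner/scan!; not verified against skuld's source.
  (:require [skuld.scanner :as scanner]))

(defn on-leader-elected
  "Hypothetical hook invoked once a vnode wins an election."
  [vnode]
  ;; Kick off a scan immediately so the claim queue is populated right away
  ;; instead of waiting for the next scheduled scan.
  (scanner/scan! vnode))
```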
eric commented
This is fixed by retrying the claim and by making tasks immediately available to claim when a leader is elected (using a separate queue per vnode).
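For the retry half, a minimal sketch of the idea; the function name, arguments, and timing values are illustrative, not the actual change:

```clojure
(defn claim-with-retry
  "Calls claim-fn until it returns a task, retrying up to `attempts` more
  times with `wait-ms` between tries. Returns the task, or nil on give-up."
  [claim-fn attempts wait-ms]
  (loop [n attempts]
    (or (claim-fn)
        (when (pos? n)
          (Thread/sleep wait-ms)
          (recur (dec n))))))
```

Usage would look something like `(claim-with-retry #(claim! client 1000) 10 100)`, where `claim!` and `client` stand in for whatever the stress test actually calls.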