Factual/skuld

Random test failure: stress-test: claim returns nil

Closed this issue · 2 comments

eric commented

https://travis-ci.org/eric/skuld/jobs/29787932#L987

The ID being claimed isn't the one we just submitted.

2014-07-12T19:41:48.234 INFO  skuld.vnode: 127.0.0.1:13004/skuld_1: initiating election
2014-07-12T19:41:48.327 INFO  skuld.vnode: 127.0.0.1:13004/skuld_0: initiating election
2014-07-12T19:41:49.012 INFO  skuld.node: 127.0.0.1:13000: enqueue-local: enqueued id #<Bytes 000001472c18804e800000010000000000000820> on vnode 127.0.0.1:13000/skuld_0 for task: {:claims [], :id #<Bytes 000001472c18804e800000010000000000000820>, :data meow}
2014-07-12T19:41:49.013 INFO  skuld.node: 127.0.0.1:13001: enqueue-local: enqueued id #<Bytes 000001472c18804e800000010000000000000820> on vnode 127.0.0.1:13001/skuld_0 for task: {:claims [], :id #<Bytes 000001472c18804e800000010000000000000820>, :data meow}
2014-07-12T19:41:49.013 INFO  skuld.node: 127.0.0.1:13002: enqueue-local: enqueued id #<Bytes 000001472c18804e800000010000000000000820> on vnode 127.0.0.1:13002/skuld_0 for task: {:claims [], :id #<Bytes 000001472c18804e800000010000000000000820>, :data meow}
2014-07-12T19:41:49.025 INFO  skuld.node: 127.0.0.1:13003: claim-local: claiming id from queue: #<Bytes 000001472c1833bc800000010000000000000820>
2014-07-12T19:41:49.025 INFO  skuld.node: 127.0.0.1:13003: claim-local: claim from 127.0.0.1:13003/skuld_1 returned task: nil
2014-07-12T19:41:49.028 INFO  skuld.node: 127.0.0.1:13001: claim-local: claiming id from queue: #<Bytes 000001472c1833a3800000010000000000000820>
2014-07-12T19:41:49.028 INFO  skuld.node: 127.0.0.1:13001: claim-local: claim from 127.0.0.1:13001/skuld_0 returned task: nil

lein test :only skuld.stress-test/election-handoff-test

FAIL in (election-handoff-test) (stress_test.clj:62)
expected: (= id (:id claim))
  actual: (not (= #<Bytes 000001472c18804e800000010000000000000820> nil))
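For context, the failing assertion boils down to a flow like this (a minimal sketch, not the literal stress_test.clj code; the enqueue!/claim! names and the client binding are assumptions about skuld's client API):

;; Hypothetical reconstruction of the failing flow in election-handoff-test.
(require '[clojure.test :refer [is]])

(let [id    (enqueue! client {:data "meow"})   ; submit a task and keep its id
      claim (claim! client 10000)]             ; claim the next task off the queue
  ;; stress_test.clj:62 expects the claimed task to be the one just enqueued;
  ;; in the logs above the vnode pulls a stale id from its queue instead,
  ;; so the claim comes back nil and the assertion fails.
  (is (= id (:id claim))))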
eric commented

I'm starting to wonder whether a call to scanner/scan! should be added after a node becomes leader, so the queue gets populated right away instead of waiting for the scheduled scan.
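Roughly what that could look like (a sketch only; the become-leader! name and the argument passed to scanner/scan! are assumptions about skuld.vnode's internals, not the actual implementation):

;; Sketch: trigger a scan as soon as this vnode wins its election, so the
;; claim queue is repopulated immediately rather than on the next scheduled scan.
(defn become-leader!
  [vnode]
  ;; ... existing leader-transition bookkeeping ...
  (scanner/scan! vnode))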

eric commented

This is fixed by retrying the claim and by making tasks immediately available to claim when a leader is elected (using a separate queue per vnode).
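For reference, the retry half of that fix amounts to something like the following on the claiming side (a sketch, assuming a claim! call that can return nil while the newly elected leader's per-vnode queue is still filling):

;; Sketch: keep asking for a claim for a bounded number of retries instead of
;; failing on the first nil. claim! and client are assumptions about the API.
(defn claim-with-retry
  [client timeout-ms retries]
  (loop [left retries]
    (if-let [claim (claim! client timeout-ms)]
      claim
      (when (pos? left)
        (Thread/sleep 100)   ; brief pause before retrying
        (recur (dec left))))))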