`Qless::Queue#peek` not returning correct results for recurring jobs
Opened this issue · 5 comments
As of 515cdc8 a queue can get in an unexpected state leading to recurring jobs not being returned properly by #peek
when no count argument is supplied. At this point I've traced it down to the Lua scripts by working back through the ruby code, I'll post more details as I find them.
[14] pry(main)> queue.peek
=> nil
[15] pry(main)> queue.peek(1)
=> []
[16] pry(main)> queue.peek(2)
=> [<Qless::Job SyncAccount (f5aa9babcd1e420ea38fe537812db03a-9 / power_school_jobs / waiting)>]
[17] pry(main)> queue.peek(3)
=> [<Qless::Job SyncAccount (f5aa9babcd1e420ea38fe537812db03a-9 / power_school_jobs / waiting)>, <Qless::Job SyncAccount (ed6f4e5207454cf3972459990b8eaad3-15 / power_school_jobs / waiting)>]
[18] pry(main)>
Additional data from invoking the Lua 'directly'
18] pry(main)> client.call('peek', 'power_school_jobs', 1)
=> "{}"
[19] pry(main)> client.call('peek', 'power_school_jobs')
Qless::LuaScriptError: user_script:1105: Peek(): Arg "count" missing or not a number: nil
from /home/deploy/pschool/.bundle/gems/ruby/2.1.0/bundler/gems/qless-515cdc862a82/lib/qless/lua_script.rb:30:in `rescue in call'
[20] pry(main)> client.call('peek', 'power_school_jobs', 20)
=> "[{\"jid\":\"f5aa9babcd1e420ea38fe537812db03a-9\",\"retries\":5,\"data\": ...snip..."
Turns out there was a lingering member in the zset
ql:q:<queue_name>-work
, I'm not sure how it remained after the job was removed from the queue or completed. I'll be investigating the housecleaning functions to see where/how this might have happened.
So far I'm not seeing anything that would point at a root cause, the only set that seems to have had this pollution is the work
set.
I just discovered that the completed set has invalid values too, sadly I haven't taken good enough notes to know if there is any overlap between the extras in the work
set and ql:completed
. I'm going to leave this issue open while I gather more data, this problem has occurred a couple times in the past.
We had more issues with one of the sorted sets being in an inconsistent state this morning, this time it was the locks
set on the same queue but the inconsistent state started occurring right after we received notification of an exception of the type Qless::Job::CantCompleteError
.