release_stale_locks! might never release a stale lock
mfischer-zd opened this issue · 5 comments
The release_stale_locks!
function might never actually release stale locks under certain conditions. This is because it's possible for a deadlock to occur in simple_mutex
(called by the above function) because the TTL gets reset at each GETSET
operation. This is expected per http://redis.io/commands/expire:
The timeout is cleared only when the key is removed using the DEL command or overwritten using the SET or GETSET commands.
def simple_mutex(key_name, expires = nil)
key_name = namespaced_key(key_name) if key_name.kind_of? Symbol
token = @redis.getset(key_name, API_VERSION) # TTL cleared here if set below
return false unless token.nil?
@redis.expire(key_name, expires) unless expires.nil?
begin
yield token
ensure
@redis.del(key_name)
end
end
The expire
above is useless because it gets cleared as soon as another candidate attempts to acquire the mutex. Here's an observation of the effect from the Redis CLI; commands were run immediately in order (i.e. there wasn't a 9-second wait between queries):
redis:6379> EXPIRE SEMAPHORE:task:release_locks 10
(integer) 1
redis:6379> TTL SEMAPHORE:task:release_locks
(integer) 9
redis:6379> TTL SEMAPHORE:task:release_locks
(integer) -1
In theory, the ensure
clause should prevent this from happening (the DEL
operation is supposed to release the simple mutex), but there are conditions like an unexpected SIGKILL where this clause might not be run.
A workaround would be to ensure that two clients don't call release_stale_locks!
within 10 seconds of each other, but that's not always possible.
Thanks mfischer-zd for the report! I'll look into it.
Did you find something?
bump