Tabcorp/redis-rate-limiter

Race condition that leads to a non-expiring redis key

ldperronevolio opened this issue · 5 comments

A race condition is present in the way redis operations are made, which leads to a client that gets rate-limited forever. (Which happened to me).

         .setex(tempKey, opts.window, 0)
         .renamenx(tempKey, realKey)
         .incr(realKey) 

Basically, what I suspect, is that realKey got expired between the renamenx and the incr operation. INCR behavior on a non-existing key is to create it, but with no expiration.

It's hard to reproduce, but in our situation, it's the load balancer health check that triggered the condition, since it always performs the query at the exact same interval.

Thanks for reporting this! We've actually found a simpler way to implement this in Redis > 2.6.12, which now supports multiple flags for a SET command.

# SET test=ok with a TTL of 10 seconds, only if it's not already set
SET 'test' 'ok' NX EX 10

Which I think we can call as

opts.redis.send_command('SET', ['test', 'ok', 'NX', 'EX', opts.window], function(err, res) {
  // if (res == null) then it didn't set, so we're still within that window
});

Would need some more testing, but it's probably safer than the MULTI/EXEC.

Sorry for the very slow update!

My bad, the approach above works well for a yes/no rate limiting, but doesn't support a running count. The current API exposes a current variable with the number of requests in this window, which means

  • we need the INCR call
  • and INCR doesn't support the EX flag
  • so we still need the MULTI / EXEC

I still don't quite understand why it breaks the way it does, because the MULTI/EXEC is atomic, and Redis being single-threaded nothing should expire while it's running... But that being said if your fix prevents that behaviour from happening I'm happy to merge it! 👍

I just verified in the source code of Redis. Being single threaded doesn't matter. The way they implemented their INCR command, it first do a call to lookupKeyWrite, which is implemented like this:

robj *lookupKeyWrite(redisDb *db, robj *key) { expireIfNeeded(db,key); return lookupKey(db,key); }

So even if we are inside a multi/exec, a perfect timing situation may expire the key. I know, the odds are very low, but it did happen multiple time for us where our load balancer health check got blocked.

This makes sense now. Thanks a lot for getting into the source to confirm this bug!

@ldperronevolio thank you for posting that info. It saved me a ton of time. I ran into this exact issue last night where about 1/1,000,000 requests to my rate limiter resulted in a rate limit key being generated with no expire despite the set nx/incr being run in multi exec.

My question is, do you know if this behavior was ever fixed in any recent version of Redis? Thanks.