reusing connections in a transaction state (?)
sgoudelis opened this issue · 5 comments
Good morning,
I am using:
PyPy 5.10.0
txredisapi 1.4.4
Twisted 17.5.0
When under load (lots of redis traffic) there are occasions where, when doing a burst of SCANs to get a list of keys from Redis, instead of getting back cursor information and data, we get the string "QUEUED". From what I understand so far the string "QUEUED" is a reply from Redis when I am issuing commands into a transaction. I am using transactions to periodically update a group of Redis keys asynchronously.
We are using makeUnixConnection() with a pool size of 20.
Here is how I am doing SCANs:
while not endoflist:
    try:
        result = yield globalredisconnection.scan(cursor, pattern, limit)
        cursor = int(result[0])
        keys += result[1]
        if cursor == 0:
            # reached end of list
            endoflist = True
    except Exception as e:
        logger.error("Something went wrong while doing SCAN (%s): %s" % (result, e))
        logger.exception(e)
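For what it's worth, a small defensive guard (a hypothetical helper, not part of txredisapi) can at least turn the bad reply into a clear error instead of the confusing `int()` failure:

```python
def parse_scan_reply(result):
    """Validate a SCAN reply before unpacking it.

    A healthy reply is a two-element sequence: [new_cursor, keys].
    A connection handed out mid-transaction answers with the bare
    string "QUEUED" instead, which this guard reports clearly.
    """
    if not isinstance(result, (list, tuple)) or len(result) != 2:
        raise ValueError("unexpected SCAN reply: %r" % (result,))
    return int(result[0]), list(result[1])
```

In the loop above you would then write `cursor, page = parse_scan_reply(result)` and log the raw reply when it raises. This only makes the symptom obvious; it does not fix the underlying pool bug.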
And here is a sample print out of the error occurring:
2018-05-11 01:28:34,278 ERROR: Something went wrong while doing SCAN (QUEUED): invalid literal for int() with base 10: 'Q'
2018-05-11 01:28:34,278 ERROR: invalid literal for int() with base 10: 'Q'
Traceback (most recent call last):
File "/opt/backend/backend.py", line 3187, in getKeyList
cursor = int(result[0])
ValueError: invalid literal for int() with base 10: 'Q'
Can anyone help to shed some light onto this issue ?
This certainly seems like a bug in txredisapi. It looks like when you call scan(), it takes a connection from the pool, but that connection is inside a transaction. If I recall the design correctly, connections in a transaction state should not be available from the pool, so we need to understand how a connection with an open transaction can get back into the pool before the transaction is completed.
From a quick glance at the source code, I haven't found any clues. But txredisapi's implementation of transactions is far from ideal, so it is easy to miss something...
Do you use .pipeline() or .unwatch()? If your code that uses transactions is short and may be published, can you please post it here?
We use .pipeline() quite a bit in our code. We also use .multi(), far less often than .pipeline(), but still periodically. We do not use .unwatch().
Here is some sample code using .multi():
try:
    keylist = yield getKeyList("mykey:%s:%s:*" % (var1, var2))
    if len(keylist) > 0:
        # starting a transaction
        transaction = yield globalredisconnection.multi()
        for fetchedkey in keylist:
            transaction.delete(fetchedkey)
        deleteall = yield transaction.commit()
except Exception as e:
    logger.error("Something went wrong while deleting keys from redis (%s)" % e)
    logger.exception(e)
I still can't find the bug that makes a connection that is currently inside a transaction available in the pool :( It seems like it somehow gets back into RedisFactory.connectionQueue before the transaction is committed or rolled back 😕
BTW, in this particular code you don't need a transaction at all: you may just call globalredisconnection.delete(keylist). .delete() (as well as some other methods) accepts a list of keys.
BTW 2: .pipeline() is almost never needed when using txredisapi, because all queries are implicitly pipelined, thanks to Twisted's async nature, as long as you don't yield on them. These two snippets have almost identical effects:
pipe = yield redis.pipeline()
pipe.op1(args1)
pipe.op2(args2)
pipe.op3(args3)
results = yield pipe.execute_pipeline()

results = yield defer.gatherResults([
    redis.op1(args1),
    redis.op2(args2),
    redis.op3(args3),
])
But using .pipeline() can have a negative effect: if you call something that takes a while between .pipeline() and .execute_pipeline(), the connection will not be available in the pool during that time, which may be suboptimal.
pipe = yield redis.pipeline()
pipe.op1(args1)
yield treq.get("http://...")  # ← the connection to redis is unavailable to other code while this request is in progress
pipe.op2(args2)
yield pipe.execute_pipeline()
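To illustrate that caveat with plain asyncio standing in for the connection pool (an analogy, not txredisapi itself): holding the one pooled connection across an unrelated slow await stalls every other user of the pool.

```python
import asyncio
import time

async def main():
    pool = asyncio.Semaphore(1)  # a "pool" with a single connection

    async def holder():
        async with pool:              # take the only connection...
            await asyncio.sleep(0.2)  # ...then do slow, unrelated work
                                      # (like the HTTP request above)

    async def other_user():
        start = time.monotonic()
        async with pool:              # must wait until holder releases
            pass
        return time.monotonic() - start

    _, waited = await asyncio.gather(holder(), other_user())
    return waited

waited = asyncio.run(main())  # other_user stalls for roughly the full 0.2 s
```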
My strong opinion is that pipelines should not have been implemented in txredisapi at all. I think they are the result of a misconception about how async network calls differ from sync ones :)
Thank you very much for your effort and time. I will look into this further. Regarding your comments about pipelining: I have a different opinion, although not as educated as yours. Reading https://redis.io/topics/pipelining, specifically where it says:
$ (printf "PING\r\nPING\r\nPING\r\n"; sleep 1) | nc localhost 6379
+PONG
+PONG
+PONG
I see a benefit in grouping commands that way (in some cases) due to the reduced RTT, as is also stated further down that page. I do not know how this library is implemented. Is it using something like this? How is pipelining implemented in this library?
Thank you again for your time.
Since txredisapi is based on Twisted, which is all about asynchrony, you can choose for yourself whether to stop and wait for an operation to finish, or to do something else while it is in progress.
If you do:
redis.set('a', 1)
redis.set('b', 2)
redis.set('c', 3)
then all three commands will actually be sent one right after another, without waiting for each other's responses. When the interpreter hits the set('b', 2) call, it has not yet received the response to set('a', 1). This is unlike the synchronous redis-py library.
But if you actually want to wait for the previous operation's result before sending the next one, Twisted gives you the very convenient @inlineCallbacks:
@inlineCallbacks
def f():
    yield redis.set('a', 1)
    yield redis.set('b', 2)
yield makes execution pause on the first set and continue running the code of f() only after the response is received (execution is not stopped completely; instead, Twisted's reactor switches to running other coroutines).
So, pipelining Redis requests is a fantastic thing, and without it one gets only a small fraction of Redis's performance. But with Twisted you get pipelining for free, without special support from txredisapi, just by not waiting on the responses of previous commands before sending the next ones.
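The point generalizes beyond txredisapi. Here is the same idea sketched with plain asyncio and a fake command that just sleeps for one round-trip time (an illustration, not real Redis I/O): awaiting each reply before the next send pays N round trips, while firing them all and gathering the replies pays roughly one.

```python
import asyncio
import time

async def fake_redis_op(value, rtt=0.1):
    # stand-in for one Redis command with a fixed network round trip
    await asyncio.sleep(rtt)
    return value

async def sequential():
    # awaiting each command before sending the next: RTTs add up
    start = time.monotonic()
    for v in (1, 2, 3):
        await fake_redis_op(v)
    return time.monotonic() - start

async def pipelined():
    # firing all commands and gathering replies: roughly one shared RTT
    start = time.monotonic()
    await asyncio.gather(*(fake_redis_op(v) for v in (1, 2, 3)))
    return time.monotonic() - start

seq = asyncio.run(sequential())  # ≈ 3 × 0.1 s
pipe = asyncio.run(pipelined())  # ≈ 0.1 s
```

In Twisted the equivalent of the `gather` variant is `defer.gatherResults` over un-yielded calls, as in the snippet earlier in this thread.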