IlyaSkriblovsky/txredisapi

Are Redis connection pools necessary with twisted asynchronous I/O?

zipxing opened this issue · 11 comments

In node.js, node_redis module just use a single connection. Both Node and Redis are effectively single thread. I don't think you'll gain anything by having multiple connections.

Why txredisapi use connection pool?

If I set pool size to 1 in txredisapi,I noticed that the single connection won't be putback before
recive redis reply,would not it will reduce the asynchronous concurrency?

use txredis single connection,twisted will auto pipeline request and reply:
01-05 15:17:28.106092 IN TWISTED senddata LEN:, 1272
01-05 15:17:28.106181 IN TWISTED recvdata LEN:, 684
use txredisapi single connection, the next request must wait for the redis to respond.

I think txredisapi must use the connection pool to get better performance, is not suitable for asynchronous single threaded mode.

+1

Seems like txredisapi tries to not send second request to the connection until it receives reply to previous one. Is there any good reason to make it this way?

It seems like natural way for blocking API which waits for reply before returning from query method. Blocking API needs additional syntax for request pipelining. But for async code pipelining is a natural way of operating, and waiting for reply before sending next requests adds additional RTT latency for every request.

May be I miss something, but why txredisapi works this way instead of pipelining all requests by default? (just like txmongo does)

Can't possibly remember.. haven't looked at this code in 3-4 years.

Ok, if you don't mind I will:
• Ask @ioga, who wrote this part of code, according to Blame (ConnectionHandler). May be he remember something, although almost 3 years passed.
• Look into code to see whether change of the behavior can break something
• Try to remove waitings and see how it will work

Great

Ok, seems like such behavior is needed at least to properly handle blocking commands: BRPOP/BLPOP/BRPOPLPUSH

I changed txredisapi to txredis in a production environment, it looks like the performance has been improved and the number of redis connections is greatly reduced.
I think txredisapi is best to provide a single connection asynchronous mode. After all, most of the time will not use blocking command.Thanks!

@zipxing I'm trying to make all requests pipelined by default there: https://github.com/IlyaSkriblovsky/txredisapi/tree/implicit-pipelining. My changes need some cleanup, but they seem not to break anything. Please take a look and if my patch seems stable enough to you, you may give it a try on your environment to see how it will perform. BTW, do you use Hiredis?

I have an email discussion with @ioga who originally wrote the code that deliberately doesn't allow sending multiple requests within one connection at once. Let's see what discussion will show.

My personal opinion is that txredisapi doesn't need any special support for pipelining, but instead it should pipeline all requests by default with special handling of commands that may block for some time: BLPOP/BRPOP/BRPOPLPUSH (and may be some others?). If there will be consensus about that, we could drop explicit pipelining with proper deprecation cycle.

TL;DR: I think we should change how txredisapi schedules requests now. This will dramatically increase performance on high throughput without breaking any compatibility. And this change is pretty easy to implement.

I had a discussion with @ioga, looked at how other async Redis client libraries behave and read more than dozen of discussions about the Redis pipelining. It seems like different people mean different things under the "Pipelining" term and this is the common source of misconception. Official Redis' pipelining description leaves place of different interpretations.

There are 3 common methods of sending multiple requests that can be found in various client libraries:

  1. Sequential queries, no pipelining: Send A, Wait for reply to A, Send B, Wait for reply to B. This is how almost all sync client libraries work by default. And this is how txredisapi work by default now.
  2. Streaming pipelining: Send A, Send B, When reply is received, call the corresponding callback. This is how all async client libraries that I found work by default. Some sync client libraries offer special syntax for this method.
  3. Offline pipelining: Add A to the queue, Add B to the queue, Send queue content to server at once, Receive all replies and call callbacks. Some sync and async client libraries offer special syntax for this method, including txredisapi.

This is how other async libraries are working:

  • NodeRedis
    • Does automatic implicit streaming pipelining
    • Has no offline pipelining (reasoning)
  • ioredis
    • Does automatic implicit streaming pipelining
    • Offers explicit offline pipelining
  • StackExchange.Redis
    • All queries are implicitly streaming-pipelined
    • Has no offline pipelining
    • Strongly encourages to use single connection to Redis with heavy accent on streaming pipelining
    • Deliberately has no support for BLPOP/BRPOP/BRPOPLPUSH because they are breaking streaming pipelining
  • Tornado-Redis
    • Does automatic implicit streaming pipelining
    • Offers explicit offline pipelining
  • txredis
    • Does automatic implicit streaming pipelining
    • Has no offline pipelining

So, all async clients always use streaming pipelining by default. And none of them make any use of "Sequential queries" pattern like txredisapi does. And this is pretty obvious, because "Sequential queries" is surely least efficient and used by sync clients only because they need to return replied values from their query methods.

I also found several discussions about what pipelining method Redis is more correct: streaming or offline. The most interesting is this one in official Redis google group. It clearly follows from the answers that Redis clients are expected to do streaming pipelining. Quote:

Pipelining in Redis is very similar to pipelining with HTTP, which pretty much means a request does not have to wait for the previous response to be received. Responses are received in the same order as the requests, but there is no need to stop and wait for all the responses in order to receive just one of them.

@ioga argued that Redis are queuing replies when using pipelining (sorry if I misunderstood your words, Ilia):

The trick is you'll get the results for all pipelined commands only once you finish it, not "ASAP".

But I think this is wrong. Another quote from discussion linked above:

Pipelining does not cause a single response (quite the opposite).

As far as I understood, Redis server itself doesn't have any special handling of pipelining. It just reads requests, process them and sends replies. Pipelining is a solely client-side feature that allow clients to increase network throughput just like TCP Congestion Window makes it possible to stream data efficiently in TCP/IP.

Conclusion
I think that txredisapi works in a wrong way now. There is no any reason for using "Sequential queries" approach in non-blocking async environment. I think txredisapi needs to be changed:

  • Currently txredisapi allows only connection with no outstanding requests to be used to send next request. I suggest to change that and only prevent connection from being used for sending another requests in two cases:
    • There is active MULTI/EXEC transaction on this connection
    • There is active blocking command sent to via this connection (BLPOP/BRPOP/BRPOPLPUSH) and no response is received yet
  • (optional) Currently txredisapi provides offline pipelining via pipeline() method. But this method blocks connection from being used until execute_command() is called. This is usually not a problem, because these methods are expected to be called shortly. But it looks more correct to me to make pipeline() to be the method of the ConnectionHandler, not particular connection, and send queued batch via any ready connection on execute_command().

The good news is that both proposed changes will not break compatibility with any existing client code.

It would be cool to have some feedback on proposed changes.

Sorry for such a long post :(

I've done some benchmarks for current txredisapi vs. one with proposed changes:

server is on localhost

current  txredisapi, simple calls ×100000 = 23.24s
current  txredisapi, offline pipeline ×100000 = 6.23s
proposed txredisapi, simple calls ×100000 = 11.70s
proposed txredisapi, offline pipeline ×100000 = 6.19s

server is on LAN

current  txredisapi, simple calls ×100000 = 46.43s
current  txredisapi, offline pipeline ×100000 = 5.72s
proposed txredisapi, simple calls ×100000 = 10.76s
proposed txredisapi, offline pipeline ×100000 = 5.59s

Tested with these functions:

def simple_calls(rc, n):
    return defer.gatherResults(
        rc.hset("hash", "key", "value")
        for i in range(n)
    )

@defer.inlineCallbacks
def offline_pipeline(rc, n):
    pipe = yield rc.pipeline()
    for i in range(n):
        pipe.hset("hash", "key", "value")
    defer.returnValue((yield pipe.execute_pipeline()))

So, while offline pipelining is still faster, performance increased significantly with simple common use case of issueing many commands without waitings. Offline pipelining is good for streaming tons of data to redis from the single source, but it is hard to use when data comes from different event-based sources.

Also it's amusing that single connection to Redis with new approach still performs ×2 faster than Pool of 10 connections with old approach.

Yeah this is looking much better.

Implemented in #99