Augmenting Valkey with multiplexing interface
In this proposal we discuss exposing a new protocol that allows serving many Valkey clients over a single TCP connection. Using such a protocol achieves similar performance to a pipeline mode while maintaining, for each client, the same functionality and semantics as if the client was served using a single dedicated TCP connection.
Introduction
Each TCP connection to Valkey introduces a client object that maintains the logical state of the connection. This state is used to provide access-control guarantees as well as the required semantics for Valkey commands. For example, the client object holds the connection's ACL user, its watched keys, its blocking state, the pub/sub channels it subscribed to, and more. This state is bound to the TCP connection and is freed upon disconnection.
Since each client uses a dedicated connection, commands for each client are sent to Valkey separately and each response is returned by Valkey over a different network connection. This causes Valkey to spend a large amount of time (62% when using 500 clients) on system calls, and to consume at least one network packet per command and per response.
Pipelining can be applied to reduce the number of packets, amortize the system-call overhead (11% when using 10 clients sending pipelines of 50 commands), and improve locality (a 44% reduction in L1 cache misses vs. using 500 clients). However, in many cases a pipeline cannot be utilized, either due to command dependencies or because the client generates only a few commands per second.
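To make the packet/syscall saving concrete, here is a minimal sketch of pipelining at the wire level: many RESP commands are concatenated into one buffer and sent with a single write, instead of one round trip per command. The `encode_resp` helper is ours, not a Valkey or client-library API.

```python
def encode_resp(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for a in args:
        b = a.encode()
        out.append(f"${len(b)}\r\n".encode() + b + b"\r\n")
    return b"".join(out)

# A pipeline is simply many commands back to back in one buffer,
# transferred with a single write() instead of one syscall each.
pipeline = b"".join(encode_resp("SET", f"k{i}", str(i)) for i in range(50))
```

The server reads the whole batch from one input buffer, which is where the locality gains quoted above come from.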
For this reason, client implementations like StackExchange.Redis collocate many clients on a single TCP connection while using pipelining to enhance performance. However, from Valkey's perspective only a single logical client is bound to a TCP connection, so all collocated clients are handled by Valkey as if all commands arrived from a single client.
Naturally, with such a configuration, blocking commands, MULTI/EXEC, ACLs, and other commands cannot preserve their required semantics. Therefore, StackExchange.Redis does not support blocking commands and uses Lua or constraints to approximate MULTI/EXEC. Buffer limits, like ACLs, also cannot be managed at the client level.
Furthermore, since Valkey treats all collocated clients as a single client, no fairness guarantees are provided for the clients’ commands. Consequently, a large command or response from one client may impact the latency of other commands from collocated clients.
Our suggestion - multiplexing protocol
In this proposal, we suggest adding to Valkey an additional protocol that supports connection multiplexing. Multiplexing is achieved by collocating many clients on a single TCP connection through the addition of extra metadata. The collocation of commands (and responses) for multiple clients simulates pipeline behavior across a large number of clients, resulting in performance similar to that of a pipeline.
The multiplexing protocol supports all Valkey commands with their original semantics. This means that MULTI/EXEC and WATCH can be applied concurrently by different clients, each client may have different ACLs, and even a blocked client does not block the entire connection. Moreover, buffer limits are enforced per client, and a client can be closed without disconnecting the connection.
When a multiplexed connection is disconnected, all the clients allocated for this connection are closed and the user needs to request new clients to be allocated.
We suggest defining the multiplexing protocol in such a way that each command or response is preceded by a header indicating the client to which the command (or response) is targeted. Additionally, control commands, such as ‘create client’ and ‘client close’, are also encoded in the protocol header.
The following example shows the usage of a single multiplexing connection with two clients, where each client uses a different user with potentially different ACL rules. After the connection is established, an 'MPXHELLO' command is sent to mark the connection as multiplexed. This command is followed by two 'create client' commands that initialize two clients on the Valkey side. After the clients are created, USER1 is set for Client I and USER2 for Client II using AUTH commands. Both clients then send 'GET' commands, for k1 and k2 respectively. At this point, Client I sends a 'BLPOP' command that blocks because list l1 does not exist. Even though Client I is blocked, and both clients share the same connection, Client II continues sending 'SET' commands that are processed.
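The crucial property of the walkthrough above — a blocked client stalls only itself, not the shared connection — can be sketched with a tiny simulation. The dispatch logic is deliberately toy-sized and the command handling is an assumption, not Valkey's implementation.

```python
# Two clients share one connection; commands arrive interleaved, tagged
# by client id. A BLPOP on a missing list blocks only that client.
store: dict[str, str] = {}
blocked: dict[int, str] = {}              # client id -> key it waits on
processed: list[tuple[int, str]] = []     # (client id, command name) log

def handle(client_id: int, *cmd: str) -> None:
    name = cmd[0]
    if name == "BLPOP" and cmd[1] not in store:
        blocked[client_id] = cmd[1]       # only this client enters the blocked state
    elif name == "SET":
        store[cmd[1]] = cmd[2]
    processed.append((client_id, name))

handle(1, "GET", "k1")
handle(2, "GET", "k2")
handle(1, "BLPOP", "l1", "0")   # Client I blocks: l1 does not exist
handle(2, "SET", "k3", "v3")    # Client II keeps making progress
handle(2, "SET", "k4", "v4")
```

With one logical client per connection (today's model), the third command would have stalled everything behind it.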
@ohadshacham Could we update the text/diagram with "Valkey" ?
A few questions:
- Does GLIDE support this protocol? It might serve as a reference for other client developers to support this.
- I think we should also call out other possible benefits. We should be able to avoid connection storms with this protocol, I believe. Also, the cost of establishing a TLS connection would be amortized thanks to the shared nature of the physical connection.
- Could you also call out what could be the possible downsides of using the multiplex interface? Can it add latency to the client? What is the estimated additional payload overhead per command?
> I think we should also call out other possible benefits. We should be able to avoid connection storms with this protocol I believe. As well as the cost required to establish a TLS connection would get amortized due to the shared nature of physical connection.
Yeah, just like other multiplexing protocols e.g. HTTP/2, I think it's important to have a control mechanism (called “flow control” in HTTP/2) over multiple "streams" in one connection. This way, we can control the priority between streams and prevent some overloaded streams from affecting the entire connection.
Considering that HTTP/2 already has a good ecosystem and a lot of library support, I actually think it’s a good idea to use HTTP/2 directly. But it may bring more complexity.
In addition, the above examples seem to only consider the request-response form, but maybe we also need to consider server-side pushing? It can affect the protocol design.
Multiplexing can be incredibly beneficial in large clusters. If the number of clients exceeds 10,000 and connection pooling is enabled, the sheer number of clients itself becomes a burden.
The biggest challenge lies in how the RESP protocol can support different contexts.
On one hand, it introduces a new relationship between connections and clients.
On the other hand, every request and response needs a common header, which increases the per-request cost.
The protocol needs to carry some id information to achieve this capability, and that id needs to be mapped to the corresponding client.
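The per-command header cost raised above can be estimated with back-of-the-envelope arithmetic. The 9-byte header here is an assumption (1-byte type + 4-byte client id + 4-byte length); the real overhead depends on the final wire format and on the command mix.

```python
# Assumed 9-byte frame header: 1-byte type + 4-byte client id + 4-byte length.
HEADER = 9

# RESP encoding of `GET key:100` is 26 bytes on the wire:
cmd = b"*2\r\n$3\r\nGET\r\n$7\r\nkey:100\r\n"

# Fraction of each frame spent on multiplexing metadata for this small command.
overhead_pct = 100 * HEADER / (HEADER + len(cmd))
```

For tiny commands like this, the header is a noticeable fraction of the frame; for larger values it quickly becomes negligible.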
@ohadshacham I wonder if you have considered adding QUIC support to Valkey? We would get multiplexing without the need of a new application level protocol.
@pizhenwei thoughts?
> @ohadshacham I wonder if you have considered adding QUIC support to Valkey? We would get multiplexing without the need of a new application level protocol.
> @pizhenwei thoughts?
As far as I can see:
- the current RESP/TCP connection protocol is simple to maintain: the connection between server and client has the same life cycle as the TCP connection, so once the TCP connection gets closed, the Valkey connection is closed with it. The new multiplexing protocol adds work in user space on both the server and client sides, and the connection-related parts will become a heavy burden.
- multiplexing is quite common in distributed storage protocols, for example iSCSI and NVMe-oF. A `command id` or `tag` is used to distinguish the responses, which usually causes a slight performance drop from managing the in-flight commands. Additionally, a `command depth` (the max in-flight commands) is negotiated on connection establishment.
- Valkey is an in-memory KV database; a command is always expected to respond quickly. It does NOT need a `command depth`, right?
- once many clients share a single TCP connection, that connection will run on a single CPU and a single network interface queue; users may hit performance limits and have to learn a lot to tune performance.
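The `command id`/`tag` bookkeeping mentioned above amounts to an in-flight map keyed by tag, with a negotiated depth cap. This is a generic sketch of that pattern (names are hypothetical, not from iSCSI/NVMe-oF or this proposal).

```python
import itertools

class InflightTracker:
    """Match tagged responses back to their commands (illustrative sketch)."""

    def __init__(self, depth: int = 64):
        self.depth = depth                   # negotiated max in-flight commands
        self.inflight: dict[int, bytes] = {} # tag -> pending command
        self._tags = itertools.count()

    def submit(self, command: bytes) -> int:
        """Register a command and return the tag sent on the wire with it."""
        if len(self.inflight) >= self.depth:
            raise RuntimeError("command depth exceeded")
        tag = next(self._tags)
        self.inflight[tag] = command
        return tag

    def complete(self, tag: int) -> bytes:
        """A tagged response arrived: retire and return the matching command."""
        return self.inflight.pop(tag)        # KeyError on an unknown tag
```

The per-response dictionary lookup and depth accounting is exactly the management overhead the comment refers to.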
So I agree with @PingXie , QUIC may be a better option.
@pizhenwei and @PingXie, I agree that QUIC is an excellent option that can significantly simplify the implementation and resolve head-of-line blocking, which still persists at the TCP level in our case (as with HTTP/2). However, the main drawback is that most of the performance gains in our implementation were achieved through the use of a single querybuf and a single cob. Commands are processed directly from the shared querybuf, and responses are written directly to the shared cob (with a fallback to private buffers when necessary for fairness, blocking, etc.). Using a different stream per client would generate more system calls (though I assume some batching occurs) and could also reduce locality compared to the shared query and output buffers.
I also agree that implementing the QUIC protocol in Valkey, as well as adopting it on the client side, would be much simpler, less error-prone, and would provide the required semantics when using a stream per client.