key pool can cause memtier to hang when there are multiple shards and request count is below pool size
filipecosta90 opened this issue
This can easily be reproduced with master (or any previous stable version) on an OSS cluster with more than one shard.
Take a simple >1 shard scenario:
$ redis-cli
127.0.0.1:6379> cluster slots
1) 1) (integer) 0
2) (integer) 5461
3) 1) "127.0.0.1"
2) (integer) 6379
3) "ae7607ffdb3519473d83a9420f3a00c162820a8a"
4) (empty array)
2) 1) (integer) 5462
2) (integer) 10923
3) 1) "127.0.0.1"
2) (integer) 6381
3) "ffa17c93a302f721c18377c7256053ecb8175e97"
4) (empty array)
3) 1) (integer) 10924
2) (integer) 16383
3) 1) "127.0.0.1"
2) (integer) 6383
3) "eafe2ee085ca92723662dd18661088f70962c197"
4) (empty array)
In the debug run below you can see that there is no assurance that the 4 generated requests are for the connection that is sending them (given we have 3 connections):
$ memtier_benchmark --threads 1 --clients 1 -n 4 --ratio 0:1 --cluster-mode --hide-histogram -D
Writing results to stdout
[RUN #1] Preparing benchmark client...
client.cpp:111: new client 0x55a181879400 successfully set up.
[RUN #1] Launching threads now...
shard_connection.cpp:372: sending cluster slots command.
shard_connection.cpp:425: cluster slot command successful
shard_connection.cpp:587: server 127.0.0.1:6379: GET key=[memtier-1168204]
shard_connection.cpp:587: server 127.0.0.1:6383: GET key=[memtier-6263819]
shard_connection.cpp:438: server 127.0.0.1:6379: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:587: server 127.0.0.1:6379: GET key=[memtier-3771236]
shard_connection.cpp:438: server 127.0.0.1:6383: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:587: server 127.0.0.1:6383: GET key=[memtier-5586315]
shard_connection.cpp:438: server 127.0.0.1:6379: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:438: server 127.0.0.1:6383: handled response (first line): $-1, 0 hits, 1 misses
client.cpp:219: nothing else to do, test is finished.
[RUN #1 100%, 0 secs] 1 threads: 4 ops, 7782 (avg: 7782) ops/sec, 303.99KB/sec (avg: 303.99KB/sec), 0.20 (avg: 0.20) msec latency
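For context, the shard that owns a key is determined solely by the key's CRC16 slot, not by the connection that generated it. Here is a minimal standalone sketch (not memtier code; it only reuses the key names from the debug log and the slot ranges from the cluster slots output above) that maps each key to its owning shard:

#include <cstdint>
#include <cstdio>
#include <cstring>

// CRC16-CCITT (XModem), polynomial 0x1021, initial value 0x0000:
// the hash Redis Cluster uses for slot assignment.
static uint16_t crc16(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i]) << 8;
        for (int j = 0; j < 8; j++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

int main() {
    // Key names taken from the debug log above.
    const char *keys[] = {"memtier-1168204", "memtier-6263819",
                          "memtier-3771236", "memtier-5586315"};
    for (const char *k : keys) {
        unsigned slot = crc16(k, strlen(k)) % 16384;
        // Slot ranges taken from the `cluster slots` output above.
        const char *owner = (slot <= 5461)  ? "127.0.0.1:6379"
                          : (slot <= 10923) ? "127.0.0.1:6381"
                                            : "127.0.0.1:6383";
        printf("%-18s -> slot %5u -> %s\n", k, slot, owner);
    }
    return 0;
}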
This is due to the following finish condition
m_config->requests > 0 && m_reqs_processed >= m_config->requests
combined with https://github.com/RedisLabs/memtier_benchmark/blob/master/cluster_client.cpp#L352:
// store key for other connection, if queue is not full
key_index_pool* key_idx_pool = m_key_index_pools[other_conn_id];
if (key_idx_pool->size() < KEY_INDEX_QUEUE_MAX_SIZE) {
    key_idx_pool->push(*key_index);
    m_reqs_generated++;
}
A key pushed to another connection's pool is counted in m_reqs_generated, but it may never actually be sent, so m_reqs_processed can stay below m_config->requests forever and the finish condition above is never satisfied. This can be solved by avoiding pushing to other pools when we don't have enough requests to fill all pools of all shards.
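One possible shape for that guard, sketched against the snippet above (an illustration of the idea only, not a tested patch; num_shard_connections is a hypothetical name for however the client tracks its number of shard connections):

// Sketch only: when the configured request count is too small to fill the
// key pools of every shard connection, skip storing keys for other
// connections, so no generated request can get stranded in a pool that is
// never drained.
bool pools_can_be_filled =
    m_config->requests == 0 ||  // 0 means the run is bounded by time, not by request count
    m_config->requests >= num_shard_connections * KEY_INDEX_QUEUE_MAX_SIZE;

key_index_pool* key_idx_pool = m_key_index_pools[other_conn_id];
if (pools_can_be_filled && key_idx_pool->size() < KEY_INDEX_QUEUE_MAX_SIZE) {
    key_idx_pool->push(*key_index);
    m_reqs_generated++;
}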